Logstash Parsing & Grok Configuration: A Complete Guide

July 11, 2017 · Updated June 10, 2026

Logstash parsing is the process of turning raw, unstructured log lines into structured fields using a three-stage pipeline: input (where events come from), filter (where you parse and transform them), and output (where they go, usually Elasticsearch). The grok filter is the workhorse of that middle stage — it matches text against named patterns like %{TIMESTAMP_ISO8601:timestamp} and pulls each piece into its own field. This guide covers grok, the faster dissect alternative, the other core filters, and how Logstash fits into a modern (2026) logging stack.

Key takeaways

  • A Logstash config has three sections — input {}, filter {}, and output {} — and parsing happens in the filter block.
  • grok extracts fields with %{SYNTAX:semantic} patterns; reach for it when log formats vary.
  • dissect is faster and simpler for fixed, delimiter-based logs — prefer it when the layout never changes.
  • Chain mutate, date, json, kv, and geoip to clean and enrich events, and use if [field] conditionals to branch.
  • A failed grok match adds a _grokparsefailure tag instead of dropping the event — always handle it.
  • In 2026, Elastic Agent or Beats with Elasticsearch ingest pipelines replace standalone Logstash for simple cases; Logstash still wins for heavy transformation.

How does the Logstash pipeline work?

Every Logstash configuration is divided into three sections, and events flow through them in order — input to filter to output:

input {
  # where events are read from
}

filter {
  # how events are parsed and transformed
}

output {
  # where parsed events are sent
}

  • input: defines where events are read from. Common plugins: file, beats (the modern replacement for the old lumberjack input), kafka, syslog, http, and elasticsearch.
  • filter: parses and transforms each event. This is where grok, dissect, mutate, and date live.
  • output: decides what happens to parsed events. Common plugins: elasticsearch, stdout (handy for debugging), kafka, s3, and file.

If you split your config across several files in a directory, Logstash concatenates them in alphabetical order into one pipeline — so a filter declared in one file applies to inputs declared in another. Name files with numeric prefixes (10-input.conf, 30-filter.conf, 90-output.conf) to control that ordering deliberately.
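Because the ordering follows the file names as text, not the numbers inside them, unpadded prefixes can silently reorder your filters. The Python sketch below assumes Logstash sorts the file names as plain strings (a simplification of how it globs and sorts the directory); the file names are hypothetical.

```python
# Sketch: how plain string sorting orders config files. Assumes Logstash
# concatenates the files in lexicographic (character-by-character) order.
def concat_order(filenames):
    """Return the order in which the files would be concatenated."""
    return sorted(filenames)


# Hypothetical files: the geoip filter needs the clientip field that the
# grok filter creates, so grok has to run first.
unpadded = ["5-grok.conf", "20-geoip.conf", "90-output.conf", "1-input.conf"]
padded = ["05-grok.conf", "20-geoip.conf", "90-output.conf", "01-input.conf"]

print(concat_order(unpadded))
# -> ['1-input.conf', '20-geoip.conf', '5-grok.conf', '90-output.conf']
#    '2' sorts before '5', so geoip lands ahead of grok
print(concat_order(padded))
# -> ['01-input.conf', '05-grok.conf', '20-geoip.conf', '90-output.conf']
```

Using the same number of digits in every prefix keeps string order and numeric order identical.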

What does the grok filter do?

grok parses arbitrary text into fields by matching it against patterns. The syntax is %{SYNTAX:semantic}, where SYNTAX is the name of a built-in pattern (a named regular expression) and semantic is the field name the captured value is stored under.

Logstash ships with 120+ built-in patterns. Frequently used ones include IP, IPORHOST, NUMBER, INT, WORD, USERNAME, TIMESTAMP_ISO8601, HTTPDATE, DATA, GREEDYDATA, and the all-in-one COMBINEDAPACHELOG for Apache/Nginx access logs.

Here is a filter that parses a syslog line:

filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}"
    }
  }
}

Given this event:

May 18 11:24:30 jagadeesh-pc /usr/lib/gdm3/gdm-x-session[8693]: Successfully activated service 'org.gnome.Terminal'

grok extracts these fields:

syslog_timestamp => May 18 11:24:30
syslog_hostname  => jagadeesh-pc
syslog_program   => /usr/lib/gdm3/gdm-x-session
syslog_pid       => 8693
syslog_message   => Successfully activated service 'org.gnome.Terminal'
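To make the %{SYNTAX:semantic} mechanics concrete, here is a minimal Python sketch that expands each token into a named capture group and runs the result against the sample line. The PATTERNS dictionary is a hypothetical, simplified stand-in for the real definitions in Logstash's pattern library, and the sketch uses Python's (?P<name>...) spelling for named groups where grok's Oniguruma engine uses (?<name>...).

```python
import re

# Simplified, hypothetical stand-ins for a few built-in grok patterns.
# The real definitions shipped with Logstash are longer and stricter.
PATTERNS = {
    "SYSLOGTIMESTAMP": r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}",
    "SYSLOGHOST": r"[\w.-]+",
    "DATA": r".*?",        # lazy: as little text as possible
    "POSINT": r"[1-9]\d*",
    "GREEDYDATA": r".*",   # greedy: the rest of the line
}

# Matches %{SYNTAX} or %{SYNTAX:semantic}
TOKEN = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern):
    """Expand every %{SYNTAX:semantic} token into a Python regex group."""
    def expand(m):
        syntax, semantic = m.group(1), m.group(2)
        body = PATTERNS[syntax]
        # Named group when a semantic is given, non-capturing group otherwise
        return f"(?P<{semantic}>{body})" if semantic else f"(?:{body})"
    return TOKEN.sub(expand, pattern)


def grok(pattern, line):
    """Return the captured fields, or None when the pattern does not match."""
    # Unanchored search: grok also matches anywhere unless you add ^ or $
    match = re.search(grok_to_regex(pattern), line)
    return match.groupdict() if match else None


SYSLOG = (r"%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} "
          r"%{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: "
          r"%{GREEDYDATA:syslog_message}")

line = ("May 18 11:24:30 jagadeesh-pc /usr/lib/gdm3/gdm-x-session[8693]: "
        "Successfully activated service 'org.gnome.Terminal'")

for name, value in grok(SYSLOG, line).items():
    print(f"{name:16} => {value}")
```

Running it prints the same five fields listed above. A line that does not fit the pattern returns None, which is the sketch's equivalent of a failed grok match.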

How do I write a custom grok pattern?

When no built-in pattern fits, define your own. You can inline a named regex with Oniguruma syntax — (?<field_name>the regex here) — or keep reusable patterns in a file and point grok at it with patterns_dir:

# ./patterns/custom  (one "NAME regex" per line)
APP_ID [A-Z]{3}-[0-9]{4}

# logstash filter
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{APP_ID:app_id} %{GREEDYDATA:msg}" }
  }
}
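A patterns file is just a list of names and regular expressions, so it is easy to sanity-check outside Logstash. The Python sketch below parses text in the "NAME regex" format shown above and applies APP_ID to a sample line; the file contents, the sample line, and the hand expansion of %{GREEDYDATA} to .* are illustrative assumptions, not Logstash internals.

```python
import re

# Contents of a hypothetical ./patterns/custom file: one "NAME regex" per line
CUSTOM_PATTERNS = """
# application identifiers such as ABC-1234
APP_ID [A-Z]{3}-[0-9]{4}
"""


def load_patterns(text):
    """Parse 'NAME regex' lines, skipping blank lines and # comments."""
    patterns = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, regex = line.split(None, 1)  # split on the first run of whitespace
        patterns[name] = regex
    return patterns


patterns = load_patterns(CUSTOM_PATTERNS)

# %{APP_ID:app_id} %{GREEDYDATA:msg}, hand-expanded into Python syntax
regex = re.compile(rf"(?P<app_id>{patterns['APP_ID']}) (?P<msg>.*)")
match = regex.search("ABC-1234 payment service started")
print(match.groupdict() if match else "no match")
```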

Other useful grok options: add_field and add_tag attach extra metadata, while overwrite replaces an existing field with a captured value instead of creating a new one:

grok {
  match     => { "message" => "%{SYSLOGBASE} %{GREEDYDATA:message}" }
  overwrite => [ "message" ]
}
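Without overwrite, capturing into a field that already exists does not replace it: grok keeps the old value and adds the new one, so message becomes an array holding both the raw line and the captured text. The Python function below is a simplified, hypothetical model of that difference (not Logstash's own code), and the sample log line is made up.

```python
def apply_capture(event, field, value, overwrite=()):
    """Store one grok capture on an event dict (simplified model)."""
    if field in overwrite or field not in event:
        event[field] = value                  # replace, or create the field
    else:
        existing = event[field]
        if not isinstance(existing, list):
            existing = [existing]
        event[field] = existing + [value]     # keep the old and the new value
    return event


raw = "May 18 11:24:30 myhost sshd[42]: Accepted publickey for deploy"
captured = "Accepted publickey for deploy"

print(apply_capture({"message": raw}, "message", captured))
# message becomes a two-element list: the raw line, then the captured text
print(apply_capture({"message": raw}, "message", captured, overwrite=["message"]))
# message is replaced by the captured text only
```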

What is the _grokparsefailure tag?

When a grok pattern doesn't match a line, Logstash doesn't drop the event — it adds the tag _grokparsefailure so you can find the problem later. A flood of this tag in Kibana means your pattern is wrong or your logs changed format. Route failures somewhere separate so they don't pollute your main index:

output {
  if "_grokparsefailure" in [tags] {
    file { path => "/var/log/logstash/grok-failures.log" }
  } else {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "logs-%{+YYYY.MM.dd}"
    }
  }
}
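The behavior to remember is "tag, don't drop". The Python sketch below models a grok filter that adds fields on success or appends _grokparsefailure on failure, followed by the same routing decision as the output block. The pattern is a hypothetical, simplified translation of %{IP:client} %{WORD:method} %{NOTSPACE:path}, and the destination strings are placeholders.

```python
import re

TAG = "_grokparsefailure"

# Hypothetical pattern: simplified stand-ins for IP, WORD and NOTSPACE
PATTERN = re.compile(r"(?P<client>\d{1,3}(?:\.\d{1,3}){3}) (?P<method>\w+) (?P<path>\S+)")


def grok_filter(event):
    """Add captured fields on success; tag the event (never drop it) on failure."""
    match = PATTERN.search(event["message"])
    if match:
        event.update(match.groupdict())
    else:
        event.setdefault("tags", []).append(TAG)
    return event


def route(event):
    """Mirror the output conditional: failures to a file, the rest to Elasticsearch."""
    return "grok-failures.log" if TAG in event.get("tags", []) else "elasticsearch"


for message in ["10.0.0.7 GET /health", "garbled line"]:
    event = grok_filter({"message": message})
    print(route(event), event)
```

Note that the failed event still carries its original message, which is exactly what you want when you later inspect the failures file.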

Build patterns iteratively in the Grok Debugger, found under Kibana → Dev Tools → Grok Debugger. Paste a sample line and your pattern, and it shows the captured fields live — far faster than restarting Logstash to test each change.

grok vs dissect: which should you use?

dissect is a second parsing filter that splits a line by delimiters and positions instead of regular expressions. Because it does no regex backtracking, it is significantly faster and uses less CPU — but it only works when the log layout is fixed. Reach for dissect when your format never changes (most application and access logs), and keep grok for messy or variable input.

Aspect | grok | dissect
Matching method | Named regular expressions | Delimiters / fixed positions
Performance | Slower (regex backtracking) | Faster, low CPU
Best for | Variable / unpredictable formats | Fixed, structured layouts
Missing fields | Tolerant (optional patterns) | Brittle: layout must match exactly
Learning curve | Steeper (pattern library) | Simple (%{field} plus literals)

A dissect mapping uses %{} placeholders separated by the exact literal characters from the log line (%{+field} appends to a previous capture):

filter {
  dissect {
    mapping => {
      "message" => "%{ts} %{+ts} %{level} [%{thread}] %{logger}: %{msg}"
    }
  }
}
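The idea behind dissect fits in a few lines: walk the line left to right and cut at each literal delimiter from the mapping, in order. The Python sketch below is a simplified model under that assumption. It supports %{name} and the %{+name} append modifier (joined with a space) but none of dissect's other modifiers, and the sample log lines are hypothetical.

```python
import re

FIELD = re.compile(r"%\{([^}]*)\}")


def dissect(mapping, line):
    """Split `line` on the literal text between %{...} placeholders, left to right.

    Returns a dict of fields, or None when an expected delimiter is missing.
    """
    parts = FIELD.split(mapping)        # [literal, key, literal, key, ..., literal]
    literals, keys = parts[0::2], parts[1::2]
    if not line.startswith(literals[0]):
        return None
    pos, fields = len(literals[0]), {}
    for key, delim in zip(keys, literals[1:]):
        if delim:                       # capture up to the next delimiter
            end = line.find(delim, pos)
            if end == -1:
                return None
            value, pos = line[pos:end], end + len(delim)
        else:                           # empty delimiter: last field takes the rest
            value, pos = line[pos:], len(line)
        if key.startswith("+"):         # append to an earlier capture
            name = key[1:]
            fields[name] = f"{fields[name]} {value}" if name in fields else value
        else:
            fields[key] = value
    return fields


MAPPING = "%{ts} %{+ts} %{level} [%{thread}] %{logger}: %{msg}"
print(dissect(MAPPING, "2026-06-10 12:00:01 INFO [main] com.example.App: Started on port 8080"))
print(dissect(MAPPING, "2026-06-10 12:00:01 INFO com.example.App: no thread block"))
```

The second line has no "[thread]" block, so the " [" delimiter is never found and the sketch gives up. That is the brittleness the comparison table describes: there are no optional parts to fall back on.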

Which Logstash filter plugins do what?

Parsing is rarely a single filter. A real pipeline chains several together — grok or dissect to extract fields, then date, mutate, and enrichment filters to clean up and add context. The most common ones:

Filter | What it does
grok | Extract fields from unstructured text via regex patterns
dissect | Extract fields from fixed-format text via delimiters (faster)
mutate | Rename fields, convert types, gsub, lowercase, and remove_field
date | Parse a timestamp string into the event's @timestamp
json | Parse a JSON string field into structured fields
kv | Split key=value pairs into individual fields
geoip | Add geographic location from an IP address
useragent | Break a User-Agent string into browser, OS, and device

The date filter deserves special attention. By default @timestamp is the time Logstash received the event, not when it occurred. Always parse the real timestamp out of your log and into @timestamp, or every time-based query in Kibana will be wrong:

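filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
    # Keep the legacy field names used below (clientip, response, bytes);
    # Logstash 8+ defaults to ECS names such as [source][address]
    ecs_compatibility => disabled
  }

  date {
    match  => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    target => "@timestamp"
  }

  mutate {
    convert      => { "response" => "integer" "bytes" => "integer" }
    remove_field => [ "timestamp", "message" ]
  }

  geoip {
    source            => "clientip"
    # Match the grok filter's legacy (non-ECS) field layout
    ecs_compatibility => disabled
  }

  if [response] >= 500 {
    mutate { add_tag => [ "server_error" ] }
  }
}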
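To see what each step of that filter block does to an event, here is a Python sketch that models the date, mutate, and conditional stages on a dictionary that grok has already filled with the field names used above (timestamp, response, bytes). It assumes Joda's dd/MMM/yyyy:HH:mm:ss Z corresponds to strptime's %d/%b/%Y:%H:%M:%S %z for this input, and the sample values are hypothetical. Logstash itself stores @timestamp in UTC with millisecond precision and tags unparseable dates instead of raising, so treat this as a model, not a reimplementation.

```python
from datetime import datetime, timezone


def process(event):
    """Model the date, mutate and conditional steps of the filter block."""
    # date: Joda "dd/MMM/yyyy:HH:mm:ss Z" ~ strptime "%d/%b/%Y:%H:%M:%S %z"
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.astimezone(timezone.utc).isoformat()

    # mutate convert: strings captured by grok become integers
    for field in ("response", "bytes"):
        if field in event:
            event[field] = int(event[field])

    # mutate remove_field
    for field in ("timestamp", "message"):
        event.pop(field, None)

    # if [response] >= 500 { mutate { add_tag => [ "server_error" ] } }
    if event.get("response", 0) >= 500:
        event.setdefault("tags", []).append("server_error")
    return event


event = {
    "message": "(original access-log line)",
    "timestamp": "10/Oct/2026:13:55:36 -0700",
    "response": "503",
    "bytes": "2326",
}
print(process(event))
```

The -0700 offset is what moves the event to 20:55:36 UTC. Skip the date step and the event would instead carry whatever time it happened to reach the pipeline.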

How do you run multiple pipelines and tune performance?

A single Logstash instance can run several independent pipelines, each with its own inputs, filters, and outputs. Define them in pipelines.yml (in your Logstash config directory) so they keep separate event flows instead of being concatenated into one:

# pipelines.yml
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache/*.conf"
  pipeline.workers: 4

- pipeline.id: syslog
  path.config: "/etc/logstash/conf.d/syslog/*.conf"
  queue.type: persisted

  • pipeline.workers sets how many parallel threads run the filter and output stages; it defaults to the number of CPU cores on the host. Raise it when events back up while the CPU is not saturated, for example when outputs spend time waiting on the network.
  • Persistent queues (queue.type: persisted) buffer events to disk so a crash or restart doesn't lose in-flight data, a safer choice for production than the default in-memory queue.

For a broader walkthrough of standing up the whole stack, see our getting-started guide to parsing logs with the ELK stack and its follow-up on more advanced patterns.

Is Logstash still the right tool in 2026?

The licensing picture changed, and it affects your stack choice:

  • In 2021, Elastic moved Elasticsearch and Kibana from Apache 2.0 to a dual SSPL / Elastic License 2.0 model (source-available, not OSI-approved). Logstash was not part of that change; its open-source code remained under Apache 2.0. AWS responded by forking Elasticsearch and Kibana into OpenSearch, which offers OpenSearch Data Prepper as its Logstash-equivalent ingest tool.
  • In 2024, Elastic added the AGPLv3 as an additional license option for Elasticsearch and Kibana, restoring an OSI-approved choice for teams that need one.
  • Lightweight, vendor-neutral collectors — Fluentd, Fluent Bit, and Vector — have become popular Logstash alternatives, especially on Kubernetes.

Just as important: for many simple pipelines, Elastic Agent or Beats shipping to Elasticsearch ingest pipelines now does the parsing work that used to need a standalone Logstash node — with less infrastructure to run. Logstash still earns its place when you need heavy transformation, fan-out to many outputs, disk-backed buffering with persistent queues, or protocols Beats can't speak. Choose deliberately rather than out of habit.

If you'd rather hand off running and tuning this stack, our server maintenance services team has managed log pipelines and Elasticsearch clusters across 50+ projects since 2014. To lock down the cluster itself, see our walkthrough on securing an Elasticsearch instance and our checklist of Elasticsearch security measures.

Frequently Asked Questions

What is the difference between grok and dissect in Logstash?

grok matches log lines with named regular expressions, so it handles variable and messy formats but costs more CPU. dissect splits lines by fixed delimiters and positions, making it much faster but only suitable when the log layout never changes. Use dissect for predictable formats and grok for everything else.

How do I fix a _grokparsefailure tag?

That tag means your grok pattern didn't match the line. Copy a failing sample into the Grok Debugger (Kibana → Dev Tools), adjust the pattern until every field captures cleanly, then redeploy. Common causes are an extra space, a changed log format, or the wrong built-in pattern — for example using INT where you needed NUMBER.

Where can I test grok patterns?

Use the built-in Grok Debugger under Kibana → Dev Tools → Grok Debugger. Paste a sample line and your pattern, and it shows the extracted fields instantly, which is far quicker than restarting Logstash to test each change. Many developers also keep a small set of sample lines on hand to validate patterns before deploying.
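One lightweight way to keep those sample lines useful is a small regression check you rerun whenever a pattern changes. The Python sketch below is hypothetical: it pairs each sample line with the fields you expect and tests a hand translation of the pattern rather than Logstash itself, so treat a pass as a quick sanity check before confirming in the Grok Debugger. It also reproduces the INT-versus-NUMBER mistake mentioned above.

```python
import re

# Hypothetical pattern under test, hand-translated from
# %{WORD:level} %{NUMBER:duration_ms}ms %{GREEDYDATA:msg}
PATTERN = re.compile(r"(?P<level>\w+) (?P<duration_ms>[+-]?\d+(?:\.\d+)?)ms (?P<msg>.*)")

# The same pattern with an INT-style number (no decimals allowed)
INT_PATTERN = re.compile(r"(?P<level>\w+) (?P<duration_ms>[+-]?\d+)ms (?P<msg>.*)")

# Sample lines paired with the fields each one should produce
SAMPLES = [
    ("INFO 12ms request served", {"level": "INFO", "duration_ms": "12", "msg": "request served"}),
    ("WARN 3.5ms cache miss", {"level": "WARN", "duration_ms": "3.5", "msg": "cache miss"}),
]


def check(pattern, samples):
    """Return (line, problem) pairs; an empty list means every sample passed."""
    problems = []
    for line, expected in samples:
        match = pattern.search(line)
        if match is None:
            problems.append((line, "no match (would be tagged _grokparsefailure)"))
        elif match.groupdict() != expected:
            problems.append((line, f"got {match.groupdict()}"))
    return problems


print(check(PATTERN, SAMPLES) or "all samples passed")
print(check(INT_PATTERN, SAMPLES))  # the decimal sample fails to match
```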

Do I still need Logstash if I use Beats or Elastic Agent?

Often no. Beats and Elastic Agent can send data straight to Elasticsearch and parse it with ingest pipelines, which covers many simple cases without a Logstash node. Keep Logstash when you need heavy transformation, multiple outputs, disk-backed persistent queues, or inputs that Beats doesn't support.

Is Logstash free and open source in 2026?

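Yes. Elastic's 2021 license change moved Elasticsearch and Kibana (not Logstash) to the source-available SSPL / Elastic License 2.0, and in 2024 Elastic added AGPLv3 as an OSI-approved option for those two. Logstash's open-source code was not part of either change and is Apache 2.0 licensed. The license question usually matters more for the cluster behind it: if you need an Apache-licensed backend, look at the OpenSearch fork (with Data Prepper for ingest), or at vendor-neutral collectors like Fluentd, Fluent Bit, and Vector.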

How do I set @timestamp from a field in my log?

Use the date filter. Point it at the field that holds the real event time and give it a format string that matches — for example match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]. Without this step, @timestamp defaults to when Logstash received the event, which throws off every time-based query and dashboard in Kibana.
