The Elastic (ELK) Stack is an open-source toolset for collecting, parsing, storing, and visualizing logs in one place. "ELK" stands for Elasticsearch (stores and searches the data), Logstash (ingests and transforms it), and Kibana (the web UI for search and dashboards); the modern stack adds Beats and the unified Elastic Agent as lightweight shippers. To parse logs, raw lines flow through a pipeline — Filebeat collects them, a grok-based filter (in Logstash or an Elasticsearch ingest pipeline) splits each line into structured fields like timestamp, host, and severity, and Elasticsearch indexes the result so you can search and chart it in Kibana within seconds.
Key takeaways
- ELK = Elasticsearch + Logstash + Kibana, plus Beats / Elastic Agent for shipping logs from each host.
- The typical pipeline is Filebeat → (Logstash, optional) → Elasticsearch → Kibana.
- Parsing turns unstructured log lines into fields using grok, date, and mutate processors — run them in Logstash or in a lighter Elasticsearch ingest pipeline when you don't need Logstash.
- Elasticsearch listens on port 9200, Kibana on 5601; since the 8.x line, TLS and authentication are on by default — never expose 9200 to the public internet.
- Elastic Stack 9.x (9.4 in 2026) is current; 8.x is still widely deployed. After Elastic's 2021 relicensing, AWS forked Elasticsearch 7.10 as OpenSearch (Apache-2.0); Grafana Loki is a lighter, label-indexed alternative.
- Use data streams + index templates + ILM to roll over and expire log indices automatically.
What is the ELK Stack?
The ELK Stack — now often called the Elastic Stack — is a set of components that work together to centralize logs and other machine data:
- Elasticsearch — a distributed search and analytics engine built on Apache Lucene. It stores your logs as JSON documents and makes them searchable in near real time.
- Logstash — a server-side data-processing pipeline with an input → filter → output model. It reads from many sources, parses and enriches events, then ships them onward.
- Kibana — the visualization layer. Search raw logs in Discover, then build dashboards, alerts, and saved searches on top of them.
- Beats — single-purpose, lightweight shippers (Filebeat for logs, Metricbeat for metrics, and others) that run on each host.
- Elastic Agent + Fleet — the modern, unified shipper that replaces juggling several Beats. One agent collects logs, metrics, and traces and is managed centrally from Kibana via Fleet.
| Component | Role | Default port |
|---|---|---|
| Elasticsearch | Stores, indexes, and searches log data | 9200 (HTTP API) |
| Logstash | Ingests and transforms events (input → filter → output) | 5044 (Beats input) |
| Kibana | Web UI: Discover, dashboards, alerting | 5601 |
| Beats / Elastic Agent | Collect and ship logs/metrics from each host | → 5044 or 9200 |
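When a stack does not come up as expected, a quick first check is whether each default port from the table accepts TCP connections. The following is a minimal sketch using only the Python standard library; the component names in `DEFAULT_PORTS` and the helper names are illustrative, and a successful connection only shows that something is listening, not that it is the right service.

```python
import socket

# Default ports from the table above; adjust if your deployment overrides them.
DEFAULT_PORTS = {"elasticsearch": 9200, "logstash-beats": 5044, "kibana": 5601}


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_stack(host="127.0.0.1", ports=DEFAULT_PORTS):
    """Map each component name to whether its port accepts connections."""
    return {name: is_listening(host, port) for name, port in ports.items()}
```

Calling `check_stack()` on the Docker host returns a dictionary such as `{"elasticsearch": True, ...}`, which is enough to tell a stopped container from a misconfigured client.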
How the ELK log pipeline works
A log line's journey from disk to dashboard follows a consistent path:
- Collect — Filebeat (or Elastic Agent) tails files such as /var/log/syslog on each server.
- Transform (optional) — for heavy parsing, enrichment, or routing, events pass through Logstash on port 5044. Skip it when an ingest pipeline is enough.
- Store & index — events land in Elasticsearch (port 9200), which runs any configured ingest pipeline and indexes the resulting fields.
- Visualize — open Kibana on port 5601 to search in Discover and build dashboards.
So the canonical flow is Filebeat → (Logstash optional) → Elasticsearch → Kibana. The big architectural decision is where you parse: in Logstash, or in a lighter Elasticsearch ingest pipeline. We cover both below.
Quick start: Elasticsearch + Kibana with Docker
The fastest way to try the stack locally is a single-node Docker Compose file. Elastic Stack 9.x is current (9.4 as of 2026), and 8.x remains widely deployed — both behave the same here. Since the 8.x line, security (TLS + authentication) is enabled by default, so the compose below sets a password and binds ports to localhost only.
# docker-compose.yml — single-node Elasticsearch + Kibana for local testing.
# Pin to the latest stable tag: 9.x (e.g. 9.4.2) is current in 2026; 8.x also works.
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:9.0.0
    environment:
      - discovery.type=single-node
      - ELASTIC_PASSWORD=${ELASTIC_PASSWORD} # set this in a .env file
      - xpack.security.enabled=true # on by default since 8.x
      - ES_JAVA_OPTS=-Xms1g -Xmx1g
    ports:
      - "127.0.0.1:9200:9200" # bind to localhost ONLY — never expose 9200 publicly
    volumes:
      - esdata:/usr/share/elasticsearch/data
  kibana:
    image: docker.elastic.co/kibana/kibana:9.0.0
    depends_on: [elasticsearch]
    environment:
      - ELASTICSEARCH_HOSTS=https://elasticsearch:9200
      - ELASTICSEARCH_USERNAME=kibana_system
      - ELASTICSEARCH_PASSWORD=${KIBANA_PASSWORD}
    ports:
      - "127.0.0.1:5601:5601" # Kibana UI
volumes:
  esdata:
Bring the stack up, then set the kibana_system password and copy out the auto-generated HTTP CA certificate so clients (Beats, Logstash, curl) can verify TLS:
# Start the stack (run from the folder with docker-compose.yml and .env)
docker compose up -d
# First run only: set the kibana_system password, then put it in KIBANA_PASSWORD
# (Compose prefixes container names, so address the service through "docker compose")
docker compose exec elasticsearch bin/elasticsearch-reset-password -u kibana_system -b
# Copy the HTTP CA cert so shippers and curl can verify the TLS connection
docker compose cp elasticsearch:/usr/share/elasticsearch/config/certs/http_ca.crt .
# Smoke test (https + the CA; 9200 is bound to localhost only)
curl --cacert http_ca.crt -u "elastic:$ELASTIC_PASSWORD" https://localhost:9200
Security checklist: keep xpack.security enabled, put Elasticsearch behind a reverse proxy or private network, restrict ports 9200/5601 with a firewall, and rotate the built-in passwords. Treat the HTTP CA certificate as the trust anchor for every shipper. For ongoing patching, backups, and hardening of stacks like this, see our server maintenance and DevOps services.
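The curl smoke test above can also be scripted, which is handy in deployment checks. The following is a minimal sketch using only the Python standard library; the function names are illustrative, and it assumes the CA file and the `elastic` password are the ones produced by the steps above.

```python
import base64
import json
import ssl
import urllib.request


def build_request(url, user, password):
    """Build a GET request that carries HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def cluster_info(url, user, password, ca_file):
    """Fetch the root endpoint, verifying TLS against the given CA file."""
    # Trust only the supplied CA, mirroring curl's --cacert option.
    context = ssl.create_default_context(cafile=ca_file)
    request = build_request(url, user, password)
    with urllib.request.urlopen(request, context=context, timeout=10) as response:
        return json.load(response)
```

A call such as `cluster_info("https://localhost:9200", "elastic", password, "http_ca.crt")` should return the same JSON document that curl prints; a certificate error here usually means the client is using the wrong CA file.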
Shipping logs with Filebeat or Elastic Agent
Filebeat is the classic, lightweight way to tail log files and forward them. In the 8.x/9.x line, use the filestream input (the older log input is deprecated). Point it straight at Elasticsearch for simple cases, or at Logstash when you need heavy transforms. For new deployments, Elastic Agent + Fleet is the recommended replacement — one centrally managed agent for logs, metrics, and traces.
# filebeat.yml — tail system logs and ship them to Elasticsearch
filebeat.inputs:
  - type: filestream # the 'log' input is deprecated; use filestream
    id: syslog
    paths:
      - /var/log/syslog
      - /var/log/auth.log
# Pre-built modules parse common formats (nginx, system, postgres, ...)
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
output.elasticsearch:
  hosts: ["https://localhost:9200"]
  username: "elastic"
  password: "${ELASTIC_PASSWORD}"
  ssl.certificate_authorities: ["/etc/filebeat/certs/http_ca.crt"]
# --- OR send to Logstash when you need heavy parsing/routing ---
# output.logstash:
#   hosts: ["localhost:5044"]
Parsing logs: Logstash filters vs. ingest pipelines
Parsing means turning an unstructured line like Mar 25 22:14:01 web01 sshd[2931]: Accepted password for deploy into typed fields (timestamp, host, program, pid, message). The Elastic Stack gives you two places to do this.
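To make the idea concrete before looking at either option, here is a rough Python sketch of the same three steps (extract fields, parse the timestamp, convert types) applied to that example line. The regular expression is a hand-written stand-in, not the stack's actual grok pattern, and because syslog lines carry no year the caller supplies one as an explicit assumption.

```python
import re
from datetime import datetime
from typing import Optional

# Hand-written stand-in for a syslog grok pattern (illustrative only).
SYSLOG_RE = re.compile(
    r"^(?P<syslog_timestamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<syslog_host>\S+) "
    r"(?P<program>[^\s\[:]+)(?:\[(?P<pid>\d+)\])?: "
    r"(?P<syslog_message>.*)$"
)


def parse_syslog(line: str, year: int) -> Optional[dict]:
    """Split one syslog line into typed fields, or return None if it does not match."""
    match = SYSLOG_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Date step: collapse the padding in "Mar  5" and add the missing year.
    stamp = " ".join(fields["syslog_timestamp"].split())
    fields["@timestamp"] = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
    # Type-conversion step: pid is optional, and numeric when present.
    if fields["pid"] is not None:
        fields["pid"] = int(fields["pid"])
    return fields
```

Lines that do not match return `None` instead of raising, which loosely mirrors how a failed parse is flagged rather than dropped in a real pipeline.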
Option A — Logstash filters
Logstash is the most powerful option: an input → filter → output pipeline where the filter block does the parsing. The three workhorses are grok (regex-based pattern matching into named fields), date (parse the log's own timestamp into @timestamp), and mutate (rename, convert, or drop fields).
# /etc/logstash/conf.d/syslog.conf
input {
  beats {
    port => 5044 # Filebeat / Elastic Agent connect here
  }
}
filter {
  grok {
    match => {
      "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_host} %{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:syslog_message}"
    }
  }
  date {
    # Two formats: space-padded single-digit days ("Mar  5") and two-digit days ("Mar 25")
    match => ["syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss"]
    target => "@timestamp" # use the log's own time, not ingest time
  }
  mutate {
    remove_field => ["message"] # drop the raw line once it is parsed
  }
}
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    user => "elastic"
    password => "${ELASTIC_PASSWORD}"
    ssl_enabled => true
    ssl_certificate_authorities => ["/etc/logstash/certs/http_ca.crt"]
    data_stream => "true" # write to a managed data stream
  }
}
%{SYSLOGTIMESTAMP:...} uses one of Logstash's built-in grok patterns; you can combine dozens of them or write your own. For a deep dive into grok, conditionals, and multi-source pipelines, read Understanding Logstash parsing configurations and options. To wire up the sending node and explore data in Kibana, continue with ELK Stack for parsing your logs — Part 2.
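To see why `%{NAME:field}` tokens compose so easily, it helps to treat grok as a macro layer over named regular-expression groups. The sketch below is a toy expander, not Logstash's implementation: the `PATTERNS` table is a hypothetical, simplified subset, and its regexes only approximate the official definitions.

```python
import re

# Hypothetical mini pattern library; the real definitions are more thorough.
PATTERNS = {
    "SYSLOGTIMESTAMP": r"[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}",
    "SYSLOGHOST": r"[\w.-]+",
    "POSINT": r"[1-9][0-9]*",
    "DATA": r".*?",
    "GREEDYDATA": r".*",
}

# Matches %{NAME} and %{NAME:field}.
GROK_TOKEN = re.compile(r"%\{(?P<name>[A-Z0-9_]+)(?::(?P<field>\w+))?\}")


def grok_to_regex(expression: str) -> str:
    """Expand grok tokens into a plain regex with named capture groups."""
    def expand(token):
        body = PATTERNS[token.group("name")]
        field = token.group("field")
        # Named tokens capture into a field; unnamed ones only match.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_TOKEN.sub(expand, expression)


SYSLOG_GROK = (
    r"%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_host} "
    r"%{DATA:program}(?:\[%{POSINT:pid}\])?: %{GREEDYDATA:syslog_message}"
)
SYSLOG_REGEX = re.compile(grok_to_regex(SYSLOG_GROK))
```

`SYSLOG_REGEX.match(line).groupdict()` then yields the same field names the Logstash config produces, which is a convenient way to prototype a pattern before deploying it.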
Option B — Elasticsearch ingest pipelines (lighter)
When you don't need Logstash's routing or buffering, define a grok ingest pipeline inside Elasticsearch itself. It runs the same grok / date / remove processors on the node, so Filebeat or Elastic Agent can write straight to Elasticsearch with no extra service to operate. Paste this into Kibana → Dev Tools:
# Kibana → Dev Tools console
PUT _ingest/pipeline/syslog
{
  "description": "Parse syslog lines into fields",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_host} %{DATA:program}(?:\\[%{POSINT:pid}\\])?: %{GREEDYDATA:syslog_message}"]
      }
    },
    { "date": { "field": "syslog_timestamp", "formats": ["MMM  d HH:mm:ss", "MMM dd HH:mm:ss"], "target_field": "@timestamp" } },
    { "remove": { "field": "message" } }
  ]
}
Retention: data streams, index templates & ILM
Logs grow fast, so don't write to one ever-growing index. Use a data stream (an append-only, auto-rolling abstraction) governed by an index template and an Index Lifecycle Management (ILM) policy. ILM rolls indices over by age or size and deletes them when they expire — automating retention without cron jobs.
# Kibana → Dev Tools console
# 1) Retention policy: roll over weekly or at 50 GB, delete after 30 days
PUT _ilm/policy/logs-retention
{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_age": "7d", "max_primary_shard_size": "50gb" } } },
      "delete": { "min_age": "30d", "actions": { "delete": {} } }
    }
  }
}
# 2) Index template that creates a data stream using that policy
# (priority set above the built-in logs-*-* template so this one is applied)
PUT _index_template/logs-syslog
{
  "index_patterns": ["logs-syslog-*"],
  "data_stream": {},
  "priority": 500,
  "template": {
    "settings": { "index.lifecycle.name": "logs-retention" }
  }
}
ELK vs. OpenSearch vs. Grafana Loki
The licensing history matters when you pick a stack. In January 2021, Elastic relicensed Elasticsearch and Kibana from Apache 2.0 to a dual SSPL / Elastic License 2.0 model. In response, AWS forked the last Apache-2.0 release (Elasticsearch 7.10) into OpenSearch, which stays Apache-2.0. In 2024, Elastic added AGPL 3.0 as a third license option, making Elasticsearch OSI-approved open source again. Separately, Grafana Loki takes a different approach — it indexes only labels, not full log content, which makes it cheaper for very high log volumes.
| | Elastic (ELK) Stack | OpenSearch | Grafana Loki |
|---|---|---|---|
| License | Elastic License 2.0 / SSPL / AGPL (2024) | Apache 2.0 | AGPL 3.0 |
| Origin | Elastic | AWS fork of Elasticsearch 7.10 (2021) | Grafana Labs |
| Indexing | Full content indexed | Full content indexed | Labels only; logs stored compressed |
| UI / query | Kibana | OpenSearch Dashboards | Grafana + LogQL |
| Best for | Rich search, APM, security analytics | Open-source, ELK-compatible drop-in | Cheap, high-volume log storage |
Which should you choose?
- Stick with the Elastic Stack for the richest feature set — full-text search, APM, ML-based anomaly detection, and SIEM/security analytics in one product.
- Choose OpenSearch if a permissive Apache-2.0 license or AWS-managed hosting (Amazon OpenSearch Service) is a hard requirement.
- Reach for Grafana Loki when you mostly need cheap, high-volume log storage and already live in Grafana.
For most teams centralizing application and server logs, the ELK Stack with Filebeat / Elastic Agent → ingest pipeline → data stream + ILM is the fastest path to searchable, structured logs. Add Logstash only when your parsing or routing outgrows ingest pipelines. If error tracking is your priority rather than raw logs, pair this with a tool like Sentry — see setting up Sentry for error and performance monitoring.
Frequently Asked Questions
What is the ELK Stack used for?
The ELK Stack centralizes logs and machine data from many servers into one searchable place. Teams use it for troubleshooting, log analytics, application and infrastructure monitoring, and security analysis. Elasticsearch stores and searches the data, Logstash and Beats ingest and parse it, and Kibana visualizes it in dashboards.
Do I still need Logstash, or can Elasticsearch parse logs on its own?
You don't always need Logstash. Elasticsearch ingest pipelines run the same grok, date, and mutate-style processors on the node itself, so Filebeat or Elastic Agent can write parsed logs directly to Elasticsearch. Add Logstash when you need heavy transforms, buffering, routing to multiple destinations, or input protocols that Beats don't cover.
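Before pointing a shipper at an ingest pipeline, it is worth testing it against sample lines with Elasticsearch's simulate endpoint. The sketch below only builds that request with the Python standard library; the function name is illustrative, the pipeline name `syslog` matches the example earlier in this article, and actually sending the request needs the same credentials and CA certificate as any other call to the cluster.

```python
import json
import urllib.request


def simulate_request(base_url, pipeline, messages):
    """Build a POST to the pipeline's _simulate endpoint for the given raw lines."""
    # Each sample document carries the raw line in its "message" field.
    body = {"docs": [{"_source": {"message": line}} for line in messages]}
    return urllib.request.Request(
        url=f"{base_url}/_ingest/pipeline/{pipeline}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The response lists each document as the pipeline would have indexed it, so a grok mismatch shows up here rather than as a stream of failed events in production.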
What is grok and how does it parse log lines?
Grok is a pattern-matching syntax built on named regular expressions. A pattern such as %{SYSLOGTIMESTAMP:timestamp} matches part of a log line and assigns it to a field. Chaining grok patterns turns an unstructured line into structured fields you can search, filter, and chart. Grok runs in both Logstash filters and Elasticsearch ingest pipelines.
What ports do Elasticsearch and Kibana use?
Elasticsearch exposes its HTTP REST API on port 9200 (and uses 9300 for node-to-node transport). Kibana serves its web UI on port 5601. When Logstash receives data from Beats, it listens on 5044 by default. Since the 8.x line, TLS and authentication are enabled by default; even so, bind 9200 to a private interface rather than exposing it publicly.
ELK Stack vs OpenSearch — which should I choose?
They share a common ancestor (Elasticsearch 7.10). Choose the Elastic Stack for the broadest feature set and newest capabilities; choose OpenSearch if you need a fully Apache-2.0 licensed project or want Amazon's managed OpenSearch Service. Both ingest logs from Beats/Filebeat and offer a Kibana-style UI.
How long should I keep logs, and how do I enforce it?
Retention depends on your compliance and cost needs, but you enforce it with Index Lifecycle Management (ILM). An ILM policy rolls indices over by age or size and deletes them after a set period (for example, delete after 30 days). Attach the policy to a data stream through an index template so retention happens automatically.
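One detail is easy to miss when sizing retention: as the ILM documentation describes, once an index has rolled over, a later phase's `min_age` is counted from the rollover time, not from index creation. The sketch below models that arithmetic in Python; the function name is illustrative, and it ignores ILM's polling interval, so treat the result as the earliest possible deletion time.

```python
from datetime import datetime, timedelta
from typing import Optional


def earliest_deletion(
    created: datetime,
    rollover_max_age: timedelta,
    delete_min_age: timedelta,
    rolled_over_at: Optional[datetime] = None,
) -> datetime:
    """Earliest time the delete phase can remove a backing index.

    Assumes min_age is measured from rollover. If no explicit rollover time is
    given, the index is assumed to roll over by age rather than by size.
    """
    rollover = rolled_over_at if rolled_over_at is not None else created + rollover_max_age
    return rollover + delete_min_age


# Example: roll over after 7 days, delete 30 days after rollover.
created = datetime(2026, 1, 1)
deleted = earliest_deletion(created, timedelta(days=7), timedelta(days=30))
oldest_document_days = (deleted - created).days
```

With a 7-day rollover and a 30-day delete phase, the oldest documents in an index can therefore be about 37 days old when it is removed, which matters if a compliance rule sets a hard maximum.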