Logs & Monitoring -- ELK Stack, Loki, Prometheus & Grafana

DodaTech Updated 2026-06-22 6 min read

Centralized logging and monitoring with the ELK Stack, Grafana Loki, Prometheus, and Grafana provides observability across infrastructure, applications, and user-facing services through logs and metrics. Traces, the third signal, need a separate tracing backend that this tutorial does not deploy.

What You'll Learn

In this tutorial, you will learn how to deploy and configure an observability stack that combines Elasticsearch, Logstash, and Kibana (ELK) for searchable logs, Grafana Loki for low-cost log aggregation, Prometheus for metrics, and Grafana for unified dashboards and alerting.

Why It Matters

When a production system goes down, you need answers fast. Logs tell you what happened, metrics tell you when it started, and traces tell you where the failure originated. Without a centralized monitoring stack, debugging is like finding a needle in a haystack blindfolded.

Real-World Use

Durga Antivirus Pro uses a four-layer observability stack: Filebeat ships scan logs to Loki, Prometheus collects scan throughput metrics, Grafana visualizes both, and Alertmanager pages the on-call engineer if the scan failure rate exceeds 1% in any 5-minute window.

Observability Stack Architecture

flowchart TD
    A[Application Logs] --> B[Filebeat]
    B --> C{Log Storage}
    C -->|Full-text search| D[Elasticsearch]
    C -->|Cost-effective| E[Grafana Loki]
    D --> F[Kibana]
    E --> G[Grafana]
    H[System Metrics] --> I[Prometheus Node Exporter]
    I --> J[Prometheus Server]
    J --> G
    G --> K[Alertmanager]
    K --> L["PagerDuty / Slack / Email"]

Deploying the ELK Stack with Docker Compose

Run Elasticsearch, Logstash, and Kibana in Docker:

version: "3.8"
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
    ports:
      - "9200:9200"
    volumes:
      - es_data:/usr/share/elasticsearch/data

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    ports:
      - "5000:5000"
    volumes:
      - ./logstash.conf:/usr/share/logstash/pipeline/logstash.conf
    depends_on:
      - elasticsearch

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch

volumes:
  es_data:

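Expected behavior: Elasticsearch starts on port 9200 and Kibana on port 5601. Logstash listens on port 5000 for Beats connections and forwards events to Elasticsearch. Security is disabled here (xpack.security.enabled=false) to keep the example short; leave it enabled anywhere other than local testing.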
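The short form of depends_on only orders container startup, so Elasticsearch may still be initializing when the other services come up. The sketch below polls the cluster health endpoint until it reports green or yellow. The function names, the localhost URL, and the retry settings are illustrative assumptions, not part of the Compose file above.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_cluster_status(base_url: str = "http://localhost:9200") -> str:
    """Return the 'status' field (green, yellow, or red) from _cluster/health."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=5) as resp:
        return json.load(resp)["status"]


def wait_until_ready(fetch: Callable[[], str] = fetch_cluster_status,
                     attempts: int = 30, delay: float = 2.0) -> bool:
    """Poll until the cluster is green or yellow; give up after `attempts` tries."""
    for _ in range(attempts):
        try:
            if fetch() in ("green", "yellow"):
                return True
        except OSError:
            pass  # connection refused or reset while the node is still starting
        time.sleep(delay)
    return False


if __name__ == "__main__":
    print("Elasticsearch ready:", wait_until_ready())
```

A single-node cluster commonly settles on yellow rather than green because replica shards have no second node to live on, which is why both values count as ready here.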

Logstash Configuration

Parse incoming logs with Grok patterns:

input {
  beats {
    port => 5000
  }
}

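filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message_text}"
    }
    # Drop the raw line only when the pattern matched; unmatched events
    # keep "message" and are tagged _grokparsefailure.
    remove_field => ["message"]
  }
  date {
    # Use the parsed time as the event's @timestamp.
    match => ["timestamp", "ISO8601"]
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
}

Expected behavior: Incoming log lines are parsed into structured fields (timestamp, level, message_text) and written to a daily index in Elasticsearch. The raw message field is removed once parsing succeeds; lines that do not match keep it and are tagged _grokparsefailure, so no data is lost when a pattern is wrong.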
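To check sample lines against this format before touching Logstash, a rough local stand-in for the pattern can help. The sketch below approximates the three Grok patterns with a Python regular expression. The regex fragments are simplified assumptions, not the exact definitions that ship with Logstash, and parse_line is a hypothetical helper name.

```python
import re
from typing import Dict, Optional

# Simplified stand-ins for TIMESTAMP_ISO8601, LOGLEVEL, and GREEDYDATA.
# These are approximations for local testing, not Logstash's own definitions.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}"
    r"(?:[.,]\d+)?(?:Z|[+-]\d{2}:?\d{2})?)"
    r" (?P<level>TRACE|DEBUG|INFO|WARN(?:ING)?|ERROR|FATAL)"
    r" (?P<message_text>.*)$"
)


def parse_line(line: str) -> Optional[Dict[str, str]]:
    """Return the extracted fields, or None when the line does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(parse_line("2026-06-22T10:15:30Z ERROR scan failed for job 42"))
    print(parse_line("no timestamp here"))
```

A line that returns None here is a candidate for a _grokparsefailure tag in Logstash, although the final check should still be done with the real Grok patterns.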

Querying Metrics and Logs with PromQL and LogQL

Track HTTP error rates with Prometheus and search logs with Loki:

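# PromQL: percentage of requests that returned a 5xx status over the last 5 minutes
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) * 100

# LogQL: error logs from web-server, parsed as JSON
# (set the time range to "Last 1 hour" in Grafana, or pass start/end to the Loki API)
{app="web-server"} |= "ERROR" | json

Expected output: The PromQL query returns a percentage (e.g., 0.5) representing the 5xx error rate over the last five minutes. The LogQL query returns the log lines from the web-server app that contain "ERROR" within the selected time range. LogQL has no now() function, so the one-hour window comes from the query's time range rather than from the expression itself.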
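When Loki is queried over HTTP instead of through Grafana, the time window travels as request parameters next to the LogQL expression. The sketch below builds the parameters for Loki's /loki/api/v1/query_range endpoint, which takes start and end as Unix timestamps in nanoseconds. The helper name and the default limit of 100 are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict


def loki_query_range_params(query: str, end: datetime, window: timedelta,
                            limit: int = 100) -> Dict[str, str]:
    """Build query-string parameters for GET /loki/api/v1/query_range."""
    start = end - window

    def to_ns(moment: datetime) -> str:
        # Loki accepts nanosecond Unix epochs; whole seconds are enough here.
        return str(int(moment.timestamp()) * 1_000_000_000)

    return {
        "query": query,
        "start": to_ns(start),
        "end": to_ns(end),
        "limit": str(limit),
    }


if __name__ == "__main__":
    params = loki_query_range_params(
        '{app="web-server"} |= "ERROR" | json',
        end=datetime.now(timezone.utc),
        window=timedelta(hours=1),
    )
    print(params)
```

Passing timezone-aware datetimes avoids surprises from the local timezone when the script runs on a different host than Loki.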

Unified Grafana Dashboard

Create a dashboard panel mixing Prometheus metrics and Loki logs:

// Grafana dashboard JSON model (simplified)
{
  "panels": [
    { "type": "row", "title": "Error Rate & Logs", "collapsed": false, "panels": [] },
    { "type": "timeseries", "datasource": "Prometheus", "targets": [
      { "expr": "rate(http_requests_total{status=~\"5..\"}[5m])" }
    ]},
    { "type": "logs", "datasource": "Loki", "targets": [
      { "expr": "{app=\"web-server\"} |= \"ERROR\"" }
    ]}
  ]
}

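Expected behavior: The Grafana dashboard shows a graph of error rates above a log stream, both following the dashboard's time range and refresh interval. Dragging across a spike on the graph zooms the dashboard to that time window, so the log panel shows the entries from the same period.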

Tool Comparison

Feature	ELK Stack	Grafana Loki	Prometheus	Datadog
Primary data	Logs	Logs	Metrics	All three
Storage engine	Elasticsearch	Object store	TSDB	Proprietary
Query language	ES DSL / KQL	LogQL	PromQL	Proprietary
Retention cost	High	Low (object store)	Medium	High
Alerting	Elastic Alerting	Ruler	Alertmanager	Built-in
Self-hostable	Yes	Yes	Yes	No
Learning curve	Steep	Medium	Medium	Low

Common Errors

1. Elasticsearch Heap Exhaustion

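Older Elasticsearch releases default to a 1GB heap; since version 7.11, the heap is sized automatically from the memory available to the node. Either way, a heap that is too small for the log volume leaves the JVM out of memory and the cluster unresponsive. For production, set the heap explicitly, for example ES_JAVA_OPTS="-Xms4g -Xmx4g", keeping both values equal and at no more than half of the host's RAM.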
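A widely cited rule of thumb is to give the heap about half of the host's memory, set -Xms and -Xmx to the same value, and stay below roughly 30 GB. The sketch below turns that rule into an ES_JAVA_OPTS string. The helper name and the 30 GB cap are assumptions for illustration; check Elastic's sizing guidance for the version you run.

```python
def es_java_opts(total_ram_gb: int, cap_gb: int = 30) -> str:
    """Return an ES_JAVA_OPTS value with equal minimum and maximum heap."""
    # Half of RAM, at least 1 GB, and never above the assumed cap.
    heap_gb = max(1, min(total_ram_gb // 2, cap_gb))
    return f"-Xms{heap_gb}g -Xmx{heap_gb}g"


if __name__ == "__main__":
    print(es_java_opts(8))    # -Xms4g -Xmx4g
    print(es_java_opts(128))  # -Xms30g -Xmx30g
```

The memory left outside the heap is not wasted: the operating system uses it for the filesystem cache that Elasticsearch relies on for fast searches.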

2. Logstash Grok Pattern Not Matching

Grok patterns are case-sensitive and must exactly match the log format. Use the Kibana Grok Debugger to test patterns against sample log lines before deploying.

3. Prometheus Scrape Target Not Reachable

If Prometheus cannot reach a target, check network policy, firewall rules, and the /metrics endpoint. Use curl http://target:9100/metrics to verify the target is exposed.

4. Loki Log Stream Too Many Labels

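Loki indexes labels, and every unique combination of label values becomes its own stream. High-cardinality labels (user IDs, IPs) create too many streams and degrade query performance. Keep the number of labels small and limit them to values from a bounded set (for example app, environment, level); leave user IDs and IPs in the log line and filter on them at query time.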
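The effect of one unbounded label is easiest to see with arithmetic: the number of possible streams is at most the product of the distinct values of each label. The sketch below computes that upper bound. The label names and counts are made-up figures for illustration.

```python
from math import prod
from typing import Dict


def stream_count(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on streams: the product of distinct values per label."""
    return prod(label_cardinalities.values())


if __name__ == "__main__":
    bounded = {"app": 5, "env": 3, "level": 4}
    with_user_id = {**bounded, "user_id": 10_000}
    print(stream_count(bounded))       # 60
    print(stream_count(with_user_id))  # 600000
```

Adding a single user_id label multiplies the stream count by the number of users, which is why such values belong in the log line rather than in the label set.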

5. Grafana Dashboard Permission Issues

In multi-team setups, dashboard permissions must be configured per folder. Without explicit permissions, team members see all dashboards or cannot edit shared ones.

Practice Questions

1. What is the difference between Prometheus and Loki? Prometheus stores and queries metrics (numeric time-series data). Loki stores and queries logs (text-based event data). They are complementary, not competing tools.

2. How does Logstash parse incoming log lines? Logstash uses Grok patterns in filter plugins to match log formats and extract structured fields like timestamp, log level, and message content.

3. Why is Loki more cost-effective than Elasticsearch for logs? Loki stores logs compressed in object storage and only indexes labels (metadata), not the full log content. Elasticsearch indexes full text, requiring more storage and memory.

4. What happens when Prometheus fires an alert? Prometheus evaluates its alerting rules and sends firing alerts to Alertmanager, which deduplicates, groups, and routes them to configured receivers (Slack, PagerDuty, email). Alertmanager also handles silencing and inhibition, which reduces alert fatigue.

5. Challenge: Deploy a complete observability stack with ELK for log search, Loki for cost-effective log retention, Prometheus for metrics, and Grafana for unified dashboards. Feed sample application logs through Filebeat and verify that a Grafana dashboard shows both metrics and log panels with cross-panel filtering.
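The deduplication and grouping described in question 4 can be pictured with a toy model. The sketch below drops alerts with identical label sets and buckets the rest by a list of group-by labels, similar in spirit to Alertmanager's group_by setting. It is a simplified illustration, not Alertmanager's actual implementation, and the function and label names are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # label name -> label value


def group_alerts(alerts: List[Alert],
                 group_by: Tuple[str, ...]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Deduplicate alerts by label set, then bucket them by the group_by labels."""
    groups: Dict[Tuple[str, ...], List[Alert]] = defaultdict(list)
    seen = set()
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))
        if fingerprint in seen:  # identical label set: treat as a duplicate
            continue
        seen.add(fingerprint)
        key = tuple(alert.get(label, "") for label in group_by)
        groups[key].append(alert)
    return dict(groups)


if __name__ == "__main__":
    alerts = [
        {"alertname": "HighErrorRate", "service": "web", "instance": "a"},
        {"alertname": "HighErrorRate", "service": "web", "instance": "b"},
        {"alertname": "HighErrorRate", "service": "web", "instance": "a"},
        {"alertname": "HighLatency", "service": "api", "instance": "c"},
    ]
    for key, members in group_alerts(alerts, ("alertname", "service")).items():
        print(key, len(members))
```

Each resulting group corresponds to one notification, so two failing instances of the same service produce a single message instead of two.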

Mini Project

Build a production monitoring dashboard for a web application that combines Prometheus metrics (request rate, error rate, latency percentiles), Loki logs (ERROR-level entries with context), and Elasticsearch full-text search for historical log analysis. Configure Alertmanager to notify via Slack when error rate exceeds 5% for 5 consecutive minutes or when p99 latency exceeds 2 seconds.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
