Grafana Dashboards — Visualization, Alerts & Dashboard as Code Guide

DodaTech Updated 2026-06-24 6 min read

This tutorial covers Grafana dashboards end to end: installation, provisioning data sources and dashboards as code, the dashboard JSON model, alerting, template variables, and annotations, followed by common configuration mistakes and practice questions.

Grafana is an open-source analytics and visualization platform that connects to data sources (Prometheus, Elasticsearch, InfluxDB, Loki) to create interactive dashboards, set up alerts, and provide ad-hoc exploration of metrics and logs.

Why It Matters

Raw metrics and logs are overwhelming. Grafana transforms them into actionable visualizations — line charts for trends, heatmaps for distributions, and status panels for SLOs. DodaTech's platform team has 200+ Grafana dashboards covering infrastructure, application, and business metrics, shared across engineering teams via provisioning and team folders.

Real-World Use

DodaZIP's on-call dashboard shows real-time request latency (P50/P95/P99), error rate, CPU/memory by pod, database query performance, and active alerts — all on a single screen. When an incident fires, the engineer opens this dashboard first, reducing mean time to triage by 60%.

flowchart TD
    A[Data Sources] --> B[Grafana Server]
    B --> C[Dashboards]
    B --> D[Alerting]
    B --> E[Explore Mode]
    C --> F[Time Series Panel]
    C --> G[Stat Panel]
    C --> H[Table Panel]
    C --> I[Bargauge Panel]
    D --> J[Alert Rules]
    J --> K[Contact Points]
    K --> L[PagerDuty]
    K --> M[Slack]
    K --> N[Email]
    style B fill:#F46800,color:#fff
â„šī¸ Info

Prerequisites: Prometheus or another data source configured. Basic familiarity with PromQL or LogQL.

Installation

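# Install Grafana OSS on Ubuntu from the official APT repository
sudo apt-get update
sudo apt-get install -y apt-transport-https software-properties-common wget gnupg

# Import the signing key first, then register the repository that references it
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list

sudo apt-get update
sudo apt-get install -y grafana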

# Start Grafana
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server

# Verify
sudo systemctl status grafana-server

# Expected output:
# ● grafana-server.service - Grafana instance
#    Loaded: loaded
#    Active: active (running)
#    Main PID: 12345 (grafana)
#     Tasks: 14 (limit: 4915)

# Access at http://localhost:3000 (default login admin/admin; Grafana asks for a new password on first login)
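
The status check above can also be scripted, for example in a deployment smoke test. The following is a minimal sketch, assuming Grafana's `/api/health` endpoint on the default port 3000; the helper names `is_healthy` and `fetch_health` are hypothetical and not part of Grafana.

```python
import json
import urllib.request


def is_healthy(payload: dict) -> bool:
    """Return True when the health payload reports a working database."""
    return payload.get("database") == "ok"


def fetch_health(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> dict:
    """GET <base_url>/api/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        health = fetch_health()
    except OSError as exc:  # urllib's URLError and HTTPError subclass OSError
        print(f"Grafana not reachable: {exc}")
    else:
        print("healthy" if is_healthy(health) else f"unhealthy: {health}")
```

Keeping the payload check separate from the HTTP call makes the logic easy to test without a running server.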

Dashboard as Code (Provisioning)

# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1

datasources:
  - name: Prometheus
    uid: prometheus   # stable uid so alert rules can reference this data source
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: 30s
      queryTimeout: 60s
      httpMethod: POST

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000

# /etc/grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1

providers:
  - name: DodaTech Infrastructure
    type: file
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/dashboards
      foldersFromFilesStructure: true
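
Because the file provider loads every JSON file under `path`, two files that share a dashboard `uid` will conflict. A pre-commit or CI check can catch this before Grafana does. The sketch below is one possible approach; the function name `find_duplicate_uids` is hypothetical, and it assumes each file holds a single dashboard object.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_duplicate_uids(root: str) -> dict[str, list[str]]:
    """Map each dashboard uid found in more than one JSON file to those files."""
    seen = defaultdict(list)
    for path in sorted(Path(root).rglob("*.json")):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        # Skip files that are not a dashboard object or carry no uid.
        if isinstance(dashboard, dict) and dashboard.get("uid"):
            seen[dashboard["uid"]].append(path.relative_to(root).as_posix())
    return {uid: files for uid, files in seen.items() if len(files) > 1}


# Example usage against the provisioning path from the config above:
# for uid, files in find_duplicate_uids("/etc/grafana/dashboards").items():
#     print(f"duplicate uid {uid}: {', '.join(files)}")
```

An empty result means every dashboard under the directory tree has a distinct `uid`.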

Dashboard JSON Model

{
  "title": "DodaTech API Overview",
  "uid": "dodatech-api-overview",
  "tags": ["dodatech", "api"],
  "timezone": "browser",
  "panels": [
    {
      "title": "Request Rate",
      "type": "timeseries",
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
      "targets": [
        {
          "expr": "sum by (status_class) (rate(http_requests_total{service=\"api\"}[5m]))",
          "legendFormat": "{{ status_class }}",
          "refId": "A"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "reqps",
          "min": 0,
          "custom": {
            "lineInterpolation": "smooth",
            "showPoints": "never"
          }
        },
        "overrides": [
          {
            "matcher": { "id": "byName", "options": "5xx" },
            "properties": [{ "id": "color", "value": { "mode": "fixed", "fixedColor": "red" } }]
          }
        ]
      }
    },
    {
      "title": "P99 Latency",
      "type": "timeseries",
      "gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
      "targets": [
        {
          "expr": "histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{service=\"api\"}[5m])))",
          "legendFormat": "p99"
        }
      ],
      "fieldConfig": {
        "defaults": { "unit": "s" }
      }
    },
    {
      "title": "Current Error Rate",
      "type": "stat",
      "gridPos": { "h": 4, "w": 4, "x": 0, "y": 8 },
      "targets": [
        {
          "expr": "sum(rate(http_requests_total{service=\"api\", status=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"api\"}[5m])) * 100",
          "legendFormat": "Error %"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "thresholds": {
            "mode": "absolute",
            "steps": [
              { "color": "green", "value": null },
              { "color": "yellow", "value": 1 },
              { "color": "red", "value": 5 }
            ]
          }
        }
      }
    },
    {
      "title": "CPU Usage by Pod",
      "type": "bargauge",
      "gridPos": { "h": 8, "w": 12, "x": 0, "y": 12 },
      "targets": [
        {
          "expr": "topk(10, sum by (pod) (rate(container_cpu_usage_seconds_total{namespace=\"production\"}[5m])) * 100)",
          "legendFormat": "{{ pod }}"
        }
      ],
      "fieldConfig": {
        "defaults": {
          "unit": "percent",
          "min": 0,
          "max": 100
        }
      }
    }
  ],
  "refresh": "30s",
  "time": { "from": "now-6h", "to": "now" }
}
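
Hand-edited `gridPos` values are easy to get wrong. Grafana places panels on a 24-column grid, so a quick check for panels that overflow the grid or overlap each other can be useful in review. This is a small sketch with hypothetical helper names, run against the four panels from the JSON above.

```python
from itertools import combinations

GRID_COLUMNS = 24  # Grafana lays panels out on a 24-column grid


def overlaps(a: dict, b: dict) -> bool:
    """True when two gridPos rectangles share at least one grid cell."""
    return (a["x"] < b["x"] + b["w"] and b["x"] < a["x"] + a["w"]
            and a["y"] < b["y"] + b["h"] and b["y"] < a["y"] + a["h"])


def layout_problems(panels: list[dict]) -> list[str]:
    """Report panels that run past the grid width or overlap another panel."""
    problems = []
    for panel in panels:
        pos = panel["gridPos"]
        if pos["x"] + pos["w"] > GRID_COLUMNS:
            problems.append(f"{panel['title']}: extends past column {GRID_COLUMNS}")
    for first, second in combinations(panels, 2):
        if overlaps(first["gridPos"], second["gridPos"]):
            problems.append(f"{first['title']} overlaps {second['title']}")
    return problems


panels = [
    {"title": "Request Rate", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0}},
    {"title": "P99 Latency", "gridPos": {"h": 8, "w": 12, "x": 12, "y": 0}},
    {"title": "Current Error Rate", "gridPos": {"h": 4, "w": 4, "x": 0, "y": 8}},
    {"title": "CPU Usage by Pod", "gridPos": {"h": 8, "w": 12, "x": 0, "y": 12}},
]
print(layout_problems(panels))  # prints [] for the layout above
```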

Alerting

# /etc/grafana/provisioning/alerting/contact-points.yml
apiVersion: 1

contactPoints:
  - name: PagerDuty Critical
    receivers:
      - uid: pagerduty-critical
        type: pagerduty
        settings:
          integrationKey: YOUR_PAGERDUTY_KEY
          severity: critical

  - name: Slack DevOps
    receivers:
      - uid: slack-devops
        type: slack
        settings:
          url: https://hooks.slack.com/services/T00/B00/XXX
          channel: '#devops-alerts'
          title: 'Grafana Alert: {{ .CommonLabels.alertname }}'

# /etc/grafana/provisioning/alerting/rules.yml
apiVersion: 1

groups:
  - orgId: 1
    name: API SLO Alerts
    folder: DodaTech Alerts   # example folder name; provisioned rule groups must belong to a folder
    interval: 30s
    rules:
      - uid: api_high_error_rate
        title: "API Error Rate Above 5%"
        condition: C
        data:
          # A: query the 5xx percentage over the last 5 minutes
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus   # must match the uid of the Prometheus data source
            model:
              expr: sum(rate(http_requests_total{service="api", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="api"}[5m])) * 100
          # B: reduce the series to its most recent value
          - refId: B
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: __expr__
            model:
              type: reduce
              expression: A
              reducer: last
          # C: the alert condition, true when the reduced value exceeds 5%
          - refId: C
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: __expr__
            model:
              type: math
              expression: $B > 5
        noDataState: NoData
        execErrState: Alerting
        for: 5m
        annotations:
          summary: "API error rate is {{ $values.B.Value }}%"
        labels:
          severity: critical
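
The interplay between the group's `interval: 30s` and the rule's `for: 5m` is worth spelling out: a breach first puts the rule into a pending state, and it only fires once the condition has held for the full `for` duration. The following is a simplified simulation of that behaviour, not Grafana's implementation; `RuleState` and `evaluate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RuleState:
    state: str = "Normal"
    pending_for: int = 0  # seconds the condition has been continuously true


def evaluate(rule: RuleState, error_pct: float, threshold: float = 5.0,
             interval_s: int = 30, for_s: int = 300) -> str:
    """Advance the rule by one evaluation and return its new state."""
    if error_pct <= threshold:
        # Condition cleared: reset the pending timer.
        rule.state, rule.pending_for = "Normal", 0
        return rule.state
    if rule.state == "Normal":
        rule.pending_for = 0  # first breaching evaluation starts the timer
    else:
        rule.pending_for += interval_s
    rule.state = "Alerting" if rule.pending_for >= for_s else "Pending"
    return rule.state


rule = RuleState()
samples = [2.0] + [7.5] * 11  # one healthy sample, then a sustained breach
states = [evaluate(rule, value) for value in samples]
print(states.count("Pending"), states[-1])  # prints: 10 Alerting
```

With a 30-second interval, the rule stays pending for ten evaluations and fires on the eleventh consecutive breach, five minutes after the first one. A single healthy evaluation in between resets the timer, which is what keeps short spikes from paging anyone.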

Using Variables

{
  "title": "Service Overview",
  "templating": {
    "list": [
      {
        "name": "service",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up, service)",
        "refresh": 1,
        "includeAll": true,
        "multi": true
      },
      {
        "name": "instance",
        "type": "query",
        "datasource": "Prometheus",
        "query": "label_values(up{service=~\"$service\"}, instance)",
        "refresh": 1
      },
      {
        "name": "environment",
        "type": "custom",
        "options": [
          { "value": "production", "text": "Production" },
          { "value": "staging", "text": "Staging" }
        ],
        "current": { "value": "production" }
      }
    ]
  }
}
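
A multi-value variable such as `service` expands to a regex alternation when it is used in a Prometheus query, which is why it should be paired with the regex matcher `=~` rather than `=`. The sketch below only illustrates the idea and is not Grafana's interpolation code; `interpolate` is a hypothetical helper and the service names are made up.

```python
import re


def interpolate(query: str, variables: dict[str, list[str]]) -> str:
    """Replace $name with one value, or with a (a|b) alternation for several."""
    def render(match: re.Match) -> str:
        escaped = [re.escape(value) for value in variables[match.group(1)]]
        return escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"
    return re.sub(r"\$(\w+)", render, query)


query = 'label_values(up{service=~"$service"}, instance)'
print(interpolate(query, {"service": ["api"]}))
# label_values(up{service=~"api"}, instance)
print(interpolate(query, {"service": ["api", "checkout"]}))
# label_values(up{service=~"(api|checkout)"}, instance)
```

With the equality matcher `=`, the second expansion would look for a service literally named `(api|checkout)` and return nothing.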

Annotations

{
  "title": "Deployments",
  "annotations": {
    "list": [
      {
        "name": "Deploy Events",
        "type": "dashboard",
        "builtIn": 0,
        "datasource": "Prometheus",
        "expr": "changes(deploy_timestamp[1m]) > 0",
        "iconColor": "blue",
        "enable": true,
        "showIn": 0
      },
      {
        "name": "Alert Events",
        "type": "dashboard",
        "builtIn": 0,
        "datasource": "Grafana",
        "enable": true,
        "showIn": 0,
        "iconColor": "red",
        "rawQuery": "SELECT alert_name, alert_severity FROM alert WHERE time > now() - 24h"
      }
    ]
  }
}

Common Configuration Mistakes

  1. Not setting min/max on panel axes: Auto-scaled axes change range as the data changes, which can exaggerate small fluctuations and makes panels hard to compare. Set min: 0 for rates and max: 100 for percentages to maintain consistent visual baselines.

  2. Using too many series in one panel: Grafana struggles to render 50+ series. Use topk() or aggregation to reduce series count, or split into multiple panels.

  3. Ignoring dashboard refresh intervals: Setting refresh: 1s on a dashboard with expensive queries overloads the data source. Use 30s for infrastructure, 5s for critical app metrics.

  4. Not using variables for reusability: Hardcoded service names require duplicating panels. Template variables make one panel reusable across any service.

  5. Storing dashboard JSON in the database without version control: Dashboard-as-code (provisioning JSON files in Git) provides history, review, and rollback. The database-only approach loses all audit trail.
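
Some of these mistakes can be caught automatically once dashboards live in Git. Here is a minimal lint sketch covering items 1 and 3; the function names and the 5-second floor are assumptions chosen for illustration, and the sample dashboard is invented.

```python
import re

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def refresh_seconds(refresh: str) -> int:
    """Convert a refresh string such as '30s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", refresh)
    if not match:
        raise ValueError(f"unrecognised refresh interval: {refresh!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def lint(dashboard: dict, min_refresh_s: int = 5) -> list[str]:
    """Return warnings for an aggressive refresh and unbounded percent panels."""
    warnings = []
    refresh = dashboard.get("refresh")
    if isinstance(refresh, str) and refresh and refresh_seconds(refresh) < min_refresh_s:
        warnings.append(f"refresh {refresh} is below {min_refresh_s}s")
    for panel in dashboard.get("panels", []):
        defaults = panel.get("fieldConfig", {}).get("defaults", {})
        bounded = all(key in defaults for key in ("min", "max"))
        if defaults.get("unit") == "percent" and not bounded:
            warnings.append(f"{panel.get('title', '?')}: percent panel without min/max")
    return warnings


sample = {
    "refresh": "1s",
    "panels": [
        {"title": "Current Error Rate", "fieldConfig": {"defaults": {"unit": "percent"}}},
        {"title": "CPU Usage by Pod",
         "fieldConfig": {"defaults": {"unit": "percent", "min": 0, "max": 100}}},
    ],
}
for warning in lint(sample):
    print(warning)
```

Run in CI against every provisioned JSON file, a check like this turns the list above into something enforceable rather than a convention.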

Practice Questions

  1. What panel types are available in Grafana? Answer: Time series, Stat, Table, Bar gauge, Gauge, Heatmap, State timeline, Logs, Pie chart, Candlestick, and many more via plugins.

  2. How do template variables make dashboards reusable? Answer: Variables let users filter by service, instance, environment, or any label without editing panels. One dashboard template works for any service.

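  3. What is the difference between Grafana Alerting and data source alerting? Answer: Grafana Alerting (built-in) evaluates multi-dimensional alert rules against any supported data source and manages them in a unified UI. Data source alerting, such as Prometheus rules routed through Alertmanager, is defined and evaluated in the data source itself and covers only that source's data.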

  4. How do you manage dashboards as code? Answer: Export dashboard JSON, store it in a Git repository, and configure Grafana provisioning to load dashboards from the filesystem. Use CI/CD to manage updates.

Challenge

Build a complete application monitoring dashboard: create a dashboard with time series panels for request rate and P50/P95/P99 latency, stat panels for error percentage and active alerts, a bar gauge for top CPU-consuming pods, a heatmap for latency distribution, and a table for recent errors. Add template variables for service and environment. Configure Grafana alerting for high error rate and P99 latency above SLO. Provision everything as code via YAML configuration files.

Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
