Grafana Dashboards: Visualization, Alerts & Dashboard as Code Guide
In this tutorial, you'll learn about Grafana Dashboards. We cover key concepts, practical examples, and best practices to help you understand and apply this topic effectively.
Grafana is an open-source analytics and visualization platform that connects to data sources (Prometheus, Elasticsearch, InfluxDB, Loki) to create interactive dashboards, set up alerts, and provide ad-hoc exploration of metrics and logs.
What You'll Learn
Why It Matters
Raw metrics and logs are overwhelming. Grafana transforms them into actionable visualizations: line charts for trends, heatmaps for distributions, and status panels for SLOs. DodaTech's platform team has 200+ Grafana dashboards covering infrastructure, application, and business metrics, shared across engineering teams via provisioning and team folders.
Real-World Use
DodaZIP's on-call dashboard shows real-time request latency (P50/P95/P99), error rate, CPU/memory by pod, database query performance, and active alerts, all on a single screen. When an incident fires, the engineer opens this dashboard first, reducing mean time to triage by 60%.
flowchart TD
A[Data Sources] --> B[Grafana Server]
B --> C[Dashboards]
B --> D[Alerting]
B --> E[Explore Mode]
C --> F[Time Series Panel]
C --> G[Stat Panel]
C --> H[Table Panel]
C --> I[Bargauge Panel]
D --> J[Alert Rules]
J --> K[Contact Points]
K --> L[PagerDuty]
K --> M[Slack]
K --> N[Email]
style B fill:#F46800,color:#fff
Prerequisites: Prometheus or another data source configured. Basic familiarity with PromQL or LogQL.
Installation
# Install Grafana on Ubuntu
sudo apt-get update
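sudo apt-get install -y apt-transport-https software-properties-common wget
# Import the Grafana signing key first, then register a repository entry that trusts it
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt-get update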
sudo apt-get install -y grafana
# Start Grafana
sudo systemctl daemon-reload
sudo systemctl start grafana-server
sudo systemctl enable grafana-server
# Verify
sudo systemctl status grafana-server
# Expected output:
# ● grafana-server.service - Grafana instance
# Loaded: loaded
# Active: active (running)
# Main PID: 12345 (grafana)
# Tasks: 14 (limit: 4915)
# Access at http://localhost:3000 (admin/admin)
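Beyond `systemctl status`, the server can also be checked over HTTP. The sketch below assumes Grafana listens on the default port 3000 and that its `/api/health` endpoint is reachable without authentication; the function names and the sample payload values are illustrative, not part of the installation above.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True when the health payload reports a working database."""
    payload = json.loads(body)
    return payload.get("database") == "ok"


def check_grafana(base_url: str = "http://localhost:3000", timeout: float = 5.0) -> bool:
    """Fetch /api/health and report whether Grafana looks healthy."""
    with urllib.request.urlopen(f"{base_url}/api/health", timeout=timeout) as resp:
        return resp.status == 200 and parse_health(resp.read().decode("utf-8"))


# Sample payload shaped like a health response (the values are made up).
sample = '{"database": "ok", "version": "10.0.0", "commit": "abc123"}'
print(parse_health(sample))  # True
```

Call `check_grafana()` on the host itself once the service is running; a connection error there usually means the service has not finished starting or is bound to a different address.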
Dashboard as Code (Provisioning)
# /etc/grafana/provisioning/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    uid: prometheus          # stable UID, referenced as datasourceUid by alert rules
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    editable: false
    jsonData:
      timeInterval: 30s
      queryTimeout: 60s
      httpMethod: POST
  - name: Loki
    uid: loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
# /etc/grafana/provisioning/dashboards/dashboards.yml
apiVersion: 1
providers:
  - name: DodaTech Infrastructure
    type: file
    updateIntervalSeconds: 30
    options:
      path: /etc/grafana/dashboards
      foldersFromFilesStructure: true
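With `foldersFromFilesStructure: true`, subdirectories under the provisioning path become Grafana folders, so the same dashboard `uid` can easily end up in two files. The following is a minimal pre-commit style check for that, assuming one dashboard per `.json` file; the directory layout and file names in the demo are hypothetical.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicate_uids(root: Path) -> dict:
    """Map each dashboard uid that appears more than once to the files using it."""
    seen = defaultdict(list)
    for path in sorted(root.rglob("*.json")):
        dashboard = json.loads(path.read_text(encoding="utf-8"))
        uid = dashboard.get("uid")
        if uid:
            seen[uid].append(path.name)
    return {uid: files for uid, files in seen.items() if len(files) > 1}


# Demo on a throwaway directory that mimics /etc/grafana/dashboards.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "infra").mkdir()
    (root / "apps").mkdir()
    (root / "infra" / "nodes.json").write_text('{"uid": "nodes", "title": "Nodes"}')
    (root / "apps" / "api.json").write_text('{"uid": "api", "title": "API"}')
    (root / "apps" / "api-copy.json").write_text('{"uid": "api", "title": "API copy"}')
    duplicates = find_duplicate_uids(root)

print(duplicates)
```

Running a check like this in CI before the files reach the Grafana host keeps provisioning errors out of production.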
Dashboard JSON Model
{
"title": "DodaTech API Overview",
"uid": "dodatech-api-overview",
"tags": ["dodatech", "api"],
"timezone": "browser",
"panels": [
{
"title": "Request Rate",
"type": "timeseries",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 0 },
"targets": [
{
"expr": "sum by (status_class) (rate(http_requests_total{service=\"api\"}[5m]))",
"legendFormat": "{{ status_class }}",
"refId": "A"
}
],
"fieldConfig": {
"defaults": {
"unit": "reqps",
"min": 0,
"custom": {
"lineInterpolation": "smooth",
"showPoints": "never"
}
},
"overrides": [
{
"matcher": { "id": "byName", "options": "5xx" },
"properties": [{ "id": "color", "value": { "mode": "fixed", "fixedColor": "red" } }]
}
]
}
},
{
"title": "P99 Latency",
"type": "timeseries",
"gridPos": { "h": 8, "w": 12, "x": 12, "y": 0 },
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket{service=\"api\"}[5m])))",
"legendFormat": "p99"
}
],
"fieldConfig": {
"defaults": { "unit": "s" }
}
},
{
"title": "Current Error Rate",
"type": "stat",
"gridPos": { "h": 4, "w": 4, "x": 0, "y": 8 },
"targets": [
{
"expr": "sum(rate(http_requests_total{service=\"api\", status=~\"5..\"}[5m])) / sum(rate(http_requests_total{service=\"api\"}[5m])) * 100",
"legendFormat": "Error %"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"thresholds": {
"mode": "absolute",
"steps": [
{ "color": "green", "value": null },
{ "color": "yellow", "value": 1 },
{ "color": "red", "value": 5 }
]
}
}
}
},
{
"title": "CPU Usage by Pod",
"type": "bargauge",
"gridPos": { "h": 8, "w": 12, "x": 0, "y": 12 },
"targets": [
{
"expr": "topk(10, sum by (pod) (rate(container_cpu_usage_seconds_total{namespace=\"production\"}[5m])) * 100)",
"legendFormat": "{{ pod }}"
}
],
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
}
}
],
"refresh": "30s",
"time": { "from": "now-6h", "to": "now" }
}
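Because the dashboard model is plain JSON, repetitive panels can be generated rather than written by hand. The sketch below builds one latency panel per quantile and places them left to right on Grafana's 24-column grid; the helper names, dashboard title, and uid are hypothetical, and the output is only a fragment of a full dashboard.

```python
import json

GRID_COLUMNS = 24  # Grafana's dashboard grid is 24 columns wide.


def timeseries_panel(title: str, expr: str, unit: str, x: int, y: int,
                     w: int = 12, h: int = 8) -> dict:
    """Build a minimal time series panel dict with one Prometheus target."""
    if x + w > GRID_COLUMNS:
        raise ValueError(f"panel '{title}' overflows the {GRID_COLUMNS}-column grid")
    return {
        "title": title,
        "type": "timeseries",
        "gridPos": {"h": h, "w": w, "x": x, "y": y},
        "targets": [{"expr": expr, "refId": "A"}],
        "fieldConfig": {"defaults": {"unit": unit}},
    }


def latency_row(service: str, quantiles: list) -> list:
    """Lay out one latency panel per quantile, left to right on a single row."""
    width = GRID_COLUMNS // len(quantiles)
    panels = []
    for i, q in enumerate(quantiles):
        expr = (
            f"histogram_quantile({q}, sum by (le) "
            f'(rate(http_request_duration_seconds_bucket{{service="{service}"}}[5m])))'
        )
        title = f"P{round(q * 100)} Latency"
        panels.append(timeseries_panel(title, expr, "s", x=i * width, y=0, w=width))
    return panels


dashboard = {
    "title": "Generated Latency Overview",  # hypothetical title
    "uid": "generated-latency-overview",    # hypothetical uid
    "panels": latency_row("api", [0.5, 0.95, 0.99]),
}
print(json.dumps(dashboard, indent=2))
```

Writing the result to a file under the provisioning path lets Grafana pick it up like any hand-written dashboard.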
Alerting
# /etc/grafana/provisioning/alerting/contact-points.yml
apiVersion: 1
contactPoints:
  - orgId: 1
    name: PagerDuty Critical
    receivers:
      - uid: pagerduty-critical
        type: pagerduty
        settings:
          integrationKey: YOUR_PAGERDUTY_KEY
          severity: critical
  - orgId: 1
    name: Slack DevOps
    receivers:
      - uid: slack-devops
        type: slack
        settings:
          url: https://hooks.slack.com/services/T00/B00/XXX
          recipient: '#devops-alerts'
          title: 'Grafana Alert: {{ .CommonLabels.alertname }}'
# /etc/grafana/provisioning/alerting/rules.yml
apiVersion: 1
groups:
  - orgId: 1
    name: API SLO Alerts
    folder: SLO Alerts       # provisioned rule groups need a folder; this name is an example
    interval: 30s
    rules:
      - uid: api_high_error_rate
        title: "API Error Rate Above 5%"
        condition: C
        data:
          # A: error percentage over the last 5 minutes
          - refId: A
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus
            model:
              refId: A
              expr: sum(rate(http_requests_total{service="api", status=~"5.."}[5m])) / sum(rate(http_requests_total{service="api"}[5m])) * 100
          # B: reduce the series to its most recent value
          - refId: B
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: __expr__
            model:
              refId: B
              type: reduce
              expression: A
              reducer: last
          # C: the alert condition, true when the reduced value exceeds 5%
          - refId: C
            relativeTimeRange:
              from: 0
              to: 0
            datasourceUid: __expr__
            model:
              refId: C
              type: math
              expression: $B > 5
        noDataState: NoData
        execErrState: Alerting
        for: 5m
        annotations:
          summary: "API error rate is {{ $values.B.Value }}%"
        labels:
          severity: critical
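The combination of `interval: 30s` and `for: 5m` means a breach must persist across consecutive evaluations before the rule fires. The sketch below is a simplified model of that pending period, not Grafana's actual scheduler; the class and function names are hypothetical, and the 5% threshold mirrors the rule above.

```python
from dataclasses import dataclass


@dataclass
class RuleState:
    state: str = "Normal"   # Normal -> Pending -> Alerting
    breaching_for: int = 0  # seconds the condition has held continuously


def evaluate(rule: RuleState, error_pct: float, threshold: float = 5.0,
             interval_s: int = 30, for_s: int = 300) -> str:
    """Advance the rule by one evaluation tick and return the new state."""
    if error_pct > threshold:
        if rule.state == "Normal":
            # First breaching evaluation starts the pending period.
            rule.state = "Pending"
            rule.breaching_for = 0
        else:
            rule.breaching_for += interval_s
        if rule.breaching_for >= for_s:
            rule.state = "Alerting"
    else:
        # Any healthy evaluation resets the rule.
        rule.state = "Normal"
        rule.breaching_for = 0
    return rule.state


rule = RuleState()
# A short spike (4 breaching evaluations) followed by recovery never fires.
spike = [evaluate(rule, pct) for pct in [7.0] * 4 + [1.0]]
# A sustained breach fires once 10 intervals (5 minutes) have elapsed.
sustained = [evaluate(rule, 7.0) for _ in range(11)]

print(spike[-2], spike[-1])          # Pending Normal
print(sustained[9], sustained[10])   # Pending Alerting
```

A longer `for` trades slower paging for fewer false alarms from brief spikes, which is usually the right trade for a critical-severity route.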
Using Variables
{
"title": "Service Overview",
"templating": {
"list": [
{
"name": "service",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(up, service)",
"refresh": 1,
"includeAll": true,
"multi": true
},
{
"name": "instance",
"type": "query",
"datasource": "Prometheus",
"query": "label_values(up{service=~\"$service\"}, instance)",
"refresh": 1
},
{
"name": "environment",
"type": "custom",
"options": [
{ "value": "production", "text": "Production" },
{ "value": "staging", "text": "Staging" }
],
"current": { "value": "production" }
}
]
}
}
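A multi-value variable such as `service` is rendered into the query as a regex alternation, which is why a regex matcher (`=~`) is the safe choice wherever it is referenced. The sketch below approximates that substitution for Prometheus queries; it omits the escaping of regex metacharacters that Grafana also applies, and the function names are hypothetical.

```python
def render_variable(values: list) -> str:
    """Render selected values: a bare value for one, an alternation for several."""
    if len(values) == 1:
        return values[0]
    return "(" + "|".join(values) + ")"


def build_query(template: str, service_values: list) -> str:
    """Substitute $service in a query template with the rendered selection."""
    return template.replace("$service", render_variable(service_values))


template = 'label_values(up{service=~"$service"}, instance)'
single = build_query(template, ["api"])
multi = build_query(template, ["api", "web"])

print(single)  # label_values(up{service=~"api"}, instance)
print(multi)   # label_values(up{service=~"(api|web)"}, instance)
```

With an exact matcher (`=`), the second query would look for a service literally named `(api|web)` and return nothing.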
Annotations
{
"title": "Deployments",
"annotations": {
"list": [
{
"name": "Deploy Events",
"type": "dashboard",
"builtIn": 0,
"datasource": "Prometheus",
"expr": "changes(deploy_timestamp[1m]) > 0",
"iconColor": "blue",
"enable": true,
"showIn": 0
},
{
"name": "Alert Events",
"type": "tags",
"builtIn": 0,
"datasource": "-- Grafana --",
"enable": true,
"showIn": 0,
"iconColor": "red",
"tags": ["alert"],
"matchAny": false,
"limit": 100
}
]
}
}
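The deploy annotation relies on `changes(deploy_timestamp[1m]) > 0`, which is true at any evaluation time whose trailing one-minute window contains a change in the metric's value. The sketch below imitates that behavior on hand-made samples; it treats the window as left-open, and the 15-second scrape spacing and sample values are assumptions for illustration.

```python
def changes(samples: list, now: float, window_s: float = 60.0) -> int:
    """Count value changes among samples with timestamps in (now - window_s, now]."""
    in_window = [value for ts, value in samples if now - window_s < ts <= now]
    return sum(1 for prev, cur in zip(in_window, in_window[1:]) if cur != prev)


# (timestamp, deploy_timestamp value): a deploy lands between t=45 and t=60.
samples = [
    (0, 1000), (15, 1000), (30, 1000), (45, 1000),
    (60, 1700), (75, 1700), (90, 1700), (105, 1700), (120, 1700), (135, 1700),
]

# Evaluation times at which the annotation query would return a result.
deploy_marks = [t for t in range(0, 150, 15) if changes(samples, t) > 0]
print(deploy_marks)  # [60, 75, 90]
```

The marker persists for as long as the change stays inside the window, so a shorter range narrows the annotation and a longer one widens it.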
Common Configuration Mistakes
Not setting min/max on panel axes: Auto-scaling axes hide trends. Set min: 0 for rates and max: 100 for percentages to maintain consistent visual baselines.
Using too many series in one panel: Grafana struggles to render 50+ series. Use topk() or aggregation to reduce series count, or split into multiple panels.
Ignoring dashboard refresh intervals: Setting refresh: 1s on a dashboard with expensive queries overloads the data source. Use 30s for infrastructure, 5s for critical app metrics.
Not using variables for reusability: Hardcoded service names require duplicating panels. Template variables make one panel reusable across any service.
Storing dashboard JSON in the database without version control: Dashboard-as-code (provisioning JSON files in Git) provides history, review, and rollback. The database-only approach loses all audit trail.
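Several of these mistakes can be caught mechanically before a dashboard is merged. The following is a minimal lint sketch, assuming dashboards are stored as JSON files in Git; the function names, the 5-second floor, and the specific checks are illustrative choices rather than Grafana features.

```python
import re
from typing import Optional

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def refresh_seconds(refresh: str) -> Optional[int]:
    """Convert a refresh string such as '30s' or '5m' to seconds; None if unparsable."""
    match = re.fullmatch(r"(\d+)([smhd])", refresh)
    if match is None:
        return None
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def lint_dashboard(dashboard: dict, min_refresh_s: int = 5) -> list:
    """Return human-readable findings for a dashboard JSON model."""
    findings = []
    refresh = str(dashboard.get("refresh", ""))
    seconds = refresh_seconds(refresh)
    if seconds is not None and seconds < min_refresh_s:
        findings.append(f"refresh '{refresh}' is faster than {min_refresh_s}s")
    for panel in dashboard.get("panels", []):
        title = panel.get("title", "<untitled>")
        defaults = panel.get("fieldConfig", {}).get("defaults", {})
        if defaults.get("unit") == "percent" and "max" not in defaults:
            findings.append(f"panel '{title}': percent unit without max")
        for target in panel.get("targets", []):
            # A quoted literal after service= means the name is hardcoded.
            if re.search(r'service=~?"[^$"]', target.get("expr", "")):
                findings.append(f"panel '{title}': hardcoded service label")
    return findings


sample = {
    "refresh": "1s",
    "panels": [
        {
            "title": "Current Error Rate",
            "fieldConfig": {"defaults": {"unit": "percent"}},
            "targets": [{"expr": 'sum(rate(http_requests_total{service="api"}[5m]))'}],
        }
    ],
}
findings = lint_dashboard(sample)
for finding in findings:
    print(finding)
```

Checks like these fit naturally into the same review pipeline that stores the dashboard JSON, so problems surface in a pull request rather than during an incident.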
Practice Questions
What panel types are available in Grafana? Answer: Time series, Stat, Table, Bar gauge, Gauge, Heatmap, State timeline, Logs, Pie chart, Candlestick, and many more via plugins.
How do template variables make dashboards reusable? Answer: Variables let users filter by service, instance, environment, or any label without editing panels. One dashboard template works for any service.
What is the difference between Grafana Alerting and data source alerting? Answer: Grafana Alerting (built-in) evaluates multi-dimensional alert rules across supported data sources from a unified UI. Data source-managed alerting, such as Prometheus rules routed through Alertmanager, covers only that data source's own queries.
How do you manage dashboards as code? Answer: Export dashboard JSON, store it in a Git repository, and configure Grafana provisioning to load dashboards from the filesystem. Use CI/CD to manage updates.
Challenge
Build a complete application monitoring dashboard: create a dashboard with time series panels for request rate and P50/P95/P99 latency, stat panels for error percentage and active alerts, a bar gauge for top CPU-consuming pods, a heatmap for latency distribution, and a table for recent errors. Add template variables for service and environment. Configure Grafana alerting for high error rate and P99 latency above SLO. Provision everything as code via YAML configuration files.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.