Alerting with Alertmanager: Configuring Alerts and Notifications
This tutorial covers the key concepts, practical configuration examples, and best practices for alerting and notifications with Alertmanager.
What You Will Learn
This tutorial teaches you how to configure Prometheus Alertmanager to receive alerts from Prometheus, route them to the right teams, and send notifications through email, Slack, PagerDuty, and webhooks.
Why It Matters
Metrics without alerts are just history. When a service goes down, you need to know immediately -- not when a user complains. Alertmanager gives you the flexibility to route, group, and silence alerts so your team gets the right signal at the right time.
Real-World Use
The Durga Antivirus Pro team uses Alertmanager with multiple routes: critical security alerts go to PagerDuty with a 5-minute escalation, warning-level alerts go to Slack for the next business day, and informational alerts are grouped into a daily digest email.
Alertmanager is the Prometheus component responsible for alert handling. It deduplicates, groups, and routes alerts to notification channels. It also handles silencing and inhibition -- ensuring that a single root cause does not trigger a cascade of redundant notifications.
Prerequisites
- A running Prometheus instance (see Prometheus Introduction)
- Prometheus alerting rules configured (see recording and alerting rules)
- Docker installed for running Alertmanager
Step-by-Step Tutorial
Step 1: Deploy Alertmanager
docker run -d --name alertmanager \
  -p 9093:9093 \
  -v "$(pwd)/alertmanager.yml:/etc/alertmanager/alertmanager.yml" \
  prom/alertmanager:v0.27.0
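Expected output: The command prints the new container ID, and Alertmanager listens on port 9093. Visit http://localhost:9093 for the web UI. The mounted alertmanager.yml must already exist in the current directory, so create it (Step 2) before running this command.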
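To confirm the container is up without opening the browser, a small script can poll Alertmanager's /-/healthy endpoint. This is a minimal sketch: the helper names health_url and is_healthy and the 3-second timeout are assumptions made for illustration, not part of the tutorial's setup.

```python
import urllib.error
import urllib.request


def health_url(base_url):
    """Build the health-check URL for an Alertmanager base URL."""
    return base_url.rstrip("/") + "/-/healthy"


def is_healthy(base_url, timeout=3.0):
    """Return True if Alertmanager answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or a non-2xx response.
        return False
```

Calling is_healthy("http://localhost:9093") after the docker run command should return True once the container has finished starting.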
Step 2: Create the Alertmanager Configuration
Create alertmanager.yml:
route:
  receiver: "default"
  group_by: ["alertname", "severity"]
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
receivers:
  - name: "default"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T0000/B0000/XXXXX"
        channel: "#alerts"
        send_resolved: true
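This sends every alert to the #alerts Slack channel. Alerts that share the same alertname and severity are batched into one notification. Alertmanager waits 30 seconds (group_wait) before sending the first notification for a new group, waits 5 minutes (group_interval) before notifying about alerts added to an existing group, and re-sends a notification for still-firing alerts every 4 hours (repeat_interval).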
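The effect of group_by can be illustrated with a few lines of Python. This is a simplified model, not Alertmanager's actual implementation: the group_alerts function and the sample alerts are hypothetical, and a missing label is treated as an empty string.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels."""
    groups = defaultdict(list)
    for alert in alerts:
        labels = alert["labels"]
        # Assumption in this sketch: a missing label counts as an empty value.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)


# Hypothetical sample alerts, for illustration only.
alerts = [
    {"labels": {"alertname": "InstanceDown", "severity": "critical", "instance": "web-01"}},
    {"labels": {"alertname": "InstanceDown", "severity": "critical", "instance": "web-02"}},
    {"labels": {"alertname": "HighCPUUsage", "severity": "warning", "instance": "web-01"}},
]

grouped = group_alerts(alerts, ["alertname", "severity"])
for key, members in grouped.items():
    print(key, [a["labels"]["instance"] for a in members])
```

The two InstanceDown alerts land in one group and would arrive as a single notification, while the HighCPUUsage alert forms its own group.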
Step 3: Configure Prometheus to Use Alertmanager
In prometheus.yml:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["localhost:9093"]
rule_files:
  - "alert-rules.yml"
Step 4: Write Alerting Rules
Create alert-rules.yml:
groups:
  - name: infrastructure
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes."
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 80% on {{ $labels.instance }}"
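The for clause means an alert stays pending until its expression has held continuously for the given duration, and only then fires. The sketch below models that behavior under simplifying assumptions: alert_states is a hypothetical helper, evaluations happen at a fixed interval, and any gap in the condition resets the timer.

```python
def alert_states(samples, for_seconds, interval_seconds):
    """Return the alert state after each rule evaluation.

    samples: booleans, True when the rule expression matched at that evaluation.
    """
    states = []
    active_since = None  # time at which the condition first became true
    for i, condition in enumerate(samples):
        now = i * interval_seconds
        if not condition:
            # The condition cleared, so the pending timer starts over.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held = now - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states


# InstanceDown with "for: 5m", evaluated every 60 seconds.
print(alert_states([True] * 6, for_seconds=300, interval_seconds=60))
```

With a 60-second evaluation interval, the alert is pending for the first five evaluations and fires on the sixth, once the condition has held for 300 seconds.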
Step 5: Add Multiple Routes
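route:
  receiver: "default"
  routes:
    # Routes are tried in order; the first match wins unless "continue: true" is set.
    # "match" still works but is deprecated since Alertmanager 0.22 in favor of "matchers".
    - match:
        severity: critical
      receiver: "pagerduty-critical"
      repeat_interval: 5m
    - match:
        severity: warning
      receiver: "slack-warnings"
      repeat_interval: 4h
    # Only reached by InstanceDown alerts whose severity is neither critical nor warning.
    - match:
        alertname: "InstanceDown"
      receiver: "pagerduty-critical"
receivers:
  # The "default" receiver from Step 2 must also stay in this list.
  - name: "pagerduty-critical"
    pagerduty_configs:
      - routing_key: "YOUR_PAGERDUTY_KEY"
        severity: critical
  - name: "slack-warnings"
    slack_configs:
      - api_url: "https://hooks.slack.com/services/T0000/B0000/XXXXX"
        channel: "#warnings"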
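Route order matters because an alert goes to the first child route it matches. The following sketch mirrors the three routes above in plain Python to show which receiver a given label set reaches. It is a simplified model: pick_receiver is a hypothetical helper, and nested routes, regex matchers, and the continue option are left out.

```python
ROUTES = [
    {"match": {"severity": "critical"}, "receiver": "pagerduty-critical"},
    {"match": {"severity": "warning"}, "receiver": "slack-warnings"},
    {"match": {"alertname": "InstanceDown"}, "receiver": "pagerduty-critical"},
]


def pick_receiver(labels, routes=ROUTES, default_receiver="default"):
    """Return the receiver of the first route whose match labels all agree."""
    for route in routes:
        if all(labels.get(name) == value for name, value in route["match"].items()):
            return route["receiver"]
    # No child route matched, so the alert falls back to the root receiver.
    return default_receiver


print(pick_receiver({"alertname": "HighCPUUsage", "severity": "warning"}))
print(pick_receiver({"alertname": "DiskInfo", "severity": "info"}))
```

An InstanceDown alert labeled severity: critical is already caught by the first route, so the third route only matters for InstanceDown alerts with some other severity.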
Step 6: Configure Email Notifications
receivers:
  - name: "email-alerts"
    email_configs:
      - to: "team@dodatech.com"
        from: "alertmanager@dodatech.com"
        smarthost: "smtp.sendgrid.net:587"
        auth_username: "apikey"
        auth_identity: "apikey"
        auth_password: "SG.XXXXX"
Step 7: Set Up Silences and Inhibitions
Inhibitions prevent redundant alerts when a root cause is already firing:
inhibit_rules:
  - source_match:
      severity: "critical"
    target_match:
      severity: "warning"
    equal: ["instance"]
This suppresses all warning-level alerts on an instance that already has a critical alert.
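The inhibition rule can be read as a three-part check: the target matches target_match, some firing alert matches source_match, and both carry the same value for every label in equal. Here is a minimal sketch of that logic, where matches, is_inhibited, and the sample label sets are hypothetical names used only for illustration.

```python
RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["instance"],
}


def matches(labels, matcher):
    """True when every label in the matcher has the same value on the alert."""
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(target, firing, rule=RULE):
    """Return True if any firing alert suppresses the target under the rule."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing:
        if not matches(source, rule["source_match"]):
            continue
        # Labels missing on both sides compare as equal (empty) values.
        if all(source.get(name, "") == target.get(name, "") for name in rule["equal"]):
            return True
    return False


critical = {"alertname": "InstanceDown", "severity": "critical", "instance": "web-01"}
same_host = {"alertname": "HighCPUUsage", "severity": "warning", "instance": "web-01"}
other_host = {"alertname": "HighCPUUsage", "severity": "warning", "instance": "web-02"}

print(is_inhibited(same_host, [critical]))
print(is_inhibited(other_host, [critical]))
```

The warning on web-01 is suppressed because a critical alert fires on the same instance, while the warning on web-02 still notifies.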
To silence an alert from the web UI:
- Go to http://localhost:9093/#/silences
- Click New Silence
- Set matchers: alertname="HighCPUUsage"
- Set duration and comment
- Click Create
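The same silence can also be created programmatically by posting JSON to the /api/v2/silences endpoint. The sketch below builds such a payload and sends it with the standard library. The function names, author, and comment values are assumptions for illustration, and the base URL is the local instance from Step 1.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_silence(alertname, hours, author, comment, now=None):
    """Build a silence payload that matches one alert name for a fixed duration."""
    now = now or datetime.now(timezone.utc)
    return {
        "matchers": [
            {"name": "alertname", "value": alertname, "isRegex": False, "isEqual": True}
        ],
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(hours=hours)).isoformat(),
        "createdBy": author,
        "comment": comment,
    }


def post_silence(payload, base_url="http://localhost:9093"):
    """Send the silence to Alertmanager and return the decoded JSON response."""
    request = urllib.request.Request(
        base_url + "/api/v2/silences",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return json.loads(response.read())


# Hypothetical values: a two-hour maintenance window for HighCPUUsage.
silence = build_silence("HighCPUUsage", 2, "ops-team", "Planned maintenance")
print(json.dumps(silence, indent=2))
```

Passing the result to post_silence(silence) against a running Alertmanager should make the silence appear on the silences page of the web UI.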
Step 8: Test via the API
# Alertmanager 0.27 removed API v1, so post to the v2 endpoint.
curl -X POST http://localhost:9093/api/v2/alerts \
  -H "Content-Type: application/json" \
  -d '[
    {
      "labels": {
        "alertname": "TestAlert",
        "severity": "critical",
        "instance": "test-01"
      },
      "annotations": {
        "summary": "This is a test alert"
      }
    }
  ]'
Expected output: The alert appears in the Alertmanager web UI and triggers the configured route.
Step 9: Use Webhook Receivers for Custom Integrations
receivers:
  - name: "webhook"
    webhook_configs:
      - url: "http://internal-webhook.dodatech.com/alert"
        send_resolved: true
Your webhook endpoint receives a JSON payload with the alert data.
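A webhook endpoint only needs to accept a POST, parse the JSON body, and answer quickly. The sketch below uses Python's standard library. The summarize helper, the AlertHandler class, and port 5001 are assumptions for illustration; the payload fields it reads (alerts, status, labels) follow Alertmanager's webhook format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload):
    """Return one line per alert in an Alertmanager webhook payload."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        status = alert.get("status", "unknown")
        name = labels.get("alertname", "?")
        instance = labels.get("instance", "?")
        lines.append(f"{status}: {name} on {instance}")
    return lines


class AlertHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        for line in summarize(payload):
            print(line)
        # Reply right away so Alertmanager does not treat the call as failed.
        self.send_response(200)
        self.end_headers()


def serve(port=5001):
    """Run the receiver until interrupted."""
    HTTPServer(("", port), AlertHandler).serve_forever()
```

Calling serve() starts the receiver, and pointing the url field of webhook_configs at that host and port would deliver alerts to it. Forwarding to an incident system would replace the print call.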
Learning Path
flowchart LR
A[Alert Rules] -->|fires| B[Prometheus]
B -->|sends| C[Alertmanager]
C --> D{Routing}
D --> E[Slack]
D --> F[PagerDuty]
D --> G[Email]
D --> H[Webhook]
C --> I[Silences]
C --> J[Inhibitions]
style D fill:#4a90d9,color:#fff
style C fill:#e67e22,color:#fff
Common Errors
- Alerts are firing but no notifications are sent -- The Alertmanager configuration has a routing error. Check Alertmanager logs with docker logs alertmanager.
- Slack notification says "invalid_url" -- The Slack webhook URL is incorrect or expired. Generate a new webhook in the Slack API dashboard.
- Email notifications are not delivered -- SMTP credentials are wrong or the smarthost is not reachable. Test with curl to verify the SMTP endpoint.
- Alerts repeat too frequently -- The repeat_interval is set too low. Increase it to 4h or more for warning-level alerts.
- The InstanceDown alert fires during rolling updates -- The for: 5m clause is not long enough. Increase the duration to accommodate your deployment cycle.
- Webhook receiver times out -- Alertmanager has a default 5-second timeout for webhooks. Ensure your endpoint responds quickly.
- Inhibition does not suppress warnings -- The equal label must match exactly. Ensure the source and target alerts share the same label values.
Practice Questions
- What port does Alertmanager listen on by default? Answer: 9093.
- What is the purpose of the group_by parameter in the route configuration? Answer: It groups alerts by the specified labels so that related alerts arrive as a single notification instead of many individual ones.
- How do you prevent alert notifications during scheduled maintenance? Answer: Create a silence in the Alertmanager web UI or API matching the specific instance or alert name.
- What is an inhibition rule? Answer: It suppresses lower-severity alerts when a higher-severity alert is already firing for the same entity.
- How do you configure Alertmanager to send alerts to multiple destinations? Answer: Define multiple receivers and route alerts to different receivers using match conditions on labels.
Challenge
Set up a complete alerting pipeline for a production environment. Configure Prometheus alerting rules for InstanceDown (critical), HighCPUUsage (warning), HighMemoryUsage (warning), DiskSpaceFull (critical), and HighErrorRate (critical, based on application metrics). Route critical alerts to PagerDuty with escalation, warnings to the #ops Slack channel, and disk alerts to email. Set up an inhibition rule that suppresses all warnings on an instance that is already down. Create a webhook receiver that forwards alerts to an internal incident management system. Verify every route by firing test alerts through the API.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro