Prometheus Metrics — Complete API Monitoring Implementation Guide
This tutorial covers Prometheus metrics for API monitoring: the key concepts, a worked Flask example, and best practices.
Prometheus is a leading open-source monitoring system that collects metrics from instrumented applications. It stores data as time series, enabling powerful queries with PromQL.
What You'll Learn
You'll learn how to expose Prometheus metrics from your API, write PromQL queries, and create alerts.
Why It Matters
Prometheus is one of the most widely adopted open-source monitoring systems. Kubernetes components expose metrics in its format, and it is a common choice for monitoring cloud-native applications at companies of every size.
Real-World Use
An API exposes a /metrics endpoint in the Prometheus text format. Prometheus scrapes it every 15 seconds, and a Grafana dashboard shows real-time request rates, error rates, and p99 latency.
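To make the scrape step concrete, here is a minimal standard-library sketch of what a /metrics response body looks like for one counter. The helper `render_counter` and the sample values are hypothetical and only illustrate the text format; a real service would let a client library generate this output, and this sketch does not escape special characters in label values.

```python
# Hypothetical helper: renders one counter in the Prometheus text exposition format.
def render_counter(name, help_text, samples):
    """samples maps a tuple of (label, value) pairs to the current counter value."""
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for labels, value in sorted(samples.items()):
        label_str = ",".join(f'{key}="{val}"' for key, val in labels)
        # One line per time series: metric name, label set, current value.
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


# Made-up sample values for illustration only.
samples = {
    (("method", "GET"), ("status", "200")): 1027,
    (("method", "GET"), ("status", "500")): 3,
}
exposition = render_counter("api_requests_total", "Total API requests", samples)
print(exposition)
```

Each distinct combination of label values becomes its own time series on the Prometheus side, which is why label choice matters so much later on.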
Implementation
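from flask import Flask, jsonify, request
from prometheus_flask_exporter import PrometheusMetrics

app = Flask(__name__)
# Registers a /metrics endpoint on the app.
metrics = PrometheusMetrics(app)
# Static info metric carrying the API version as a label.
metrics.info("api_info", "API Info", version="1.0.0")

# Label callables: zero-argument ones read the request,
# one-argument ones receive the response.
REQUEST_LABELS = {
    "method": lambda: request.method,
    "endpoint": lambda: request.endpoint,
    "status": lambda r: r.status_code,
}

# Metric names match the PromQL queries used later in this guide.
@app.route("/api/data")
@metrics.counter("api_requests_total", "Total API requests", labels=REQUEST_LABELS)
@metrics.histogram("api_request_duration_seconds", "API request latency",
                   labels=REQUEST_LABELS, buckets=[.01, .05, .1, .5, 1, 2.5, 5])
def get_data():
    return jsonify({"data": "success"})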
# prometheus.yml scrape config
scrape_configs:
  - job_name: 'api'
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
      - targets:
          - 'api-server:5000'
    relabel_configs:
      - source_labels: [__address__]
        target_label: instance
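# Useful PromQL queries
# Request rate per second (one result per label combination)
rate(api_requests_total[5m])
# p99 latency (aggregate the buckets by le before taking the quantile)
histogram_quantile(0.99,
  sum by (le) (rate(api_request_duration_seconds_bucket[5m])))
# Error ratio
sum(rate(api_requests_total{status=~"5.."}[5m]))
  / sum(rate(api_requests_total[5m]))
# Top 5 slowest endpoints by mean latency (histograms expose _sum and _count, not a bare series)
topk(5,
  sum by (endpoint) (rate(api_request_duration_seconds_sum[5m]))
  / sum by (endpoint) (rate(api_request_duration_seconds_count[5m])))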
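The p99 query is an estimate, not an exact measurement. As a rough sketch of the idea behind histogram_quantile, the Python below finds the bucket containing the target rank and interpolates linearly inside it. It assumes 0 < q <= 1 and cumulative bucket counts ending in a +Inf bucket; the function is a simplified re-implementation for illustration, and the bucket counts are made up.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) pairs ending in +Inf."""
    buckets = sorted(buckets)
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    upper, count = buckets[idx]
    if math.isinf(upper):
        # The quantile lies beyond the last finite bound, so report that bound.
        return buckets[-2][0]
    lower, below = (0.0, 0) if idx == 0 else buckets[idx - 1]
    # Assume observations are spread evenly inside the bucket.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Made-up cumulative counts for the bucket bounds used in the Flask example.
observed = [(0.01, 10), (0.05, 40), (0.1, 70), (0.5, 95),
            (1, 99), (2.5, 100), (5, 100), (math.inf, 100)]
p50 = histogram_quantile(0.5, observed)
p99 = histogram_quantile(0.99, observed)
print(f"p50 ~ {p50:.4f}s, p99 ~ {p99:.4f}s")
```

Because the result is interpolated within a bucket, its accuracy depends on the bucket bounds: a p99 that lands in the 0.5 to 1 second bucket can only be resolved to somewhere in that range.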
Common Mistakes
| Mistake | Consequence | Fix |
|---------|-------------|-----|
| No metric labels | Cannot filter by endpoint or status | Add method, endpoint, status labels |
| Too many label values | High cardinality explodes the number of time series | Limit unique label values (<1000) |
| Counter reset on restart | Raw counter graphs drop to zero after a restart | Graph rate() or increase(), which both handle resets, instead of the raw counter |
| Not using histograms for latency | Cannot calculate percentiles across instances | Use a Histogram; Summary quantiles cannot be aggregated |
| Scraping too frequently | Overloads the API | Scrape every 15-30 seconds |
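To see why label cardinality matters, a back-of-the-envelope sketch helps: the worst-case number of time series for one metric is the product of the distinct values of each label. The helper `series_count` and the label value counts below are hypothetical, and the result is an upper bound because not every combination occurs in practice.

```python
from math import prod


def series_count(label_values, series_per_combination=1):
    """Upper bound on time series: product of distinct values per label."""
    return prod(len(values) for values in label_values.values()) * series_per_combination


# Hypothetical label values for a small API.
labels = {
    "method": ["GET", "POST", "PUT", "DELETE"],
    "endpoint": [f"/api/v1/resource{i}" for i in range(20)],
    "status": ["200", "201", "400", "404", "500", "503"],
}
counter_series = series_count(labels)

# A histogram with 7 finite buckets also exposes +Inf, _sum, and _count: 10 series each.
histogram_series = series_count(labels, series_per_combination=10)

# Adding an unbounded label such as a user ID multiplies everything.
labels_with_user = dict(labels, user_id=[str(i) for i in range(10_000)])
exploded_series = series_count(labels_with_user)

print(counter_series, histogram_series, exploded_series)
```

With these assumed numbers the counter stays at a few hundred series, while a single per-user label pushes it into the millions, which is the explosion the table warns about.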
Practice Questions
- How does Prometheus scrape metrics?
- What is the difference between Counter, Gauge, and Histogram?
- How do you calculate error rate in PromQL?
- What is high cardinality and why avoid it?
- How do you set up alerting rules?
Challenge
Instrument a Flask API with custom Prometheus metrics. Write PromQL queries for: request rate, error ratio, p99 latency. Create an alert for error ratio > 5%.
What's Next
Learn about Grafana dashboards for API visualization.