Metrics — Complete Request Rate, Errors, Latency Guide
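This tutorial covers the three core API metrics (request rate, error rate, and latency), with a Prometheus instrumentation example, example alert thresholds, and common mistakes to avoid.
Three core signals for API monitoring are request rate (traffic volume), error rate (failed requests), and latency (response time). This trio is commonly known as the RED method (rate, errors, duration); Google's SRE "four golden signals" add saturation as a fourth. Together, these metrics give you a broad view of API health.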
What You'll Learn
You'll learn the key API metrics, how to measure them, and how to set meaningful thresholds.
Why It Matters
These three metrics capture most API problems. A traffic spike, an error burst, or a latency increase each points to an issue that needs investigation.
Real-World Use
An API dashboard shows: 1000 requests/sec, 0.5% error rate, p50 latency 50ms, p95 200ms, p99 500ms. When p99 latency jumps to 2000ms, the team investigates the slowest 1% of requests.
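To make the dashboard numbers concrete, here is a minimal sketch of how percentiles and an error rate could be derived from raw request samples. The `percentile` and `error_rate` helpers and the sample data are illustrative assumptions; the percentile uses the nearest-rank method, whereas production monitoring systems usually estimate percentiles from histogram buckets instead of storing every sample.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def error_rate(status_codes):
    """Percentage of responses with a 5xx status."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return 100 * errors / len(status_codes)


# Hypothetical sample: 100 request latencies in milliseconds.
latencies_ms = [50] * 90 + [200] * 8 + [500, 2000]
# Hypothetical sample: 200 responses, one 5xx and one 4xx.
statuses = [200] * 198 + [500, 404]

print(percentile(latencies_ms, 50))  # 50
print(percentile(latencies_ms, 95))  # 200
print(percentile(latencies_ms, 99))  # 500
print(error_rate(statuses))          # 0.5
```

Note that the single 2000ms request does not move p99 in this sample: p99 reports the value that 99% of requests stay at or below, so one outlier in 100 only shows up in the maximum.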
Key Metrics
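import time

from flask import Flask, request
from prometheus_client import Counter, Gauge, Histogram

app = Flask(__name__)

# Request rate and error rate are both derived from this counter.
REQUEST_COUNT = Counter(
    "api_requests_total",
    "Total API requests",
    ["method", "endpoint", "status"],
)
# Latency distribution; bucket upper bounds are in seconds.
REQUEST_LATENCY = Histogram(
    "api_request_duration_seconds",
    "API request latency in seconds",
    ["method", "endpoint"],
    buckets=[.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5],
)
# Number of requests currently being handled.
IN_FLIGHT = Gauge(
    "api_requests_in_flight",
    "Current requests being processed",
    ["method"],
)


def process_request():
    # Placeholder for the real handler logic, which the example leaves undefined.
    return {"ok": True}


@app.route("/api/data")
def get_data():
    method = request.method
    endpoint = "/api/data"
    IN_FLIGHT.labels(method=method).inc()
    start = time.perf_counter()  # monotonic clock, suited to measuring durations
    try:
        result = process_request()
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status="200").inc()
        return result
    except Exception:
        REQUEST_COUNT.labels(method=method, endpoint=endpoint, status="500").inc()
        raise
    finally:
        # Runs on success and failure, so every request is timed and the gauge never leaks.
        IN_FLIGHT.labels(method=method).dec()
        REQUEST_LATENCY.labels(method=method, endpoint=endpoint).observe(
            time.perf_counter() - start
        )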
Metric Types
| Metric | Description | Example Alert |
|---|---|---|
| Request rate | Requests per second | Drop below 50% of normal |
| Error rate | Percentage of 5xx responses | Error rate > 5% for 5 minutes |
| Latency p50 | Median response time | p50 > 200ms |
| Latency p95 | Response time that 95% of requests stay under | p95 > 500ms |
| Latency p99 | Response time that 99% of requests stay under | p99 > 2000ms |
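The example alerts in the table can be read as simple threshold checks. The sketch below encodes them in plain Python to show the logic; the `Snapshot` structure and the `evaluate_alerts` function are hypothetical names introduced for illustration, and the "for 5 minutes" condition is modeled as one error-rate reading per minute. In practice these rules would normally live in the monitoring system's alerting configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    """One evaluation window of API metrics (hypothetical structure)."""
    requests_per_sec: float
    baseline_requests_per_sec: float
    error_pct_per_minute: List[float]  # 5xx percentage, one reading per minute
    p50_ms: float
    p95_ms: float
    p99_ms: float


def evaluate_alerts(s: Snapshot) -> List[str]:
    """Return the names of the table's example alerts that fire for this snapshot."""
    alerts = []
    if s.requests_per_sec < 0.5 * s.baseline_requests_per_sec:
        alerts.append("request_rate_drop")
    # Fire only if the error rate stayed above 5% in each of the last 5 minutes.
    last_five = s.error_pct_per_minute[-5:]
    if len(last_five) == 5 and all(rate > 5 for rate in last_five):
        alerts.append("error_rate_high")
    if s.p50_ms > 200:
        alerts.append("latency_p50_high")
    if s.p95_ms > 500:
        alerts.append("latency_p95_high")
    if s.p99_ms > 2000:
        alerts.append("latency_p99_high")
    return alerts


healthy = Snapshot(1000, 1000, [0.5] * 5, 50, 200, 500)
degraded = Snapshot(400, 1000, [6, 7, 8, 6, 9], 50, 200, 2100)

print(evaluate_alerts(healthy))   # []
print(evaluate_alerts(degraded))  # ['request_rate_drop', 'error_rate_high', 'latency_p99_high']
```

Requiring the error rate to stay high for the whole window keeps a single bad minute from paging anyone, at the cost of a slower reaction to real incidents.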
Common Mistakes
| Mistake | Problem | Fix |
|---|---|---|
| Only tracking average latency | Hides tail latency | Track p50, p95, p99 percentiles |
| No error rate by endpoint | One broken endpoint hidden | Track errors per endpoint |
| Counting 4xx as errors | Client errors skew data | Track 5xx as errors, 4xx separately |
| No rate of change alerts | Gradual degradation missed | Alert on rate of change metrics |
| Too many metrics | Noise hides signal | Start with 5 key metrics |
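Two of the mistakes above, hiding a broken endpoint in an overall error rate and mixing 4xx with 5xx, can be illustrated with a small sketch. The `error_rates_by_endpoint` function, the `/api/export` endpoint, and the traffic sample are all hypothetical; the point is only that a per-endpoint breakdown exposes what the aggregate hides.

```python
from collections import defaultdict


def error_rates_by_endpoint(requests):
    """Map each endpoint to its 5xx and 4xx percentages, kept separate.

    requests: iterable of (endpoint, status_code) pairs.
    """
    counts = defaultdict(lambda: {"total": 0, "4xx": 0, "5xx": 0})
    for endpoint, status in requests:
        c = counts[endpoint]
        c["total"] += 1
        if 400 <= status <= 499:
            c["4xx"] += 1
        elif 500 <= status <= 599:
            c["5xx"] += 1
    return {
        endpoint: {
            "server_error_pct": 100 * c["5xx"] / c["total"],
            "client_error_pct": 100 * c["4xx"] / c["total"],
        }
        for endpoint, c in counts.items()
    }


# Hypothetical traffic: /api/export fails half the time but receives few requests.
sample = (
    [("/api/data", 200)] * 95
    + [("/api/data", 404)] * 5
    + [("/api/export", 500)] * 2
    + [("/api/export", 200)] * 2
)

rates = error_rates_by_endpoint(sample)
overall_5xx_pct = 100 * sum(1 for _, s in sample if 500 <= s <= 599) / len(sample)

print(round(overall_5xx_pct, 2))                 # 1.92
print(rates["/api/export"]["server_error_pct"])  # 50.0
print(rates["/api/data"]["server_error_pct"])    # 0.0
print(rates["/api/data"]["client_error_pct"])    # 5.0
```

The overall 5xx rate stays under 2% and would not trip a 5% alert, while `/api/export` is failing half of its requests.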
Practice Questions
- What are the three core API metrics, and what does each one measure?
- Why is p99 latency more important than average?
- What is a good error rate threshold?
- How do you set latency SLOs?
- What is the difference between p95 and p99?
Challenge
Instrument a Flask API with Prometheus metrics: request count (by method, endpoint, status), latency histogram, and in-flight requests gauge. Query with PromQL to find p99 latency.
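For the PromQL part of the challenge, a query along the lines of `histogram_quantile(0.99, sum by (le) (rate(api_request_duration_seconds_bucket[5m])))` is a common starting point. The sketch below approximates, in plain Python, how such an estimate is derived from cumulative bucket counts by linear interpolation. It is a simplified illustration rather than Prometheus's exact implementation; `estimate_quantile` is a hypothetical helper and the bucket counts are invented.

```python
def estimate_quantile(q, buckets):
    """Estimate quantile q (between 0 and 1) from cumulative histogram buckets.

    buckets: list of (upper_bound_seconds, cumulative_count), sorted by bound
    and ending with (float("inf"), total_count).
    """
    total = buckets[-1][1]
    if total == 0:
        raise ValueError("histogram is empty")
    rank = q * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if upper_bound == float("inf"):
                # Cannot interpolate inside an unbounded bucket;
                # fall back to the highest finite bound.
                return lower_bound
            in_bucket = count - lower_count
            if in_bucket == 0:
                return lower_bound
            # Assume observations are spread evenly within the bucket.
            fraction = (rank - lower_count) / in_bucket
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Invented cumulative counts for the bucket bounds used in the Key Metrics example.
buckets = [
    (0.005, 0), (0.01, 10), (0.025, 100), (0.05, 500), (0.1, 800),
    (0.25, 950), (0.5, 985), (1, 995), (2.5, 1000), (5, 1000),
    (float("inf"), 1000),
]

p99 = estimate_quantile(0.99, buckets)
print(f"estimated p99: {p99:.2f}s")  # about 0.75s for these counts
```

Because the estimate interpolates within a bucket, its accuracy depends on bucket boundaries: with these bounds, any p99 between 0.5s and 1s is only known to fall somewhere in that range, so it helps to place bucket bounds near the latency thresholds you alert on.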
What's Next
Learn about structured JSON logging for APIs.