SLO, SLI, Error Budget — Complete API Reliability Guide

DodaTech Updated 2026-06-28 2 min read

This tutorial covers SLOs, SLIs, and error budgets: the key concepts, a worked Python example, and common mistakes to avoid.

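Service Level Indicators (SLIs) measure how your API actually performs, for example the fraction of requests that succeed. Service Level Objectives (SLOs) set the target an SLI must meet over a time window. The error budget is the amount of unreliability the SLO allows: 100% minus the SLO target.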

What You'll Learn

You'll learn how to define SLIs, set SLOs, and use error budgets to balance reliability and feature development.

Why It Matters

SLOs align engineering teams on reliability goals. Error budgets provide a data-driven framework for deciding when to prioritize reliability over features.

Real-World Use

An API has a 99.9% uptime SLO, which allows about 8.76 hours of downtime per year. When less than 50% of the error budget remains, the team halts feature deploys and focuses on reliability improvements.
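The downtime figure follows directly from the SLO target: allowed downtime is the period length multiplied by (1 - SLO). The sketch below shows that arithmetic. The function name `allowed_downtime` and the 30-day month are assumptions made for illustration.

```python
from datetime import timedelta

def allowed_downtime(slo_target: float, period: timedelta) -> timedelta:
    """Return the downtime a time-based SLO permits over the given period."""
    if not 0.0 < slo_target <= 1.0:
        raise ValueError("slo_target must be in (0, 1]")
    return period * (1.0 - slo_target)

year = timedelta(days=365)
month = timedelta(days=30)  # assumption: a 30-day month

yearly = allowed_downtime(0.999, year)
monthly = allowed_downtime(0.999, month)
print(f"99.9% over a year: {yearly.total_seconds() / 3600:.2f} hours")
print(f"99.9% over 30 days: {monthly.total_seconds() / 60:.1f} minutes")
```

For a 99.9% target this gives roughly 8.76 hours per year and 43.2 minutes per 30-day month.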

flowchart LR
    A[Define SLIs] --> B[Set SLO Targets]
    B --> C[Measure SLIs]
    C --> D{Within Budget?}
    D -->|Yes| E[Continue Feature Work]
    D -->|No| F[Freeze Features]
    F --> G[Reliability Work]
    G --> C
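The decision node in the flowchart can be expressed as a small policy function. This is a minimal sketch, not a prescribed policy: the names `Action` and `decide` are hypothetical, and the 0.5 default simply mirrors the "below 50%" rule from the Real-World Use example.

```python
from enum import Enum

class Action(Enum):
    CONTINUE_FEATURES = "continue feature work"
    FREEZE_FEATURES = "freeze features, do reliability work"

def decide(budget_remaining: float, freeze_threshold: float = 0.5) -> Action:
    """Map the fraction of error budget remaining to a policy action.

    budget_remaining: 1.0 means untouched, 0.0 means exhausted,
    negative means the SLO has already been missed.
    freeze_threshold: hypothetical policy knob (assumed 0.5 here).
    """
    if budget_remaining < freeze_threshold:
        return Action.FREEZE_FEATURES
    return Action.CONTINUE_FEATURES

for remaining in (0.80, 0.50, 0.49, -0.10):
    print(f"{remaining:>6.0%} remaining -> {decide(remaining).value}")
```

Keeping the threshold as a parameter lets each team write its own policy down explicitly instead of hard-coding one rule.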

Implementation

# SLO calculation
import time
from collections import deque

class SLOMonitor:
    def __init__(self, slo_target=0.999, window_seconds=30*24*3600):
        self.slo_target = slo_target
        self.window_seconds = window_seconds
        self.events = deque()

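    def record_request(self, success):
        # Read the clock once so the event and the cutoff use the same timestamp
        now = time.time()
        self.events.append((now, success))
        # Drop events that have aged out of the rolling window
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] < cutoff:
            self.events.popleft()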

    def current_availability(self):
        if not self.events:
            return 1.0
        successes = sum(1 for _, s in self.events if s)
        return successes / len(self.events)

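    def error_budget_remaining(self):
        # Fraction of the budget left: 1.0 = untouched, 0.0 = exhausted, < 0 = SLO missed
        allowed_error_rate = 1 - self.slo_target
        actual_error_rate = 1 - self.current_availability()
        return (allowed_error_rate - actual_error_rate) / allowed_error_rate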

slo = SLOMonitor(slo_target=0.999)
for _ in range(100000):
    slo.record_request(success=True)
for _ in range(100):
    slo.record_request(success=False)
print(f"Availability: {slo.current_availability():.5%}")
print(f"Budget remaining: {slo.error_budget_remaining():.1%}")
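The monitor above reports the budget as a fraction. It can be easier to reason about the same budget as a count of failed requests. The sketch below does that conversion for the same traffic (100,100 requests, 100 failures). The helper name `budget_in_requests` is an assumption for illustration and is not part of the class above.

```python
def budget_in_requests(total_requests: int, failed_requests: int, slo_target: float):
    """Return (allowed_failures, failures_left) for a request-based SLO."""
    # The SLO permits a (1 - slo_target) share of all requests to fail
    allowed_failures = total_requests * (1.0 - slo_target)
    return allowed_failures, allowed_failures - failed_requests

allowed, left = budget_in_requests(
    total_requests=100_100, failed_requests=100, slo_target=0.999
)
print(f"Allowed failures: {allowed:.1f}")
print(f"Failures left before the SLO is missed: {left:.1f}")
```

With a 99.9% target, about 100.1 of these requests may fail, so 100 failures leave the budget almost fully spent.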

Example SLOs

| SLI | SLO | Error Budget / Month (30 days) |
|-----|-----|--------------------------------|
| API uptime | 99.9% | 43 minutes |
| API uptime | 99.99% | 4.3 minutes |
| Latency p99 < 500ms | 99.5% | 3.6 hours |
| Error rate < 1% | 99% | 7.2 hours |
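A latency SLI can be measured as the fraction of requests that finish under a threshold. The sketch below takes that simplified, request-based reading of the latency row, which is not the same as computing p99 per time window. The function name `latency_sli` and the sample data are hypothetical.

```python
def latency_sli(latencies_ms, threshold_ms=500.0):
    """Fraction of requests that completed faster than threshold_ms."""
    if not latencies_ms:
        return 1.0  # assumption: no traffic counts as meeting the objective
    fast = sum(1 for ms in latencies_ms if ms < threshold_ms)
    return fast / len(latencies_ms)

# Hypothetical sample: 997 fast requests and 3 slow ones
samples = [120.0] * 997 + [900.0] * 3
sli = latency_sli(samples)
slo_target = 0.995
print(f"Latency SLI: {sli:.2%} (target {slo_target:.1%})")
print("SLO met" if sli >= slo_target else "SLO missed")
```

The same good-events-over-total-events shape works for the error-rate row: count successful responses instead of fast ones.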

Common Mistakes

| Mistake | Why It Fails | Fix |
|---------|--------------|-----|
| 100% uptime SLO | Impossible; leads to burnout | Set realistic SLOs (99.9% or 99.99%) |
| No error budget policy | Budget meaningless without action | Define what happens when the budget is exhausted |
| Too many SLOs | No focus | Start with 3-5 key SLOs |
| SLO without SLI measurement | Cannot track progress | Implement SLI measurement first |
| Different SLO per environment | Inconsistent expectations | Same SLO targets for all environments |

Practice Questions

  1. What is the difference between an SLI and an SLO?
  2. How does an error budget work?
  3. What is a realistic SLO for a public API?
  4. What happens when the error budget is exhausted?
  5. How do you measure a latency SLI?

Challenge

Implement SLO monitoring for an API: track uptime (99.9%), p99 latency (<500ms, 99.5%), and error rate (<1%, 99%). Display budget remaining on a dashboard.

What's Next

Learn about anomaly detection for API monitoring.
