Service Level Agreements (SLAs) vs SLOs vs SLIs

DodaTech. Updated 2026-06-23.

In this tutorial, you'll learn about Service Level Agreements (SLAs) vs SLOs vs SLIs. We cover key concepts, practical examples, and best practices.

SLAs, SLOs, and SLIs form a hierarchy of reliability measurements: SLIs measure what the service does, SLOs set internal targets for those measurements, and SLAs make contractual promises to customers based on those targets. Each layer serves a different audience and purpose.

What You'll Learn

In this tutorial, you will learn the distinct definitions of SLAs, SLOs, and SLIs, how they interrelate in a reliability framework, how to set each one for different service tiers, and the common mistakes teams make when confusing these three concepts.

Why It Matters

Confusing SLAs with SLOs is one of the most expensive mistakes an SRE team can make. An SLA is a legal contract with financial penalties. An SLO is an internal engineering target. If you set them at the same level, you have no warning before you breach your contract. Understanding the difference protects both the business and the engineering team.

Real-World Use

DodaTech guarantees 99.9 percent uptime in its customer SLAs for DodaZIP cloud storage. The internal SRE team sets the SLO at 99.99 percent for the same service. The SLO leaves an error budget of 0.01 percent for deployments and maintenance. The 0.09 percentage point gap between the SLO and the SLA is a safety buffer: the team can exhaust its error budget and still have room to recover before contractual penalties apply.
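To make these percentages concrete, the sketch below converts an availability target into the downtime it permits. The function name `allowed_downtime_minutes` and the 30-day window are assumptions chosen for illustration, not part of any DodaTech tooling.

```python
def allowed_downtime_minutes(target_percent, window_days=30):
    """Return the downtime (in minutes) a target permits over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - target_percent) / 100

sla_minutes = allowed_downtime_minutes(99.9)
slo_minutes = allowed_downtime_minutes(99.99)

print(f"SLA 99.9%  allows {sla_minutes:.2f} minutes per 30 days")
print(f"SLO 99.99% allows {slo_minutes:.2f} minutes per 30 days")
print(f"Headroom between them: {sla_minutes - slo_minutes:.2f} minutes")
```

Under these assumptions the SLA tolerates about 43 minutes of downtime per 30 days, while the SLO tolerates only about 4 minutes. The difference is the room the team has to react after missing the internal target.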

graph TD
    A[SLI: Measured Value] --> B[SLO: Internal Target]
    B --> C[Error Budget]
    C --> D[SLA: Customer Contract]
    D --> E[Penalties if Breached]
    B --> F[Engineering Decisions]
    D --> G[Business/Legal]

Prerequisites

You should understand SLIs and SLOs before reading this comparison tutorial. Familiarity with Error Budgets also helps, since the error budget is derived from the SLO and the SLO-to-SLA buffer sits on top of it.

Definitions

SLI — Service Level Indicator

An SLI is a raw measurement of a specific aspect of service behavior. It answers the question: "What is the current value of this metric?"

Examples:

Request latency at P95 over the last 5 minutes
Number of HTTP 5xx responses per minute
Percentage of successful backups completed today
Storage utilization as a percentage of total capacity
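As a rough illustration of how two SLIs of this kind might be computed from raw request data, consider the sketch below. The function names, the sample data, and the choice of the nearest-rank percentile method are all assumptions. A real monitoring system would typically compute these values for you.

```python
def availability_sli(status_codes):
    """Percentage of requests that did not return an HTTP 5xx status."""
    if not status_codes:
        return None  # no traffic, so the SLI is undefined
    good = sum(1 for code in status_codes if code < 500)
    return 100 * good / len(status_codes)

def p95_latency(latencies_ms):
    """Nearest-rank 95th percentile of the observed latencies."""
    if not latencies_ms:
        return None
    ordered = sorted(latencies_ms)
    # 1-based nearest rank: ceil(0.95 * n), computed with integers
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical sample: 100 requests, two of them server errors
codes = [200] * 97 + [500, 503, 404]
latencies = list(range(10, 1010, 10))  # 10, 20, ..., 1000 ms

print(f"Availability SLI: {availability_sli(codes)}%")
print(f"P95 latency SLI: {p95_latency(latencies)} ms")
```

Note that the 404 counts as a good request here, on the assumption that client errors are not the service's fault. Whether that holds is a decision each team has to make for its own SLIs.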

SLO — Service Level Objective

An SLO is a target value or range for an SLI. It answers the question: "What should this metric be?"

Examples:

P95 latency under 500ms over a 30-day window
Error rate below 0.1 percent of requests
Backup success rate of 99.9 percent
Storage utilization below 80 percent
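The example targets above can be expressed as data and checked against current measurements. This is a minimal sketch: the dictionary layout, the SLI key names, and the `evaluate_slos` function are hypothetical, chosen only to mirror the four examples.

```python
import operator

# Hypothetical SLO definitions: SLI name -> (comparison, target)
SLOS = {
    "p95_latency_ms": (operator.lt, 500),
    "error_rate_percent": (operator.lt, 0.1),
    "backup_success_percent": (operator.ge, 99.9),
    "storage_utilization_percent": (operator.lt, 80),
}

def evaluate_slos(measurements, slos=SLOS):
    """Return {sli_name: True/False} for every SLI that has a measurement."""
    results = {}
    for name, (compare, target) in slos.items():
        if name in measurements:
            results[name] = compare(measurements[name], target)
    return results

# Hypothetical current SLI values
current = {
    "p95_latency_ms": 420,
    "error_rate_percent": 0.25,
    "backup_success_percent": 99.95,
    "storage_utilization_percent": 71,
}

for name, met in evaluate_slos(current).items():
    print(f"{name}: {'met' if met else 'MISSED'}")
```

With this sample data, only the error rate misses its objective.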

SLA — Service Level Agreement

An SLA is a contractual commitment to a customer that includes specific service levels and penalties for failing to meet them. It answers: "What have we promised the customer?"

Examples:

99.9 percent uptime guarantee with 5 percent service credit per hour of downtime
Maximum 1-second response time at P95 measured monthly
99.99 percent data durability guarantee
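The first example can be turned into a small calculation, though the wording leaves several details open. The sketch below assumes that credits apply only when uptime for the window falls below the SLA, that every started hour of downtime counts as a full hour, and that the total credit is capped at 100 percent. A real contract would spell out each of these rules, so treat the function as illustrative only.

```python
import math

def service_credit_percent(downtime_minutes, window_days=30,
                           sla_percent=99.9, credit_per_hour=5, cap=100):
    """Return the service credit (percent of the bill) owed for the window."""
    total_minutes = window_days * 24 * 60
    uptime_percent = 100 * (total_minutes - downtime_minutes) / total_minutes
    if uptime_percent >= sla_percent:
        return 0  # SLA met, no credit owed (assumption)
    started_hours = math.ceil(downtime_minutes / 60)  # assumption: round up
    return min(cap, started_hours * credit_per_hour)  # assumption: capped

for minutes in (30, 90, 3000):
    credit = service_credit_percent(minutes)
    print(f"{minutes} minutes down -> {credit}% credit")
```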

Comparison

Dimension	SLI	SLO	SLA
Type	Raw measurement	Internal target	Contractual promise
Audience	Engineering	Engineering + Management	Customers + Legal
Penalty	None	None (error budget)	Financial credits
Typical value	Varies continuously	99% to 99.99%	99% to 99.9%
Review cadence	Real-time	Quarterly	Annually
Tightness	No target	Tight (aspirational)	Loose (with buffer)

Why SLO Must Be Tighter Than SLA

If your SLA promises 99.9 percent uptime and your SLO is also 99.9 percent, then the moment you miss the SLO you have also breached the SLA. There is no warning period. The right approach is to set the SLO tighter than the SLA, so that a missed SLO is an early signal while the contract is still intact.

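def check_sla_slo_relationship(sla, slo):
    """Print whether the internal SLO is tighter than the customer SLA (both in percent)."""
    print(f"SLA: {sla}%")
    print(f"SLO: {slo}%")
    if slo > sla:
        # Gap in percentage points between the internal target and the contract
        buffer = slo - sla
        print(f"Buffer: {buffer:.2f}%")
        print(f"Status: SAFE (SLO is {buffer:.2f}% tighter than SLA)")
    else:
        # Equal or looser SLO: missing it means the SLA is already breached
        print("Status: DANGER (SLO is not tighter than SLA)")

# SLO tighter than SLA: there is a buffer
check_sla_slo_relationship(99.9, 99.99)
# SLO equal to SLA: no warning before a breach
check_sla_slo_relationship(99.9, 99.9)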

Expected output:

SLA: 99.9%
SLO: 99.99%
Buffer: 0.09%
Status: SAFE (SLO is 0.09% tighter than SLA)
SLA: 99.9%
SLO: 99.9%
Status: DANGER (SLO is not tighter than SLA)

Setting SLIs, SLOs, and SLAs

Step 1: Define SLIs

Identify what matters to users. For a file storage service, durability and availability matter more than latency. For a real-time chat service, latency matters more than durability.

class ServiceLevels:
    def __init__(self, service_name):
        self.name = service_name
        self.slis = []
        self.slos = {}
        self.sla = None

    def add_sli(self, name, measurement, unit):
        self.slis.append({"name": name, "measurement": measurement, "unit": unit})

    def set_slo(self, sli_name, target):
        self.slos[sli_name] = target

    def set_sla(self, sla_value):
        self.sla = sla_value

    def report(self):
        print(f"Service: {self.name}")
        print(f"\nSLIs:")
        for sli in self.slis:
            print(f"  - {sli['name']}: {sli['measurement']} ({sli['unit']})")
        print(f"\nSLOs:")
        for name, target in self.slos.items():
            print(f"  - {name}: {target}")
        print(f"\nSLA: {self.sla}")

levels = ServiceLevels("DodaZIP Cloud Storage")
levels.add_sli("Availability", "Uptime percentage", "%")
levels.add_sli("Durability", "Data integrity check pass rate", "%")
levels.add_sli("Upload latency", "P95 upload completion time", "seconds")
levels.set_slo("Availability", "99.99%")
levels.set_slo("Durability", "99.9999999%")
levels.set_sla("99.9% uptime (SLA)")
levels.report()

Expected output:

Service: DodaZIP Cloud Storage

SLIs:
  - Availability: Uptime percentage (%)
  - Durability: Data integrity check pass rate (%)
  - Upload latency: P95 upload completion time (seconds)

SLOs:
  - Availability: 99.99%
  - Durability: 99.9999999%

SLA: 99.9% uptime (SLA)

Step 2: Set SLOs Using Historical Data

Look at the last 30 days of SLI data. Set the SLO at a level that the service meets most of the time but requires effort to sustain.
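One way to turn "meets most of the time" into a number is to pick the highest target that a chosen share of past days actually achieved. The sketch below assumes you have one availability value per day. The `suggest_slo` name, the 90 percent default, and the sample history are all hypothetical, and the result is a starting point for discussion rather than a rule.

```python
def suggest_slo(daily_availability, met_percent=90):
    """Return the highest target that at least met_percent of days achieved."""
    if not daily_availability:
        raise ValueError("need at least one day of SLI data")
    ordered = sorted(daily_availability, reverse=True)
    # Number of days that must meet the suggested target (at least one)
    cutoff = max(1, len(ordered) * met_percent // 100)
    return ordered[cutoff - 1]

# Hypothetical 30 days of measured availability, in percent
history = [100.0] * 20 + [99.99] * 5 + [99.95] * 3 + [99.5, 98.0]

print(f"Suggested SLO: {suggest_slo(history)}%")
```

With this sample history, 27 of the 30 days reached 99.95 percent or better, so that value is suggested. The two bad days are what make the target require effort to sustain.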

Step 3: Negotiate SLAs with the Business

SLAs are business decisions, not engineering decisions. The SRE team provides data on what availability levels are achievable. The product and legal teams decide what to promise customers.

Common Errors

Error	Explanation
Setting SLO equal to SLA	Eliminates the buffer between the internal target and the contract. The moment you miss the SLO, you have also breached the SLA.
Having only SLAs without SLOs	Without internal SLOs, the SLA is the only reliability target — and by the time you breach it, you are already paying penalties.
Too many SLIs	Each SLI needs monitoring, alerting, and an SLO. Keep the set small and meaningful.
Ignoring SLIs that are not in the SLA	Just because something is not in the SLA does not mean it should not be measured. Track it internally.
Treating SLOs as guarantees	SLOs are not promises. They are internal targets that can be missed without penalty. SLA breaches have consequences.
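Several of these errors can be caught mechanically. The sketch below is a hypothetical audit, assuming every target is a "higher is better" percentage and that more than five SLIs counts as too many. Both assumptions are arbitrary and should be adjusted per service.

```python
def audit_service_levels(slis, slos, sla, max_slis=5):
    """Return a list of findings for a service's SLI, SLO, and SLA setup.

    slis: list of SLI names
    slos: {sli_name: target percent}
    sla:  {sli_name: promised percent}
    """
    findings = []
    for name, sla_target in sla.items():
        slo_target = slos.get(name)
        if slo_target is None:
            findings.append(f"{name}: SLA without an internal SLO")
        elif slo_target <= sla_target:
            findings.append(
                f"{name}: SLO ({slo_target}%) is not tighter than SLA ({sla_target}%)"
            )
    for name in slis:
        if name not in slos:
            findings.append(f"{name}: SLI has no SLO")
    if len(slis) > max_slis:  # threshold is an arbitrary assumption
        findings.append(f"{len(slis)} SLIs tracked, consider trimming the set")
    return findings

problems = audit_service_levels(
    slis=["availability", "durability"],
    slos={"availability": 99.9},
    sla={"availability": 99.9, "durability": 99.99},
)
for problem in problems:
    print(problem)
```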

Practice Questions

What is the difference between SLI, SLO, and SLA?
Why must the SLO be tighter than the SLA?
Who owns each of SLI, SLO, and SLA?
What happens when an SLA is breached?
How many SLIs should you track per service?

Challenge

You are the SRE lead for a new DodaTech service: a real-time document collaboration tool. Define three SLIs, set SLOs for each, and recommend an SLA to the business team. Explain why you chose each SLI and how much buffer exists between the SLO and SLA.

FAQ

What is the difference between SLO and SLA?

An SLO is an internal target for service reliability. An SLA is a contractual commitment to customers. SLOs should always be tighter than SLAs.

Can an SLO be greater than 99.999 percent?

Yes, for services where extreme reliability is achievable. Most services should target 99.9 to 99.99 percent.

Who is responsible for SLAs?

SLAs are owned by product and legal teams. SRE provides input on what is achievable. Engineering owns SLOs.

How often should SLAs be reviewed?

SLAs are contractual terms reviewed annually. SLOs are engineering targets reviewed quarterly.

What happens if we miss an SLO?

Nothing immediate. The team reviews the error budget and may decide to reduce deployment velocity. Missing an SLA triggers financial penalties.
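A review of the error budget usually starts with how much of it has been spent. The sketch below assumes a request-based SLO and uses hypothetical names and numbers. It only reports the state of the budget and leaves the decision about deployment velocity to the team.

```python
def error_budget_status(slo_percent, total_requests, failed_requests):
    """Summarize error budget consumption for a request-based SLO."""
    # The budget is the share of requests the SLO allows to fail
    allowed_failures = total_requests * (100 - slo_percent) / 100
    if allowed_failures > 0:
        consumed = failed_requests / allowed_failures
    else:
        consumed = float("inf") if failed_requests else 0.0
    return {
        "allowed_failures": allowed_failures,
        "consumed_fraction": consumed,
        "slo_met": failed_requests <= allowed_failures,
    }

# Hypothetical month: one million requests, 500 failures, 99.9% SLO
status = error_budget_status(99.9, 1_000_000, 500)
print(f"Allowed failures: {status['allowed_failures']:.0f}")
print(f"Budget consumed: {status['consumed_fraction']:.0%}")
print(f"SLO met: {status['slo_met']}")
```

In this example about half of the budget is spent, so the SLO is still met and the SLA is not yet at risk.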
