Airflow SLA Miss Error Fix

DodaTech Updated 2026-06-24 3 min read

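This guide explains why an Airflow SLA miss notification can fail to fire for a slow task and walks through the fix step by step: how SLA timing works, how to set SLAs and callbacks, and how to pair them with execution_timeout. The examples target Airflow 2.x, where tasks accept the sla parameter.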

A task takes longer than expected but no SLA miss notification is sent:

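from datetime import timedelta

from airflow.operators.python import PythonOperator

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,  # load_data is your own function
    sla=timedelta(hours=2),  # Should finish within 2 hours of the scheduled run time
)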

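The sla_miss_callback is not triggered even though the task runs for 3+ hours. Two things commonly cause this: no notification path is configured, or the SLA is misread as a limit on task duration. In Airflow, an SLA is a deadline calculated relative to the DAG run's scheduled time, not the task's start time. If the DAG run started late, the SLA clock was already running before the task began.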

Step-by-Step Fix

1. Understand SLA timing

WRONG — thinking SLA = task duration:

# If DAG scheduled at 10:00, but started at 10:30 due to pool limits:
# the SLA clock starts at 10:00, not 10:30!

RIGHT — SLA is calculated from the DAG's execution_date:

Event               Time
DAG scheduled at    10:00
DAG starts at       10:30
SLA timer           10:00 + 2h = 12:00
Task completes at   12:15

If the task completes at 12:15, the SLA is missed because 12:15 > 12:00.
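The timeline above can be checked with plain datetime arithmetic. This is a minimal sketch using only the standard library; the helper names `sla_deadline` and `is_sla_missed` are made up for illustration and are not part of Airflow.

```python
from datetime import datetime, timedelta


def sla_deadline(scheduled_at: datetime, sla: timedelta) -> datetime:
    """The deadline is anchored to the scheduled run time, not the task start."""
    return scheduled_at + sla


def is_sla_missed(scheduled_at: datetime, sla: timedelta, finished_at: datetime) -> bool:
    """True when the task finished after the SLA deadline."""
    return finished_at > sla_deadline(scheduled_at, sla)


scheduled_at = datetime(2024, 1, 1, 10, 0)   # DAG scheduled at 10:00
started_at = datetime(2024, 1, 1, 10, 30)    # DAG actually starts at 10:30
finished_at = datetime(2024, 1, 1, 12, 15)   # task completes at 12:15
sla = timedelta(hours=2)

run_duration = finished_at - started_at      # 1 hour 45 minutes of actual work
missed = is_sla_missed(scheduled_at, sla, finished_at)
print(run_duration, missed)
```

The task ran for less than two hours, yet the SLA is still missed, because the deadline was fixed at 12:00 by the scheduled time.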

2. Set SLA on tasks

WRONG — no SLA set:

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    # No sla parameter
)

RIGHT — set SLA:

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),
    execution_timeout=timedelta(hours=3),  # Hard stop
)

3. Configure SLA notification callback

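WRONG: no callback is configured, so an SLA miss is silently ignored. Putting sla_miss_callback in default_args has the same effect, because it is a DAG argument, not a task argument.

RIGHT: set sla in default_args and pass sla_miss_callback to the DAG:

from airflow import DAG

def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # task_list is a newline-separated string; slas is the list of SlaMiss records.
    print(f"SLA missed for DAG: {dag.dag_id}")
    for sla in slas:
        print(f"Task: {sla.task_id}")
    send_alert(dag.dag_id, task_list)  # send_alert is your own alerting helper

default_args = {
    "sla": timedelta(hours=2),  # applied to every task in the DAG
}

dag = DAG(
    dag_id="data_pipeline",  # example name
    default_args=default_args,
    sla_miss_callback=sla_miss_callback,  # DAG-level argument
)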
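In Airflow 2.x the callback receives task_list as a newline-separated string and slas as a list of SlaMiss records, so per-task details are best read from slas. The sketch below builds an alert message from such records without needing Airflow installed. `FakeSlaMiss` and `format_sla_alert` are hypothetical names used only for illustration; the stand-in class copies just the fields used here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeSlaMiss:
    """Stand-in for an SlaMiss record; only the fields used below."""
    dag_id: str
    task_id: str
    execution_date: datetime


def format_sla_alert(dag_id, slas):
    """Build one alert line per missed task from SlaMiss-like records."""
    lines = [f"SLA missed for DAG: {dag_id}"]
    for sla in slas:
        lines.append(f"Task: {sla.task_id} (run {sla.execution_date.isoformat()})")
    return "\n".join(lines)


misses = [FakeSlaMiss("etl", "data_load", datetime(2024, 1, 1, 10, 0))]
message = format_sla_alert("etl", misses)
print(message)
```

Inside a real callback, the same function could be called as `format_sla_alert(dag.dag_id, slas)` and the result handed to whatever alerting helper you use.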

4. Check SLA in the UI

Airflow UI > Browse > SLA Misses

This view lists every task instance recorded as having missed its SLA. If it is empty, no SLA misses have been recorded.

5. Handle SLA miss notification via email

[email]
email_backend = airflow.utils.email.send_email_smtp

[smtp]
smtp_host = smtp.sendgrid.net
smtp_user = apikey
smtp_password = SG.xxxxx
smtp_mail_from = airflow@example.com

SLA miss emails are sent to the addresses in each task's email setting, so configure it in default_args:

default_args = {
    "email": ["alerts@example.com"],
    "email_on_failure": True,
}

6. Set execution_timeout with SLA

WRONG — only SLA without hard timeout:

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),
    # No execution_timeout — task runs forever
)

RIGHT — SLA for alerting, execution_timeout for enforcement:

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),       # Alert at 2 hours
    execution_timeout=timedelta(hours=4),  # Kill at 4 hours
)

Expected output: an SLA miss is recorded and the notification fires once a task has not finished by its SLA deadline (scheduled run time plus sla).
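The two settings run on different clocks: sla counts from the scheduled run time, while execution_timeout counts from the moment the task starts. The following sketch models both checks with the standard library only. `check_limits` is a hypothetical helper, not Airflow's internal logic.

```python
from datetime import datetime, timedelta


def check_limits(scheduled_at, started_at, now, sla, execution_timeout):
    """Return (sla_missed, timed_out) for a task that is still running at `now`."""
    sla_missed = now > scheduled_at + sla              # soft limit: alert only
    timed_out = now - started_at > execution_timeout   # hard limit: task is killed
    return sla_missed, timed_out


scheduled_at = datetime(2024, 1, 1, 10, 0)
started_at = datetime(2024, 1, 1, 10, 30)
sla = timedelta(hours=2)
execution_timeout = timedelta(hours=4)

at_1230 = check_limits(scheduled_at, started_at, datetime(2024, 1, 1, 12, 30), sla, execution_timeout)
at_1445 = check_limits(scheduled_at, started_at, datetime(2024, 1, 1, 14, 45), sla, execution_timeout)
print(at_1230, at_1445)
```

At 12:30 the SLA deadline has passed but the task keeps running; at 14:45 it has also been running for more than four hours, which is the point where execution_timeout would stop it.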

Prevention

  • Set SLA values based on historical task durations (p50 + buffer).
  • Always pair SLA with execution_timeout for hard enforcement.
  • Configure sla_miss_callback for all production DAGs.
  • Monitor the SLA Misses view (Browse > SLA Misses) in the Airflow UI regularly.
  • Use Prometheus or Datadog metrics for SLA monitoring at scale.
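The first recommendation can be turned into a small calculation. This sketch assumes you have exported historical completion offsets, meaning finish time minus scheduled run time, since that is the clock the SLA uses. `suggest_sla` and the 20 percent default buffer are illustrative choices, not Airflow features.

```python
from datetime import timedelta
from statistics import median


def suggest_sla(completion_offsets, buffer_percent=20):
    """Median completion offset plus a percentage buffer, as a timedelta."""
    if not completion_offsets:
        raise ValueError("need at least one historical run")
    median_seconds = median(o.total_seconds() for o in completion_offsets)
    return timedelta(seconds=median_seconds * (100 + buffer_percent) / 100)


# Offsets from five past runs, in minutes after the scheduled time (made-up data).
history = [timedelta(minutes=m) for m in (60, 70, 80, 65, 95)]
suggested = suggest_sla(history)
print(suggested)
```

The result can be passed straight to the sla parameter. A median-based value will still alert on slow outliers such as the 95-minute run, so widen the buffer if that produces too much noise.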

Common Mistakes with SLA Misses

  1. Treating sla as a limit on task duration when it is a deadline measured from the DAG run's scheduled time
  2. Setting sla without a notification path, or putting sla_miss_callback in default_args instead of on the DAG, so misses go unnoticed
  3. Relying on sla to stop a runaway task instead of pairing it with execution_timeout

These mistakes appear frequently in real-world Airflow code. DodaTech's contributors have identified these patterns through analysis of open-source projects and production systems.

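Practice Exercise

Build a small DAG with one task that sleeps for longer than its sla. Register an sla_miss_callback on the DAG, let a scheduled run complete, and confirm that the miss appears in the SLA Misses view. Then add an execution_timeout and observe that the task now fails instead of running on.

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.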

FAQ

What's the difference between SLA and execution_timeout?

SLA sends an alert when a task takes longer than expected but does not stop the task. execution_timeout kills the task and marks it as failed. Use SLA for soft alerts and execution_timeout for hard limits.

Why does my SLA miss notification arrive hours late?

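Airflow does not run a separate timer for each SLA. Misses are evaluated as part of the scheduler's regular processing of the DAG, so a miss can only be noticed at the first evaluation after the deadline has passed, and a busy scheduler widens that gap. If notifications never arrive at all, check that check_slas in the [core] section is not set to False and that the run was scheduled rather than triggered manually, since manually triggered runs are not checked for SLA misses.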
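The delay can be pictured as the gap between the deadline and the next time anything looks at it. This is a toy model with made-up check times, not a description of the scheduler's actual timing. `first_detection` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def first_detection(deadline, check_times):
    """Return the first check that happens after the deadline, or None."""
    for check in sorted(check_times):
        if check > deadline:
            return check
    return None


deadline = datetime(2024, 1, 1, 12, 0)
checks = [datetime(2024, 1, 1, 11, 50), datetime(2024, 1, 1, 12, 20)]
detected = first_detection(deadline, checks)
lag = detected - deadline
print(lag)
```

In this example the miss exists from 12:00 but is only noticed at 12:20, so the notification is 20 minutes late even though nothing is broken.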

Can I set SLAs on a DAG level?

SLAs are set per task, not per DAG. To monitor the total DAG duration, create a final task that checks the total elapsed time and raises an alert if exceeded.
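A final check task of this kind could look roughly like the sketch below. It uses only the standard library; `check_dag_duration` and `DagDurationExceeded` are hypothetical names, and how you obtain the run's start time is left to your DAG.

```python
from datetime import datetime, timedelta


class DagDurationExceeded(Exception):
    """Raised when the whole run took longer than allowed."""


def check_dag_duration(run_started_at, now, max_duration):
    """Raise if the time elapsed since the run started exceeds max_duration."""
    elapsed = now - run_started_at
    if elapsed > max_duration:
        raise DagDurationExceeded(f"DAG run took {elapsed}, limit is {max_duration}")
    return elapsed


start = datetime(2024, 1, 1, 10, 0)
elapsed = check_dag_duration(start, datetime(2024, 1, 1, 11, 30), timedelta(hours=3))
print(elapsed)
```

In a DAG, a function like this could serve as the python_callable of the last task, with the start time read from the task context (for example the DAG run's start_date, which is an assumption about how you wire it up). Raising an exception fails the task, which then triggers your normal failure alerts.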
