Airflow SLA Miss Error Fix
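This tutorial explains why Airflow SLA miss notifications fail to fire and how to fix them. It covers how SLA timing works, how to configure callbacks and email alerts, and how to pair an SLA with execution_timeout. It applies to Airflow 2.x; the sla parameter was removed in Airflow 3.0.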
A task takes longer than expected but no SLA miss notification is sent:
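from datetime import timedelta
from airflow.operators.python import PythonOperator

task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),  # Expected to finish within 2 hours of the scheduled start
)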
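The sla_miss_callback is not triggered even though the task runs for 3+ hours. In Airflow 2.x, an SLA is measured from the time the DAG run was scheduled to start, not from the moment the task begins, so it is not a limit on task duration. If the DAG run started late, the SLA clock was already running before the task began. SLA misses are also only evaluated for scheduled runs, so a manually triggered run never produces one, and a callback or email setting that is not wired up correctly means a recorded miss goes unreported.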
Step-by-Step Fix
1. Understand SLA timing
WRONG: treating the SLA as a limit on task duration.
# If the DAG run is scheduled for 10:00 but starts at 10:30 due to pool limits,
# the SLA clock still starts at 10:00, not 10:30.
RIGHT: the SLA is measured from the DAG run's scheduled start time, not from when the task begins:
| Event | Time |
|---|---|
| DAG scheduled at | 10:00 |
| DAG starts at | 10:30 |
| SLA deadline | 10:00 + 2h = 12:00 |
| Task completes at | 12:15 |
If the task completes at 12:15, the SLA is missed because 12:15 > 12:00.
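The arithmetic in the table can be checked with plain datetime values. This is a minimal sketch, not Airflow code: the function names sla_deadline and is_sla_missed and the calendar date are made up for illustration, and only the times come from the table.

```python
from datetime import datetime, timedelta


def sla_deadline(scheduled_start: datetime, sla: timedelta) -> datetime:
    """The deadline is anchored to the scheduled start, not the actual start."""
    return scheduled_start + sla


def is_sla_missed(scheduled_start: datetime, sla: timedelta, finished_at: datetime) -> bool:
    """True when the task finished after the SLA deadline."""
    return finished_at > sla_deadline(scheduled_start, sla)


scheduled = datetime(2024, 1, 1, 10, 0)      # DAG run scheduled for 10:00
actual_start = datetime(2024, 1, 1, 10, 30)  # run actually started at 10:30
finished = datetime(2024, 1, 1, 12, 15)      # task completed at 12:15
sla = timedelta(hours=2)

deadline = sla_deadline(scheduled, sla)
missed = is_sla_missed(scheduled, sla, finished)
task_duration = finished - actual_start

print(deadline.time(), missed, task_duration)  # 12:00:00 True 1:45:00
```

The task itself ran for only 1 hour 45 minutes, which is under the 2-hour SLA, yet the SLA is still missed because the 30-minute late start counts against it.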
2. Set SLA on tasks
WRONG — no SLA set:
task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    # No sla parameter
)
RIGHT — set SLA:
task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),
    execution_timeout=timedelta(hours=3),  # Hard stop
)
3. Configure SLA notification callback
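WRONG: no callback is configured, or sla_miss_callback is placed in default_args, so SLA misses are silently ignored.
RIGHT: pass sla_miss_callback to the DAG itself. It is a DAG-level argument, while default_args only supplies operator arguments such as sla:
def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # task_list is a newline-joined string; slas holds one SlaMiss record per missed task.
    print(f"SLA missed for DAG: {dag.dag_id}")
    for miss in slas:
        print(f"Task: {miss.task_id}")
    send_alert(dag.dag_id, task_list)  # send_alert is your own alerting helper

default_args = {
    "sla": timedelta(hours=2),
}

dag = DAG(
    dag_id="data_pipeline",  # placeholder name
    default_args=default_args,
    sla_miss_callback=sla_miss_callback,
)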
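A callback can be exercised outside Airflow by calling it with stand-in objects. In Airflow 2.x the scheduler passes task_list and blocking_task_list as newline-joined strings and slas as a list of SlaMiss records, so the sketch below reads task IDs from slas. The helper summarize_sla_misses, the DAG name, and the SimpleNamespace stand-ins are assumptions for illustration, not Airflow objects.

```python
from types import SimpleNamespace


def summarize_sla_misses(dag_id, slas):
    """Build one alert line from SlaMiss-like records (hypothetical helper)."""
    task_ids = sorted({miss.task_id for miss in slas})
    return f"SLA missed for DAG {dag_id}: {', '.join(task_ids)}"


def sla_miss_callback(dag, task_list, blocking_task_list, slas, blocking_tis):
    # Same five-argument signature Airflow 2.x uses for sla_miss_callback.
    message = summarize_sla_misses(dag.dag_id, slas)
    print(message)
    return message


# Stand-ins for what the scheduler would pass; these are not real Airflow objects.
fake_dag = SimpleNamespace(dag_id="data_pipeline")
fake_slas = [SimpleNamespace(task_id="data_load"), SimpleNamespace(task_id="transform")]
fake_task_list = "data_load on 2024-01-01T10:00:00\ntransform on 2024-01-01T10:00:00"

result = sla_miss_callback(fake_dag, fake_task_list, "", fake_slas, [])
```

Keeping the message formatting in its own function makes it easy to test the alert text without a running scheduler.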
4. Check SLA in the UI
Airflow UI > Browse > SLA Misses
This view lists every recorded SLA miss. If it is empty, either no SLA has been missed or the scheduler has not evaluated any, for example because the runs were triggered manually.
5. Handle SLA miss notification via email
[email]
# email_backend selects how Airflow sends mail; sla_miss_callback is not a config option
email_backend = airflow.utils.email.send_email_smtp
[smtp]
smtp_host = smtp.sendgrid.net
smtp_user = apikey
smtp_password = SG.xxxxx
smtp_mail_from = airflow@example.com
Set the recipients on the tasks. Airflow sends SLA miss emails to each task's email addresses:
default_args = {
    "email": ["alerts@example.com"],
    "email_on_failure": True,  # controls failure emails only
}
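If the built-in email is not enough, a custom callback can compose its own message. The following is a rough sketch using only the Python standard library; build_sla_email is a hypothetical helper, the DAG name is a placeholder, and the addresses are the example ones from the configuration above.

```python
from email.message import EmailMessage


def build_sla_email(dag_id, task_ids, sender, recipients):
    """Compose a plain-text SLA miss email (hypothetical helper, does not send)."""
    msg = EmailMessage()
    msg["Subject"] = f"[airflow] SLA miss on DAG {dag_id}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    task_lines = "\n".join(f"- {task_id}" for task_id in task_ids)
    msg.set_content(f"Tasks that missed their SLA:\n{task_lines}\n")
    return msg


email = build_sla_email(
    "data_pipeline",
    ["data_load"],
    "airflow@example.com",
    ["alerts@example.com"],
)
```

The message can then be handed to smtplib.SMTP(...).send_message(email) using the same host and credentials as the [smtp] section.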
6. Set execution_timeout with SLA
WRONG — only SLA without hard timeout:
task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),
    # No execution_timeout: the task can run indefinitely
)
RIGHT — SLA for alerting, execution_timeout for enforcement:
task = PythonOperator(
    task_id="data_load",
    python_callable=load_data,
    sla=timedelta(hours=2),  # Alert at 2 hours
    execution_timeout=timedelta(hours=4),  # Kill at 4 hours
)
Expected result: an SLA miss notification fires when a task has not finished by its DAG run's scheduled start time plus the SLA.
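The two settings use different reference points: the SLA counts from the scheduled start of the DAG run, while execution_timeout counts from the moment the task itself starts. The sketch below models that difference for a task that is still running; check_running_task is an illustrative function, not part of Airflow, and the 10:00 schedule and 10:30 start are assumed values.

```python
from datetime import datetime, timedelta


def check_running_task(now, scheduled_start, task_start, sla, execution_timeout):
    """Return (sla_missed, timeout_reached) for a task still running at `now`."""
    sla_missed = now > scheduled_start + sla                 # measured from the schedule
    timeout_reached = now - task_start >= execution_timeout  # measured from the task start
    return sla_missed, timeout_reached


scheduled_start = datetime(2024, 1, 1, 10, 0)
task_start = datetime(2024, 1, 1, 10, 30)
sla = timedelta(hours=2)
execution_timeout = timedelta(hours=4)

for hour, minute in [(11, 0), (12, 15), (14, 30)]:
    now = datetime(2024, 1, 1, hour, minute)
    print(now.time(), check_running_task(now, scheduled_start, task_start, sla, execution_timeout))
# 11:00:00 (False, False)
# 12:15:00 (True, False)
# 14:30:00 (True, True)
```

With these values the alert window opens at 12:00 and the task is stopped at 14:30, four hours after it actually started.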
Prevention
- Set SLA values based on historical completion times measured from the scheduled start (p50 + buffer), so that queueing delay is included.
- Always pair SLA with execution_timeout for hard enforcement.
- Configure sla_miss_callback for all production DAGs.
- Monitor the SLA Misses tab in the Airflow UI regularly.
- Use Prometheus or Datadog metrics for SLA monitoring at scale.
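The first recommendation can be turned into a small calculation. This is a sketch under stated assumptions: suggest_sla is a hypothetical helper, the history values are made-up sample data, and the 30-minute buffer is an arbitrary choice to tune for your own pipeline.

```python
import statistics
from datetime import timedelta


def suggest_sla(completion_minutes, buffer=timedelta(minutes=30)):
    """Suggest an SLA as the median (p50) completion offset plus a fixed buffer."""
    median_minutes = statistics.median(completion_minutes)
    return timedelta(minutes=median_minutes) + buffer


# Minutes from scheduled start to task completion for past runs (made-up data).
history = [80, 95, 90, 110, 85]
suggested = suggest_sla(history)
print(suggested)  # 2:00:00
```

The offsets are measured from the scheduled start rather than the task start, so time spent waiting in the queue is already reflected in the suggested value.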