Airflow DAG Run Duplicate Fix
This tutorial explains why Airflow can end up with several DAG runs for the same date and walks through six ways to prevent it, with a wrong and a right example for each.
Airflow creates multiple DAG runs for the same execution date:
DAG Runs: 3 runs for 2024-06-24
- manual__2024-06-24 (running)
- scheduled__2024-06-24 (queued)
- backfill__2024-06-24 (success)
Duplicate runs occur when the scheduler, a manual trigger, and a backfill all create runs for the same data interval. This wastes resources and can cause data corruption if tasks write to the same output.
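One way to spot this situation programmatically is to group run IDs by the date that follows the double underscore. The sketch below is plain Python and does not call Airflow. The function name find_duplicate_dates is hypothetical, and the code assumes run IDs follow the <type>__<date> shape shown in the listing above.

```python
from collections import defaultdict


def find_duplicate_dates(run_ids):
    """Group run IDs by the date part after '__' and keep dates with 2+ runs."""
    by_date = defaultdict(list)
    for run_id in run_ids:
        # Assumed format: "<run type>__<date>", e.g. "manual__2024-06-24"
        run_type, _, date_part = run_id.partition("__")
        by_date[date_part].append(run_type)
    return {date: types for date, types in by_date.items() if len(types) > 1}


runs = [
    "manual__2024-06-24",
    "scheduled__2024-06-24",
    "backfill__2024-06-24",
    "scheduled__2024-06-25",
]
duplicates = find_duplicate_dates(runs)
print(duplicates)
```

Only 2024-06-24 is reported here, because it is the only date with more than one run in the sample list.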
Step-by-Step Fix
1. Prevent manual triggers during scheduled runs
WRONG: triggering a DAG run by hand while the scheduled run is still active, without first checking what is already running.
RIGHT — check active runs before triggering:
from airflow.models import DagRun
from airflow.utils.state import DagRunState

def trigger_safe(dag_id):
    # In Airflow 2.x, DagRun.find() matches a single state, so query each one
    active_runs = [
        run
        for state in (DagRunState.RUNNING, DagRunState.QUEUED)
        for run in DagRun.find(dag_id=dag_id, state=state)
    ]
    if active_runs:
        print(f"DAG {dag_id} already has {len(active_runs)} active runs")
        return
    # Trigger the DAG safely
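The guard logic can also be kept separate from Airflow so that it is easy to unit test. This is a minimal sketch, assuming two hypothetical callables supplied by the caller: count_active_runs (for example, a wrapper around the DagRun query above) and trigger (whatever actually starts the run). Neither name is part of Airflow.

```python
def trigger_if_idle(dag_id, count_active_runs, trigger):
    """Call trigger(dag_id) only when count_active_runs(dag_id) is zero.

    Both callables are hypothetical hooks supplied by the caller; this
    function itself does not import or call Airflow.
    Returns True when the trigger ran, False when it was skipped.
    """
    active = count_active_runs(dag_id)
    if active:
        print(f"DAG {dag_id} already has {active} active runs; skipping")
        return False
    trigger(dag_id)
    return True


# Usage with stand-in callables instead of a real Airflow deployment
triggered = []
fired = trigger_if_idle("my_dag", lambda dag_id: 0, triggered.append)
skipped = trigger_if_idle("my_dag", lambda dag_id: 2, triggered.append)
```

The check and the trigger are not atomic, so another run can still appear between the two calls. Treat this guard as a convenience and keep max_active_runs=1 as the backstop.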
2. Set max_active_runs
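WRONG: leaving max_active_runs unset, which allows up to 16 concurrent runs by default.
from airflow import DAG

with DAG(
    dag_id="my_dag",
    schedule="@daily",
    # max_active_runs not set, so the default of 16 applies
):
    ...  # tasks go here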
RIGHT — limit to one run at a time:
with DAG(
    dag_id="my_dag",
    schedule="@daily",
    max_active_runs=1,
    catchup=False,
):
    ...  # tasks go here
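With max_active_runs=1, only one run of this DAG executes at a time, and any additional runs are held back until it finishes. The setting serializes runs; it does not by itself stop a duplicate run from being created.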
3. Backfill correctly
WRONG — backfill creates overlapping runs with the scheduler:
airflow dags backfill my_dag \
    --start-date 2024-01-01 \
    --end-date 2024-06-24 \
    --reset-dagruns  # This resets existing runs
RIGHT — backfill with safe options:
# Only backfill missing intervals
airflow dags backfill my_dag \
    --start-date 2024-01-01 \
    --end-date 2024-06-24 \
    --rerun-failed-tasks  # Only rerun failed tasks
The scheduler creates its own run for each schedule interval, so limit the backfill date range to intervals the scheduler will not cover.
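Before running a backfill, it can help to work out which dates actually lack a run and pass only those ranges to the command. The following is a plain-Python sketch for a daily schedule. The helper missing_dates and the sample data are hypothetical; the list of existing dates would come from wherever you read your DAG run history.

```python
from datetime import date, timedelta


def missing_dates(start, end, existing):
    """Return every date in [start, end] that has no run in `existing`."""
    existing = set(existing)
    days = (end - start).days + 1
    return [
        start + timedelta(days=offset)
        for offset in range(days)
        if start + timedelta(days=offset) not in existing
    ]


# Dates that already have a run (hypothetical sample data)
existing_runs = [date(2024, 6, 20), date(2024, 6, 22)]
gaps = missing_dates(date(2024, 6, 20), date(2024, 6, 23), existing_runs)
print([d.isoformat() for d in gaps])  # ['2024-06-21', '2024-06-23']
```

Each gap can then be passed as its own --start-date and --end-date pair, which keeps the backfill away from dates that already have a run.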
4. Use catchup=False in DAG definition
WRONG — catchup=True creates runs for all missed intervals:
from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Creates runs for every day since Jan 1!
    catchup=True,
):
    ...  # tasks go here
RIGHT — disable catchup for most DAGs:
with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # Only create a run for the latest interval
):
    ...  # tasks go here
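To see why catchup=True can flood the scheduler, count the intervals it would have to fill. This is a rough sketch for an @daily schedule, assuming one run per completed daily interval. The helper completed_daily_intervals is hypothetical and is not an Airflow function.

```python
from datetime import datetime, timedelta


def completed_daily_intervals(start_date, now):
    """Count whole daily intervals between start_date and now."""
    if now <= start_date:
        return 0
    return (now - start_date) // timedelta(days=1)


start = datetime(2024, 1, 1)
now = datetime(2024, 6, 25)
print(completed_daily_intervals(start, now))  # 176
```

Under these assumptions, a DAG first deployed on 2024-06-25 with catchup=True would be asked to create 176 runs, while catchup=False would create a single run for the most recent interval.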
5. Check run_id uniqueness
WRONG — manual runs with non-unique run IDs:
# Via API
dag_run = DagRun(
    dag_id="my_dag",
    run_id="manual_2024-06-24",  # Reusing an existing run_id for this DAG raises an error
    execution_date=datetime(2024, 6, 24),
)
RIGHT — use unique run IDs:
from datetime import datetime
from airflow.models import DagRun
from airflow.utils.timezone import utcnow

dag_run = DagRun(
    dag_id="my_dag",
    run_id=f"manual_{utcnow().isoformat()}",  # Unique timestamp
    execution_date=datetime(2024, 6, 24),
)
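The timestamp approach can be wrapped in a small helper so that every caller builds run IDs the same way. This is a standard-library sketch that does not import Airflow. The helper make_run_id is hypothetical, and the optional now argument exists only to make the function testable.

```python
from datetime import datetime, timezone


def make_run_id(prefix="manual", now=None):
    """Build a run_id from a prefix and a UTC ISO 8601 timestamp."""
    now = now or datetime.now(timezone.utc)
    return f"{prefix}_{now.isoformat()}"


first = make_run_id(now=datetime(2024, 6, 24, 12, 0, 0, tzinfo=timezone.utc))
second = make_run_id(now=datetime(2024, 6, 24, 12, 0, 1, tzinfo=timezone.utc))
print(first)   # manual_2024-06-24T12:00:00+00:00
print(second)  # manual_2024-06-24T12:00:01+00:00
```

Two triggers in the same microsecond would still collide, so a caller that fires many runs in a tight loop may want to append a counter or random suffix.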
6. Prevent concurrent runs in production
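# Airflow config (airflow.cfg); comments are kept on their own lines
[core]
# Global default for max_active_runs on DAGs that do not set their own
max_active_runs_per_dag = 1
[scheduler]
# Limits how many DAGs the scheduler creates runs for in each loop
max_dagruns_to_create_per_loop = 1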
Expected output: only one DAG run exists per execution date.
Prevention
- Set max_active_runs=1 for most production DAGs.
- Disable catchup unless backfilling is explicitly needed.
- Avoid manual triggering while scheduled runs are active.
- Use --reset-dagruns carefully in backfill commands.
- Monitor the DAG Runs page in the UI for unexpected duplicates.