
Airflow DAG Run Duplicate Fix

DodaTech Updated 2026-06-24 3 min read

This guide explains why Airflow ends up with several DAG runs for the same execution date and how to prevent them with DAG settings, safe triggering, and careful backfills.

Airflow creates multiple DAG runs for the same execution date:

DAG Runs: 3 runs for 2024-06-24
- manual__2024-06-24 (running)
- scheduled__2024-06-24 (queued)
- backfill__2024-06-24 (success)

Duplicate runs occur when more than one of the scheduler, a manual trigger, and a backfill creates a run for the same data interval. This wastes worker slots and can corrupt data if the tasks write to the same output.
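The prefix of each run ID in the listing above tells you which component created the run. A minimal sketch, assuming run IDs follow the "<run_type>__<date>" shape shown in the listing; run_type is a hypothetical helper, not an Airflow API, and IDs without a double underscore are reported as "custom".

```python
def run_type(run_id: str) -> str:
    """Return the component that created a run, based on its run ID prefix.

    Hypothetical helper: assumes Airflow-generated IDs of the form
    "<run_type>__<date>"; anything else is reported as "custom".
    """
    prefix, sep, _ = run_id.partition("__")
    return prefix if sep else "custom"


listing = ["manual__2024-06-24", "scheduled__2024-06-24", "backfill__2024-06-24"]
print([run_type(r) for r in listing])  # ['manual', 'scheduled', 'backfill']
```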

Step-by-Step Fix

1. Prevent manual triggers during scheduled runs

WRONG: triggering a new run (from the UI or with airflow dags trigger my_dag) while the scheduled run is still active.

RIGHT — check active runs before triggering:

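from airflow.api.common.trigger_dag import trigger_dag
from airflow.models import DagRun
from airflow.utils.state import DagRunState

def trigger_safe(dag_id):
    # DagRun.find() filters on a single state, so query each active state separately
    active_runs = [
        run
        for state in (DagRunState.RUNNING, DagRunState.QUEUED)
        for run in DagRun.find(dag_id=dag_id, state=state)
    ]
    if active_runs:
        print(f"DAG {dag_id} already has {len(active_runs)} active runs")
        return None
    # No queued or running run exists, so create a new manual run
    return trigger_dag(dag_id)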
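DagRun.find reads the metadata database directly, so it only works where Airflow's database connection is available. From outside the deployment, the same guard can be sketched against the REST API. This assumes Airflow 2's stable REST API under /api/v1 and a requests.Session (or any object with the same get and post methods) that is already authenticated; trigger_safe_via_api is a hypothetical name.

```python
from typing import Any, Optional


def trigger_safe_via_api(session: Any, base_url: str, dag_id: str) -> Optional[dict]:
    """Trigger dag_id over the REST API only if it has no queued or running runs.

    Assumes Airflow 2's stable REST API (/api/v1) and a requests.Session-like
    object that is already authenticated. Hypothetical helper, not an Airflow API.
    """
    runs_url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # Repeated "state" parameters are combined with OR by the list endpoint
    resp = session.get(runs_url, params={"state": ["running", "queued"]})
    resp.raise_for_status()
    active = resp.json()["total_entries"]
    if active:
        print(f"DAG {dag_id} already has {active} active runs")
        return None
    # POST to the same collection creates a new manual run
    resp = session.post(runs_url, json={"conf": {}})
    resp.raise_for_status()
    return resp.json()
```

The check and the trigger are separate requests, so two clients can still race each other; max_active_runs=1 is what actually bounds concurrency.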

2. Set max_active_runs

WRONG — allowing unlimited concurrent runs:

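from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    # max_active_runs not set: falls back to [core] max_active_runs_per_dag (default 16)
):
    ...  # tasks go here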

RIGHT — limit to one run at a time:

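from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    max_active_runs=1,  # at most one active run of this DAG at a time
    catchup=False,
):
    ...  # tasks go here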

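This caps the DAG at one active run at a time. It does not stop extra runs from being created: additional scheduled or manual runs wait in the queued state until the active run finishes.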
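A simplified model of the max_active_runs gate makes the effect concrete. It assumes queued runs start in the order given and is an illustration, not Airflow's scheduler code; runs_to_start is a hypothetical name.

```python
from __future__ import annotations


def runs_to_start(queued: list[str], running: int, max_active_runs: int) -> list[str]:
    """Hypothetical model: which queued runs may start given the active-run limit."""
    free_slots = max(max_active_runs - running, 0)
    # Runs beyond the free slots stay queued; nothing is deleted
    return queued[:free_slots]


queued = ["scheduled__2024-06-24", "manual__2024-06-24"]
print(runs_to_start(queued, running=0, max_active_runs=1))  # ['scheduled__2024-06-24']
print(runs_to_start(queued, running=1, max_active_runs=1))  # []
```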

3. Backfill correctly

WRONG — backfill creates overlapping runs with the scheduler:

airflow dags backfill my_dag \
    --start-date 2024-01-01 \
    --end-date 2024-06-24 \
    --reset-dagruns  # This resets existing runs

RIGHT — backfill with safe options:

# Only backfill missing intervals
airflow dags backfill my_dag \
    --start-date 2024-01-01 \
    --end-date 2024-06-24 \
    --rerun-failed-tasks  # Only rerun failed tasks

While the DAG is unpaused, the scheduler keeps creating runs for new schedule intervals on its own, so end the backfill before the intervals the scheduler will cover.
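If you already know which days have runs (for example from the DAG Runs page or DagRun.find), you can split a backfill into ranges that cover only the gaps. A minimal sketch for a daily schedule; missing_ranges and backfill_commands are hypothetical helpers, and treating both ends of each range as inclusive is an assumption of this sketch.

```python
from __future__ import annotations

from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def missing_ranges(start: date, end: date, existing: set[date]) -> list[tuple[date, date]]:
    """Return inclusive (first, last) ranges of days in [start, end] that have no run."""
    ranges = []
    day = start
    while day <= end:
        if day in existing:
            day += ONE_DAY
            continue
        first = day
        # Extend the gap until a day that already has a run (or past the end)
        while day <= end and day not in existing:
            day += ONE_DAY
        ranges.append((first, day - ONE_DAY))
    return ranges


def backfill_commands(dag_id: str, ranges: list[tuple[date, date]]) -> list[str]:
    """Build one backfill command per gap, without --reset-dagruns."""
    return [
        f"airflow dags backfill {dag_id} --start-date {first} --end-date {last}"
        for first, last in ranges
    ]


existing = {date(2024, 1, 2), date(2024, 1, 3)}
gaps = missing_ranges(date(2024, 1, 1), date(2024, 1, 6), existing)
for cmd in backfill_commands("my_dag", gaps):
    print(cmd)
# airflow dags backfill my_dag --start-date 2024-01-01 --end-date 2024-01-01
# airflow dags backfill my_dag --start-date 2024-01-04 --end-date 2024-01-06
```

Because each command leaves out --reset-dagruns, days that already have runs are never touched.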

4. Use catchup=False in DAG definition

WRONG: catchup=True (the Airflow 2.x default unless [scheduler] catchup_by_default is changed) creates runs for all missed intervals:

from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # Creates runs for every day since Jan 1!
    catchup=True,
):
    ...  # tasks go here

RIGHT — disable catchup for most DAGs:

from datetime import datetime
from airflow import DAG

with DAG(
    dag_id="my_dag",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,  # Only create a run for the latest interval
):
    ...  # tasks go here
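The difference is easy to quantify. A simplified model, assuming a midnight-aligned @daily schedule in UTC and ignoring details such as pausing: with catchup=True every complete day since start_date gets a run, with catchup=False only the latest one does. daily_runs_created is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def daily_runs_created(start_date: datetime, now: datetime, catchup: bool) -> int:
    """Simplified model: runs created for a @daily DAG when it is first enabled at `now`."""
    # Number of complete [day, day + 1) intervals between start_date and now
    complete = (now - start_date) // timedelta(days=1)
    if complete <= 0:
        return 0
    return complete if catchup else 1


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
now = datetime(2024, 6, 24, tzinfo=timezone.utc)
print(daily_runs_created(start, now, catchup=True))   # 175
print(daily_runs_created(start, now, catchup=False))  # 1
```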

5. Check run_id uniqueness

WRONG — manual runs with non-unique run IDs:

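from airflow.models import DagRun
from airflow.utils import timezone

# Creating a DagRun directly through the ORM
dag_run = DagRun(
    dag_id="my_dag",
    run_id="manual_2024-06-24",  # a second run with this run_id fails on commit
    execution_date=timezone.datetime(2024, 6, 24),  # Airflow expects timezone-aware dates
)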

RIGHT — use unique run IDs:

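from airflow.models import DagRun
from airflow.utils import timezone

dag_run = DagRun(
    dag_id="my_dag",
    run_id=f"manual_{timezone.utcnow().isoformat()}",  # unique timestamp-based ID
    # Airflow 2.x also enforces a unique (dag_id, execution_date) pair,
    # so this date must not already have a run
    execution_date=timezone.datetime(2024, 6, 24),
)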
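A timestamp-based run_id like the one above can still collide if two triggers land in the same microsecond, for example from parallel scripts. A small sketch of a hypothetical helper that keeps the manual_ prefix and appends a random suffix, using only letters, digits, and the punctuation an ISO timestamp already contains:

```python
import uuid
from datetime import datetime, timezone


def unique_run_id(prefix: str = "manual") -> str:
    """Hypothetical helper: readable UTC timestamp plus a short random suffix."""
    ts = datetime.now(timezone.utc).isoformat()
    # The first 8 hex chars of a uuid4 are 32 random bits, so two calls in the
    # same microsecond almost certainly still differ
    return f"{prefix}_{ts}_{uuid.uuid4().hex[:8]}"


print(unique_run_id())  # e.g. manual_2024-06-24T10:15:30.123456+00:00_1a2b3c4d
```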

6. Prevent concurrent runs in production

# airflow.cfg
[core]
# Default for DAGs that do not set max_active_runs themselves
max_active_runs_per_dag = 1

Expected result: every DAG that does not override max_active_runs has at most one active run at a time.
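In containerized deployments the same option is often set through an environment variable, using Airflow's AIRFLOW__{SECTION}__{KEY} naming convention. A tiny sketch that builds the variable name; airflow_env_var is a hypothetical helper.

```python
def airflow_env_var(section: str, key: str) -> str:
    """Build the environment variable that overrides [section] key in airflow.cfg."""
    # Convention: AIRFLOW__<SECTION>__<KEY>, upper case, double underscores
    return f"AIRFLOW__{section.upper()}__{key.upper()}"


print(airflow_env_var("core", "max_active_runs_per_dag"))
# AIRFLOW__CORE__MAX_ACTIVE_RUNS_PER_DAG
```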

Prevention

  • Set max_active_runs=1 for most production DAGs.
  • Disable catchup unless backfilling is explicitly needed.
  • Avoid manual triggering while scheduled runs are active.
  • Use --reset-dagruns carefully in backfill commands.
  • Monitor the DAG Runs page in the UI for unexpected duplicates.

FAQ

Can I have multiple runs for the same execution date?

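It depends on the Airflow version. Every version enforces uniqueness of (dag_id, run_id). Airflow 2.x also enforces a unique (dag_id, execution_date) pair, so a second run for exactly the same execution date is rejected; duplicates there appear as runs with different execution dates that cover the same data interval. Airflow 3 removed the unique constraint on logical_date (the renamed execution_date), so runs with the same logical date but different run_ids are allowed.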

What happens to data when duplicate runs happen?

If tasks write to the same output location (same table, same file path), the second run overwrites the first. For idempotent tasks, this is harmless. For non-idempotent tasks (append-only writes), duplicates cause data duplication.
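The difference between an append-only load and an idempotent one can be shown with Python's built-in sqlite3 module. A minimal sketch, assuming a hypothetical sales table keyed by day: the idempotent loader deletes the day's rows before inserting them, inside one transaction, so a duplicate run leaves the same result.

```python
import sqlite3


def load_append(conn: sqlite3.Connection, day: str, amounts: list) -> None:
    # Non-idempotent: every run inserts its rows again
    conn.executemany(
        "INSERT INTO sales (day, amount) VALUES (?, ?)", [(day, a) for a in amounts]
    )


def load_idempotent(conn: sqlite3.Connection, day: str, amounts: list) -> None:
    # Idempotent: replace the day's rows in a single transaction
    with conn:
        conn.execute("DELETE FROM sales WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO sales (day, amount) VALUES (?, ?)", [(day, a) for a in amounts]
        )


def rows_after_two_runs(loader) -> int:
    """Run the same load twice, as two duplicate DAG runs would, and count rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
    for _ in range(2):
        loader(conn, "2024-06-24", [10, 20])
    count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
    conn.close()
    return count


print(rows_after_two_runs(load_append))      # 4
print(rows_after_two_runs(load_idempotent))  # 2
```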

How do I find duplicate DAG runs programmatically?

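from collections import Counter

from airflow.models import DagRun

runs = DagRun.find(dag_id="my_dag")
# Group by data interval rather than execution date: a manual run and the scheduled
# run for the same interval usually have different execution dates
starts = [r.data_interval_start for r in runs if r.data_interval_start is not None]
duplicates = [start for start, count in Counter(starts).items() if count > 1]
print(f"Duplicate runs for data intervals starting at: {duplicates}")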
