Airflow Sensor Timeout Fix

DodaTech Updated 2026-06-24 3 min read

This tutorial explains why an Airflow sensor can appear to hang or fail on timeout, and how to fix it: set timeout and poke_interval deliberately, switch long waits to reschedule mode or deferrable operators, and decide what should happen when the timeout expires.

A sensor task appears to hang, then fails on timeout:

wait_for_file = FileSensor(
    task_id="wait_for_file",
    filepath="/data/input.csv",
    poke_interval=60,  # Check every 60 seconds
    timeout=3600,      # Timeout after 1 hour
)

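The sensor times out after 1 hour and the task fails, but it showed as "running" the entire time. Sensors use poke mode by default, which occupies a worker slot for the whole wait. If timeout is left unset, the sensor falls back to the default (7 days, unless sensors.default_timeout is changed in the Airflow configuration) and can hold that slot for days. If timeout is too short, the sensor fails before the data arrives.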
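To make the interaction between poke_interval and timeout concrete, here is a minimal plain-Python sketch of a poke loop. It is a simplified model, not Airflow's implementation; run_poke_loop and FakeClock are hypothetical names, and the fake clock exists only so the example finishes instantly.

```python
def run_poke_loop(poke, poke_interval, timeout, clock, sleep):
    """Call `poke` every `poke_interval` seconds until it returns True.

    Returns "success" when the condition is met, or "timeout" once more
    than `timeout` seconds have elapsed without success.
    """
    started = clock()
    while True:
        if poke():
            return "success"
        if clock() - started > timeout:
            return "timeout"
        # In poke mode the worker slot stays occupied during this sleep.
        sleep(poke_interval)


class FakeClock:
    """Hypothetical manual clock: sleeping just advances the time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


clock = FakeClock()
outcome = run_poke_loop(
    lambda: False,  # the file never arrives
    poke_interval=60,
    timeout=3600,
    clock=clock.time,
    sleep=clock.sleep,
)
print(outcome, clock.now)
```

In this model the loop looks "running" for the full hour and only reports the timeout on the first check after the limit has passed.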

Step-by-Step Fix

1. Set both poke_interval and timeout

WRONG — missing or zero timeout:

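from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

sensor = S3KeySensor(
    task_id="wait_for_s3_file",
    bucket_name="my-bucket",  # Placeholder; needed because bucket_key is not a full s3:// URL
    bucket_key="data/input.csv",
    poke_interval=60,  # Checks every 60s
    # timeout not set: falls back to the 7-day default and can hold a slot for days
)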

RIGHT — set an appropriate timeout:

sensor = S3KeySensor(
    task_id="wait_for_s3_file",
    bucket_name="my-bucket",  # Placeholder; needed because bucket_key is not a full s3:// URL
    bucket_key="data/input.csv",
    poke_interval=30,   # Check every 30 seconds
    timeout=7200,       # Stop after 2 hours
    soft_fail=True,     # Skip instead of fail on timeout
)

2. Use mode="reschedule" for long-running sensors

WRONG — using default mode="poke" for sensors that wait hours:

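sensor = S3KeySensor(
    task_id="wait_for_file",
    bucket_name="my-bucket",  # Placeholder bucket name
    bucket_key="data/input.csv",
    poke_interval=300,  # 5 minutes
    timeout=86400,      # 24 hours
    # mode defaults to "poke", which blocks a worker slot for up to 24 hours
)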

RIGHT — use reschedule mode:

sensor = S3KeySensor(
    task_id="wait_for_file",
    bucket_name="my-bucket",  # Placeholder bucket name
    bucket_key="data/input.csv",
    poke_interval=300,
    timeout=86400,
    mode="reschedule",  # Frees the worker slot between pokes
)

In reschedule mode, the task releases its worker slot between checks, so workers can process other tasks.
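A rough way to see the difference is to estimate how long one sensor occupies a worker slot in each mode. The sketch below is an idealized model: slot_seconds is a hypothetical helper, the one-second poke_duration is an assumption, and scheduler overhead is ignored.

```python
import math


def slot_seconds(mode, wait_seconds, poke_interval, poke_duration=1.0):
    """Estimate how long one sensor occupies a worker slot.

    `poke_duration` (seconds of work per check) is an assumed value.
    """
    if mode == "poke":
        # The slot is held for the whole wait, sleeps included.
        return float(wait_seconds)
    if mode == "reschedule":
        # The slot is held only while a check is actually running.
        pokes = math.ceil(wait_seconds / poke_interval)
        return pokes * poke_duration
    raise ValueError(f"unknown mode: {mode!r}")


for mode in ("poke", "reschedule"):
    print(mode, slot_seconds(mode, wait_seconds=86400, poke_interval=300))
```

Under these assumptions, a 24-hour wait with a 5-minute interval costs a full day of slot time in poke mode and only a few minutes in reschedule mode.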

3. Use an efficient poke_interval

WRONG — checking too frequently:

poke_interval=5  # Every 5 seconds — unnecessary for file wait

RIGHT — match to the expected availability:

# For a file expected within 1 hour
poke_interval=60  # Every minute is fine

# For a file expected within 24 hours
poke_interval=300  # Every 5 minutes is sufficient
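The trade-off can be put in numbers: a shorter interval means more checks against the file system or API, while a longer one means the data may sit unnoticed for up to one interval. A small sketch, where poke_cost is a hypothetical helper and the counts are approximate:

```python
def poke_cost(wait_seconds, poke_interval):
    """Return (approximate number of checks, worst-case detection delay in seconds)."""
    return wait_seconds // poke_interval, poke_interval


# Compare intervals for a file expected within 1 hour.
for interval in (5, 60, 300):
    checks, delay = poke_cost(3600, interval)
    print(f"poke_interval={interval}: ~{checks} checks, up to {delay}s delay")
```

Going from 5 seconds to 60 seconds cuts the number of checks by a factor of twelve while adding at most about a minute of delay.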

4. Use deferrable operators

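Airflow 2.2+ supports deferrable operators, which hand the wait over to an async trigger running in the triggerer process. Whether a given sensor accepts deferrable=True depends on the installed provider version:

sensor = S3KeySensor(
    task_id="wait_for_file",
    bucket_name="my-bucket",  # Placeholder bucket name
    bucket_key="data/input.csv",
    deferrable=True,  # Uses async trigger (no worker slot)
    poke_interval=60,
    timeout=86400,
)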

This is usually the most efficient approach: the task holds no worker slot while it is deferred. It does require a running triggerer.
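The idea behind async triggers can be illustrated with plain asyncio: many waits share one event loop instead of each blocking its own worker. This is only a conceptual sketch, not Airflow's trigger API; wait_for and wait_for_all are hypothetical names, and the intervals are tiny so the example runs quickly.

```python
import asyncio


async def wait_for(condition, poke_interval, timeout):
    """Poll `condition` until it returns True or `timeout` seconds pass."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if condition():
            return True
        if loop.time() >= deadline:
            return False
        # Yields control so other waits can run in the same thread.
        await asyncio.sleep(poke_interval)


async def wait_for_all(conditions, poke_interval, timeout):
    """Run every wait concurrently on a single event loop."""
    waits = (wait_for(c, poke_interval, timeout) for c in conditions)
    return await asyncio.gather(*waits)


results = asyncio.run(
    wait_for_all([lambda: True, lambda: False], poke_interval=0.01, timeout=0.05)
)
print(results)
```

The second condition never becomes true, so its wait ends with False after the timeout, while the first returns immediately; neither blocks the other.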

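5. Implement a custom sensor with exponential backoff

BaseSensorOperator already accepts exponential_backoff=True, so the custom class only needs to switch it on. The example below (Airflow 2.x import paths) also stops waiting after give_up_after seconds:

import os

from airflow.sensors.base import BaseSensorOperator
from airflow.utils import timezone


class BackoffFileSensor(BaseSensorOperator):
    def __init__(self, filepath, give_up_after=86400, **kwargs):
        # Built-in option: lengthens the delay between pokes
        kwargs.setdefault("exponential_backoff", True)
        super().__init__(**kwargs)
        self.filepath = filepath
        # Not named max_wait: newer Airflow versions use that name for the backoff cap
        self.give_up_after = give_up_after

    def poke(self, context):
        # start_date is timezone-aware, so compare it with an aware "now"
        elapsed = (timezone.utcnow() - context["task_instance"].start_date).total_seconds()
        if elapsed > self.give_up_after:
            return True  # Stop waiting: the task succeeds even if the file is missing
        return os.path.exists(self.filepath)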
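To show what exponential backoff means for the wait between checks, the sketch below computes a doubling schedule with a cap. It is a simplified illustration; Airflow's own calculation differs in detail (it adds jitter, for example), and backoff_intervals is a hypothetical helper.

```python
def backoff_intervals(poke_interval, attempts, max_interval):
    """Delay before each retry: doubles on every attempt, capped at max_interval."""
    return [min(poke_interval * 2 ** n, max_interval) for n in range(attempts)]


# Start at 60 seconds, never wait more than 10 minutes between checks.
print(backoff_intervals(poke_interval=60, attempts=5, max_interval=600))
```

Early checks stay frequent, so a file that arrives quickly is picked up quickly, while a long wait settles into infrequent checks.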

6. Handle external task sensor timeout

from datetime import timedelta

from airflow.sensors.external_task import ExternalTaskSensor

wait_for_dag = ExternalTaskSensor(
    task_id="wait_for_other_dag",
    external_dag_id="upstream_dag",
    external_task_id="final_task",
    timeout=3600,
    allowed_states=["success"],
    failed_states=["failed", "skipped"],
    # Wait on the upstream run whose logical date is 1 hour earlier than this run's
    execution_delta=timedelta(hours=1),
)
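A frequent cause of an ExternalTaskSensor timeout is an execution_delta that points at a run which does not exist. The sensor looks for the upstream run at the current logical date minus execution_delta, which this small sketch reproduces with plain datetime arithmetic (upstream_logical_date is a hypothetical helper, and the dates are made up for illustration):

```python
from datetime import datetime, timedelta, timezone


def upstream_logical_date(logical_date, execution_delta):
    """Logical date of the upstream run the sensor will look for."""
    return logical_date - execution_delta


# Example values only: this DAG runs at 02:00 UTC.
this_run = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
print(upstream_logical_date(this_run, timedelta(hours=1)))
```

If the upstream DAG has no run at exactly the computed logical date, the sensor keeps poking until it times out, so it is worth checking this value against the upstream schedule.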

Expected result: the sensor succeeds when the condition is met, or fails (or is skipped, with soft_fail=True) once the timeout expires.

Prevention

  • Always set timeout on sensor tasks.
  • Use mode="reschedule" for sensors that may wait longer than a few minutes.
  • Use deferrable operators when available (Airflow 2.2+).
  • Set soft_fail=True when a missing input should skip the task rather than fail the DAG run.
  • Monitor sensor tasks with alerts if they approach the timeout.
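The checklist above can be turned into a small review helper. This is a sketch under stated assumptions: check_sensor_config is a hypothetical function that inspects plain values rather than Airflow objects, and the 300-second threshold for "a few minutes" is an assumption.

```python
def check_sensor_config(timeout=None, mode="poke", long_wait_threshold=300):
    """Return a list of warnings for a sensor's settings."""
    warnings = []
    if timeout is None:
        warnings.append("timeout is not set")
    elif mode == "poke" and timeout > long_wait_threshold:
        warnings.append('consider mode="reschedule" for a wait this long')
    return warnings


print(check_sensor_config())
print(check_sensor_config(timeout=86400))
print(check_sensor_config(timeout=86400, mode="reschedule"))
```

A check like this could run in a unit test over the sensor arguments used in a project, so risky settings are caught before deployment.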

Common Mistakes with Sensor Timeouts

  1. Leaving timeout unset, so the sensor falls back to the default and can occupy a worker slot for days
  2. Keeping the default mode="poke" on a sensor that waits for hours, which blocks a worker slot for the whole wait
  3. Setting poke_interval far shorter than the data's expected arrival warrants, which adds load without finding the file meaningfully sooner

These mistakes appear frequently in real-world Airflow code. DodaTech's contributors have identified these patterns through analysis of open-source projects and production systems.

Practice Exercise

Write a DAG with a FileSensor that uses mode="reschedule", a 10-minute timeout, and soft_fail=True, then test it with edge cases such as the file never arriving and the file already existing before the first poke.

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.

FAQ

What's the difference between poke and reschedule mode?

Poke mode keeps the worker slot occupied while the sensor runs. Reschedule mode releases the slot between pokes and re-schedules the task later. Use poke for short waits (<5 minutes), reschedule for longer waits.

Why does my sensor time out even though the file exists?

The sensor might be checking a different path than where the file exists. Check the exact path being polled. Also, some sensors (like S3KeySensor) check for key existence using the exact key path — a trailing slash or missing prefix can cause mismatches.
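When debugging this, it can help to compare the polled key against the keys that actually exist and look for near misses. The sketch below works on plain strings; find_near_misses is a hypothetical helper, and the key list would come from wherever you list the bucket contents.

```python
def find_near_misses(polled_key, existing_keys):
    """Return keys that differ from polled_key only by leading or trailing slashes."""
    normalized = polled_key.strip("/")
    return [
        key
        for key in existing_keys
        if key != polled_key and key.strip("/") == normalized
    ]


# Example values only: the sensor polls a key with a stray trailing slash.
print(find_near_misses("data/input.csv/", ["data/input.csv", "data/other.csv"]))
```

A non-empty result suggests the file is present but the sensor is polling a slightly different key.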

How do I make a sensor time out without failing the DAG?

Set soft_fail=True. When the timeout expires, the sensor marks itself as "skipped" instead of "failed." Downstream tasks with the default trigger rule (all_success) are then skipped as well, while tasks with trigger_rule="all_done" still execute.
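How a skipped sensor affects downstream tasks depends on their trigger rule. The sketch below models just two rules in plain Python, assuming every upstream task has already finished; downstream_outcome is a hypothetical helper, not part of Airflow.

```python
def downstream_outcome(upstream_states, trigger_rule="all_success"):
    """Model what a downstream task does once all upstream tasks have finished."""
    if trigger_rule == "all_done":
        # Runs regardless of how the upstream tasks ended.
        return "run"
    if trigger_rule == "all_success":
        if "failed" in upstream_states or "upstream_failed" in upstream_states:
            return "upstream_failed"
        if "skipped" in upstream_states:
            return "skipped"
        return "run"
    raise ValueError(f"rule not modeled in this sketch: {trigger_rule!r}")


print(downstream_outcome(["skipped"]))
print(downstream_outcome(["skipped"], trigger_rule="all_done"))
```

With soft_fail=True, cleanup or notification tasks that must always run are good candidates for trigger_rule="all_done".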
