AWS Step Function Execution Error Fix

DodaTech Updated 2026-06-24 2 min read

This tutorial shows how to diagnose a failed AWS Step Functions execution and how to add Retry and Catch handling so that a single error no longer ends the workflow. It covers reading the execution history, updating the state definition, and testing the fix.

Your Step Functions execution fails with States.TaskFailed or States.Runtime: a state in your state machine raised an error that no Retry or Catch policy handled.

Step-by-Step Fix

1. Check execution history

aws stepfunctions describe-execution \
  --execution-arn arn:aws:states:us-east-1:123456789012:execution:my-state-machine:exec-123
# --reverse-order lists the newest events first, so the failure is not cut off by --max-items
aws stepfunctions get-execution-history \
  --execution-arn arn:aws:states:us-east-1:123456789012:execution:my-state-machine:exec-123 \
  --reverse-order --max-items 10

Example output (abridged):

{
    "events": [
        {"type": "TaskFailed", "id": 5, "previousEventId": 4, "taskFailedEventDetails": {
            "resourceType": "lambda",
            "resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
            "error": "Lambda.ServiceException",
            "cause": "Internal error"
        }}
    ]
}
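
When the history is long, the failure details can be picked out programmatically. The following is a minimal Python sketch, assuming the output of get-execution-history has been saved as JSON. The helper name find_failures and the sample data are illustrative, and the lookup of the details key relies on the naming pattern visible in the output above (TaskFailed pairs with taskFailedEventDetails).

```python
import json

# Sample history in the shape shown above (abridged, illustrative data).
SAMPLE_HISTORY = """
{
    "events": [
        {"type": "TaskScheduled", "id": 4, "previousEventId": 3},
        {"type": "TaskFailed", "id": 5, "previousEventId": 4,
         "taskFailedEventDetails": {
             "resourceType": "lambda",
             "resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
             "error": "Lambda.ServiceException",
             "cause": "Internal error"
         }}
    ]
}
"""


def find_failures(history):
    """Return (event id, error, cause) for every event whose type ends in 'Failed'."""
    failures = []
    for event in history.get("events", []):
        event_type = event["type"]
        if not event_type.endswith("Failed"):
            continue
        # Assumed pattern: details live under <lowerCamelType>EventDetails.
        key = event_type[0].lower() + event_type[1:] + "EventDetails"
        details = event.get(key, {})
        failures.append((event["id"], details.get("error"), details.get("cause")))
    return failures


failures = find_failures(json.loads(SAMPLE_HISTORY))
for event_id, error, cause in failures:
    print(f"event {event_id}: {error} ({cause})")
```

The error name it prints is the value to list under ErrorEquals when adding a Retry or Catch entry.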

2. Add retry policy to the state definition

// Wrong: no retry policy, single failure ends execution
{
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "End": true
}

// Right: retry with exponential backoff
{
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
    "Retry": [
        {
            "ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "States.TaskFailed"],
            "IntervalSeconds": 2,
            "MaxAttempts": 3,
            "BackoffRate": 2.0
        }
    ],
    "Catch": [
        {
            "ErrorEquals": ["States.ALL"],
            "Next": "HandleFailure",
            "ResultPath": "$.error-info"
        }
    ],
    "End": true
}
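
To see what the retry settings above mean in practice, the wait before each retry can be worked out by hand: the first retry waits IntervalSeconds, and each later wait is multiplied by BackoffRate. The Python sketch below does that arithmetic; retry_delays is a hypothetical helper, and it deliberately ignores optional fields such as MaxDelaySeconds and JitterStrategy.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry: interval_seconds * backoff_rate ** n."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]


# Values taken from the Retry block above.
delays = retry_delays(interval_seconds=2, max_attempts=3, backoff_rate=2.0)
print(delays)       # [2.0, 4.0, 8.0]
print(sum(delays))  # 14.0
```

With these settings the task runs at most four times (one initial attempt plus three retries), and about 14 seconds of waiting pass before the Catch block takes over.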

3. Add a catch-all error handler state

{
    "Comment": "Order processing workflow",
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "Catch": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "Next": "NotifyFailure",
                    "ResultPath": "$.error"
                }
            ],
            "Next": "ShipOrder"
        },
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": true
        },
        "NotifyFailure": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-failure",
            "End": true
        }
    }
}
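
The effect of "ResultPath": "$.error" in the Catch block can be illustrated outside AWS. The Python sketch below only mimics the behavior for a single top-level field: the original state input is kept, and the error output (an object with Error and Cause fields) is attached under the chosen key. The function name and the sample order are assumptions made for illustration.

```python
import copy


def apply_catch_result_path(state_input, error_output, field):
    """Mimic ResultPath "$.<field>": keep the input, attach the error under one key."""
    merged = copy.deepcopy(state_input)  # leave the caller's input untouched
    merged[field] = error_output
    return merged


order = {"orderId": "A-100", "items": 2}
error_output = {"Error": "Lambda.ServiceException", "Cause": "Internal error"}

# This is roughly what NotifyFailure would receive as its input.
next_input = apply_catch_result_path(order, error_output, "error")
print(next_input)
```

Without ResultPath, the error output would replace the input entirely, and NotifyFailure would no longer know which order failed.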

4. Update the state machine

aws stepfunctions update-state-machine \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine \
  --definition file://state-machine.json

5. Test with a known failing input

aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine \
  --input '{"test": true, "force_error": true}'
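
The force_error field in the input only has an effect if the function behind the Task state checks it. The sketch below shows one way a Python Lambda handler could do that; force_error is a convention assumed by this tutorial, not a Step Functions feature, and the return value is a placeholder.

```python
def handler(event, context):
    """Hypothetical Lambda handler that fails on demand when force_error is set."""
    if event.get("force_error"):
        raise RuntimeError("Forced failure for testing")
    return {"status": "ok"}


# Local check, no AWS needed: the handler raises only when asked to.
try:
    handler({"test": True, "force_error": True}, None)
    raised = False
except RuntimeError:
    raised = True

print("forced error raised:", raised)
print(handler({"test": True, "force_error": False}, None))
```

Running an execution once with the flag set and once without it exercises both the failure path and the normal path of the state machine.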

Prevention

  • Always include Retry and Catch blocks in every Task state.
  • Use States.ALL as a catch-all for unexpected errors.
  • Set ResultPath in catch blocks to preserve error information.
  • Test workflows with both valid and invalid inputs.
  • Monitor execution failures with CloudWatch alarms.
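
The first rule in the list above can be checked before a definition is uploaded. The following Python sketch scans a state machine definition and reports Task states that lack Retry or Catch. lint_task_states is a hypothetical helper, it only looks at top-level states (states nested inside Parallel or Map are not examined), and the sample definition is an abridged copy of the order workflow from step 3.

```python
import json


def lint_task_states(definition):
    """Return warnings for top-level Task states that are missing Retry or Catch."""
    warnings = []
    for name, state in definition.get("States", {}).items():
        if state.get("Type") != "Task":
            continue
        for field in ("Retry", "Catch"):
            if field not in state:
                warnings.append(f"{name}: missing {field}")
    return warnings


definition = json.loads("""
{
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "ShipOrder"
        },
        "ShipOrder": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
            "End": true
        }
    }
}
""")

warnings = lint_task_states(definition)
for warning in warnings:
    print(warning)
```

A check like this could run in CI before update-state-machine, so that a state without error handling is noticed before it fails in production.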

Common Mistakes with Step Functions Error Handling

  1. Leaving a Task state without a Retry block, so a single transient failure such as Lambda.ServiceException ends the execution.
  2. Placing States.ALL ahead of specific error names. It must be the last Retry or Catch entry, after the specific errors.
  3. Omitting ResultPath in a Catch block, so the error output replaces the state input and the handler state loses the original data.

Each of these turns a recoverable error into a failed execution or an error handler that lacks the context it needs.

Practice Exercise

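Take the ProcessOrder state from step 3 and add a Retry block for Lambda.ServiceException. Start an execution with an input that forces a failure, then read the execution history to confirm that the retries ran and that NotifyFailure received the error details under $.error.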

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.

FAQ

What is the difference between Retry and Catch in Step Functions?

Retry automatically re-runs the failed state, waiting between attempts according to IntervalSeconds and BackoffRate. Catch redirects the execution to a different state when an error occurs. Use Retry for transient errors and Catch for business logic failures.

Can I handle different error types differently?

Yes. List multiple Retry or Catch entries with different ErrorEquals values. Step Functions evaluates the entries in order, so put specific errors before States.ALL.

How long can a Step Function execution run?

A Standard workflow execution can run for up to 1 year (31536000 seconds); Express workflows are limited to 5 minutes. Each Task state also accepts a TimeoutSeconds value. AWS documents its default as 99999999 seconds when the field is omitted, so set it explicitly if a hung task should fail quickly.
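
The matching order described in the FAQ can be sketched in a few lines of Python. This is a simplified model, not the service's implementation: it treats States.ALL as matching every error name, although in Step Functions some terminal errors cannot be caught, and it does not model other wildcard names. The state name RetryLater is made up for the example.

```python
def first_matching_catcher(error_name, catchers):
    """Return the Next state of the first entry whose ErrorEquals matches, else None."""
    for catcher in catchers:
        names = catcher["ErrorEquals"]
        if error_name in names or "States.ALL" in names:
            return catcher["Next"]
    return None


catchers = [
    {"ErrorEquals": ["Lambda.ServiceException"], "Next": "RetryLater"},
    {"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"},
]

print(first_matching_catcher("Lambda.ServiceException", catchers))  # RetryLater
print(first_matching_catcher("CustomError", catchers))              # NotifyFailure
```

Because the first match wins, the specific entry handles Lambda.ServiceException and everything else falls through to the States.ALL entry.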
