AWS Step Function Execution Error Fix
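Your Step Functions execution fails with States.TaskFailed or States.Runtime. A States.TaskFailed error means a state in your state machine raised an error that no Retry or Catch policy handled. A States.Runtime error cannot be retried or caught (not even by States.ALL), so it has to be fixed in the definition or the input itself.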
Step-by-Step Fix
1. Check execution history
aws stepfunctions describe-execution --execution-arn arn:aws:states:us-east-1:123456789012:execution:my-state-machine:exec-123
aws stepfunctions get-execution-history --execution-arn arn:aws:states:us-east-1:123456789012:execution:my-state-machine:exec-123 --max-items 10
Expected output:
{
"events": [
{"type": "TaskFailed", "id": 5, "previousEventId": 4, "taskFailedEventDetails": {
"resourceType": "lambda",
"resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
"error": "Lambda.ServiceException",
"cause": "Internal error]
}}
]
}
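When the history is long, it can help to filter it down to the failure events. The following is a minimal sketch, assuming the output of get-execution-history has been parsed as JSON and that each failure event stores its details under a key derived from its type (TaskFailed becomes taskFailedEventDetails). The function name find_failures and the TaskScheduled sample event are illustrative, not part of the original output.

```python
import json

# Sample shaped like the get-execution-history output above.
# The TaskScheduled event is a hypothetical addition for illustration.
SAMPLE_HISTORY = """
{
  "events": [
    {"type": "TaskScheduled", "id": 4, "previousEventId": 3},
    {"type": "TaskFailed", "id": 5, "previousEventId": 4,
     "taskFailedEventDetails": {
       "resourceType": "lambda",
       "resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
       "error": "Lambda.ServiceException",
       "cause": "Internal error"
     }}
  ]
}
"""


def find_failures(history):
    """Return (event id, error, cause) for every event whose type ends in 'Failed'."""
    failures = []
    for event in history.get("events", []):
        event_type = event["type"]
        if not event_type.endswith("Failed"):
            continue
        # Assumed convention: TaskFailed -> taskFailedEventDetails.
        key = event_type[0].lower() + event_type[1:] + "EventDetails"
        details = event.get(key, {})
        failures.append((event["id"], details.get("error"), details.get("cause")))
    return failures


if __name__ == "__main__":
    for event_id, error, cause in find_failures(json.loads(SAMPLE_HISTORY)):
        print(f"event {event_id}: {error} ({cause})")
```

The error value is the string to list under ErrorEquals in a Retry or Catch block in the next step.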
2. Add retry policy to the state definition
// Wrong: no retry policy, single failure ends execution
{
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
"End": true
}
// Right: retry with exponential backoff
{
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:my-function",
"Retry": [
{
"ErrorEquals": ["Lambda.ServiceException", "Lambda.AWSLambdaException", "States.TaskFailed"],
"IntervalSeconds": 2,
"MaxAttempts": 3,
"BackoffRate": 2.0
}
],
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "HandleFailure",
"ResultPath": "$.error-info"
}
],
"End": true
}
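To see what the Retry settings above mean in practice, the wait before each retry can be worked out by hand: the first retry waits IntervalSeconds, and each later wait is multiplied by BackoffRate. Here is a small sketch of that arithmetic. The helper name retry_delays is illustrative, and the sketch ignores the optional MaxDelaySeconds and JitterStrategy fields.

```python
def retry_delays(interval_seconds, max_attempts, backoff_rate):
    """Seconds waited before each retry, ignoring MaxDelaySeconds and jitter."""
    return [interval_seconds * backoff_rate ** attempt for attempt in range(max_attempts)]


# Values taken from the Retry block above.
delays = retry_delays(interval_seconds=2, max_attempts=3, backoff_rate=2.0)

if __name__ == "__main__":
    print(delays)       # [2.0, 4.0, 8.0]
    print(sum(delays))  # 14.0
```

With these settings the task is retried up to three times over roughly 14 seconds of waiting. If the last retry also fails, the Catch block routes the execution to HandleFailure.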
3. Add a catch-all error handler state
{
"Comment": "Order processing workflow",
"StartAt": "ProcessOrder",
"States": {
"ProcessOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:process-order",
"Catch": [
{
"ErrorEquals": ["States.ALL"],
"Next": "NotifyFailure",
"ResultPath": "$.error"
}
],
"Next": "ShipOrder"
},
"ShipOrder": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:ship-order",
"End": true
},
"NotifyFailure": {
"Type": "Task",
"Resource": "arn:aws:lambda:us-east-1:123456789012:function:notify-failure",
"End": true
}
}
}
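The ResultPath setting in the Catch block decides what NotifyFailure receives. With "$.error", the original state input is kept and an object with Error and Cause fields is added under the error key. The sketch below imitates that merge locally so the shape of the handler input is easy to see. It is an approximation that only handles simple dotted paths, and the function name and the orderId field are hypothetical.

```python
import copy


def apply_catch_result_path(state_input, error, cause, result_path):
    """Place the error output at result_path inside a copy of state_input."""
    if not result_path.startswith("$."):
        raise ValueError("this sketch only handles paths of the form $.a.b")
    output = copy.deepcopy(state_input)
    node = output
    keys = result_path[2:].split(".")
    # Walk (or create) intermediate objects, then set the final key.
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = {"Error": error, "Cause": cause}
    return output


if __name__ == "__main__":
    original = {"orderId": "A-1"}  # hypothetical input to ProcessOrder
    handler_input = apply_catch_result_path(
        original, "Lambda.ServiceException", "Internal error", "$.error"
    )
    print(handler_input)
```

Without a ResultPath, the error output replaces the state input entirely, so the failure handler would no longer see fields such as the hypothetical orderId.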
4. Update the state machine
aws stepfunctions update-state-machine \
--state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine \
--definition file://state-machine.json
5. Test with a known failing input
aws stepfunctions start-execution \
--state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:my-state-machine \
--input '{"test": true, "force_error": true}'
Prevention
- Always include Retry and Catch blocks in every Task state.
- Use States.ALL as a catch-all for unexpected errors.
- Set ResultPath in catch blocks to preserve error information.
- Test workflows with both valid and invalid inputs.
- Monitor execution failures with CloudWatch alarms.
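The first item in this list can be checked automatically before deploying. The following is a minimal sketch of such a check, assuming the definition has been loaded as a Python dict. The function name missing_error_handling is illustrative, the sample is a trimmed copy of the order workflow from step 3, and the sketch only looks at top-level states, not at states nested inside Map or Parallel.

```python
def missing_error_handling(definition):
    """Map each top-level Task state name to the fields ('Retry', 'Catch') it lacks."""
    problems = {}
    for name, state in definition.get("States", {}).items():
        if state.get("Type") != "Task":
            continue
        missing = [field for field in ("Retry", "Catch") if not state.get(field)]
        if missing:
            problems[name] = missing
    return problems


# Trimmed copy of the order workflow from step 3 (Resource fields omitted).
ORDER_WORKFLOW = {
    "StartAt": "ProcessOrder",
    "States": {
        "ProcessOrder": {
            "Type": "Task",
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": "ShipOrder",
        },
        "ShipOrder": {"Type": "Task", "End": True},
        "NotifyFailure": {"Type": "Task", "End": True},
    },
}

if __name__ == "__main__":
    for state_name, fields in missing_error_handling(ORDER_WORKFLOW).items():
        print(f"{state_name}: missing {', '.join(fields)}")
```

Run against the step 3 workflow, this reports that ProcessOrder has no Retry and that ShipOrder and NotifyFailure have neither field, which is a reasonable to-do list before relying on that definition.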
Common Mistakes with step function error
- Non-exhaustive pattern matches that compile with warnings, then crash at runtime
- Misunderstanding that String is [Char], with poor performance for large text operations
- Using foldl instead of foldl', causing stack overflow on large lists
These mistakes appear frequently in real-world AWS code. DodaTech's contributors have identified these patterns through analysis of open-source projects and production systems.
Practice Exercise
Write a pure function that safely divides two integers using Maybe, then test it with edge cases like division by zero and negative numbers.
This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.