Elasticsearch Cluster Health RED Fix
Your Elasticsearch cluster health is RED: one or more primary shards are unassigned. Data in those shards is unavailable, so searches can return partial results and writes routed to those shards fail.
The Problem
GET /_cluster/health
{
"cluster_name": "production",
"status": "red",
"number_of_nodes": 5,
"number_of_data_nodes": 3,
"active_primary_shards": 120,
"active_shards": 240,
"unassigned_shards": 3
}
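In this example, three primary shards are unassigned. The cluster lost a data node that held them, and no replica could be promoted in their place. Either those indices had no replicas, or the replica copies were lost too. Elasticsearch never allocates a replica to the same node as its primary, so a single node failure cannot take out both copies of a shard.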
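The status field is what tells you primaries are missing, because unassigned_shards also counts replicas. The following is a minimal Python sketch that reads a parsed GET /_cluster/health body and explains the status. The function name is hypothetical, and the code makes no network calls.

```python
def describe_health(health: dict) -> str:
    """Explain a parsed GET /_cluster/health response body."""
    status = health["status"]
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        # RED: at least one primary shard is unassigned, so some data is unavailable.
        return f"red: primaries missing; {unassigned} unassigned shard(s) in total"
    if status == "yellow":
        # YELLOW: every primary is assigned, so all unassigned shards are replicas.
        return f"yellow: all primaries assigned; {unassigned} replica(s) unassigned"
    # GREEN: every primary and replica shard is assigned.
    return "green: all shards assigned"


# The health response from this guide, as a Python dict.
example = {
    "cluster_name": "production",
    "status": "red",
    "number_of_nodes": 5,
    "number_of_data_nodes": 3,
    "active_primary_shards": 120,
    "active_shards": 240,
    "unassigned_shards": 3,
}
print(describe_health(example))
```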
Step-by-Step Fix
1. Identify unassigned shards
GET /_cat/shards?v&h=index,shard,prirep,state,unassigned.reason&s=state
GET /_cluster/allocation/explain
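The allocation explain API reports why a shard is unassigned. Typical reasons are an offline node, a full disk, or allocation filtering rules that exclude the available nodes. Without a request body, it explains the first unassigned shard it finds. To ask about a specific shard, pass index, shard, and primary in the body.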
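On a large cluster, you can request the shard listing as JSON with GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason and filter out the unassigned primaries. The following is a minimal Python sketch. The sample rows are made up for illustration. The code converts shard numbers with int() because the cat APIs return values as strings.

```python
import json

# Hypothetical output of:
# GET /_cat/shards?format=json&h=index,shard,prirep,state,unassigned.reason
SAMPLE = """[
  {"index": "my-index", "shard": "0", "prirep": "p", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"},
  {"index": "my-index", "shard": "1", "prirep": "p", "state": "STARTED", "unassigned.reason": null},
  {"index": "logs-a", "shard": "0", "prirep": "r", "state": "UNASSIGNED", "unassigned.reason": "NODE_LEFT"}
]"""


def unassigned_primaries(cat_shards_json: str) -> list[tuple[str, int, str]]:
    """Return (index, shard, reason) for every unassigned primary shard."""
    rows = json.loads(cat_shards_json)
    return [
        (row["index"], int(row["shard"]), row.get("unassigned.reason") or "")
        for row in rows
        # prirep is "p" for a primary and "r" for a replica
        if row["state"] == "UNASSIGNED" and row["prirep"] == "p"
    ]


print(unassigned_primaries(SAMPLE))
```

Each tuple tells you which shard to pass to the allocation explain API.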
2. Check for node failures
GET /_cat/nodes?v
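If a node is missing, check its logs. Suppose it is permanently gone, but another node still holds a stale (out-of-date) copy of the shard. In that case you can force that copy to become the primary. This discards any writes the stale copy never received. If no copy survives anywhere, restore the index from a snapshot, or as a last resort use allocate_empty_primary, which creates an empty shard.
# Promote the stale copy of my-index shard 0 on data-node-2 to primary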
POST /_cluster/reroute
{
"commands": [{
"allocate_stale_primary": {
"index": "my-index",
"shard": 0,
"node": "data-node-2",
"accept_data_loss": true
}
}]
}
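The commands array accepts more than one command, so you can promote several stale primaries in a single reroute request. The following is a minimal Python sketch that builds that request body. The function name and its (index, shard, node) inputs are hypothetical. Each named node must actually hold a stale copy of the shard, or the command fails.

```python
import json


def stale_primary_reroute(targets: list[tuple[str, int, str]]) -> dict:
    """Build a POST /_cluster/reroute body that promotes stale copies to primary.

    targets: (index, shard, node) tuples; each node is assumed to hold a
    stale copy of that shard.
    """
    commands = []
    for index, shard, node in targets:
        commands.append({
            "allocate_stale_primary": {
                "index": index,
                "shard": shard,
                "node": node,
                # Required: writes missing from the stale copy are lost.
                "accept_data_loss": True,
            }
        })
    return {"commands": commands}


body = stale_primary_reroute([("my-index", 0, "data-node-2")])
print(json.dumps(body, indent=2))
```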
3. Free disk space on data nodes
# Check disk usage
GET /_cat/allocation?v
# Force-merge indices that no longer receive writes; this purges deleted documents but needs extra disk while it runs
POST /my-index/_forcemerge?max_num_segments=1
# Delete old indices
DELETE /old-index-2024-01
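To spot nodes that are close to full, you can request the allocation listing as JSON (GET /_cat/allocation?format=json) and compare disk.percent against a threshold. The following is a minimal Python sketch. The node names and percentages in the sample are invented. The 80% default mirrors the "keep at least 20% free" tip in Prevention Tips.

```python
import json

# Hypothetical output of GET /_cat/allocation?format=json
SAMPLE = """[
  {"node": "data-node-1", "shards": "80", "disk.percent": "91"},
  {"node": "data-node-2", "shards": "80", "disk.percent": "62"},
  {"node": "UNASSIGNED", "shards": "3", "disk.percent": null}
]"""


def nodes_over(cat_allocation_json: str, threshold_percent: int = 80) -> list[str]:
    """Return the names of nodes whose disk usage is at or above the threshold."""
    flagged = []
    for row in json.loads(cat_allocation_json):
        percent = row.get("disk.percent")
        if percent is None:
            # The UNASSIGNED summary row carries no disk statistics.
            continue
        if int(percent) >= threshold_percent:
            flagged.append(row["node"])
    return flagged


print(nodes_over(SAMPLE))
```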
4. Temporarily drop replicas to relieve disk and recovery pressure
# Temporarily set replicas to 0 so the cluster only has to place primaries
PUT /my-index/_settings
{
"index": {
"number_of_replicas": 0
}
}
# Then set them back
PUT /my-index/_settings
{
"index": {
"number_of_replicas": 1
}
}
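The second request in this step matters most, because an index left at zero replicas has no redundancy. The following is a minimal Python sketch of the pattern. It uses try/finally so the restore runs even if the recovery work fails. The HTTP call is abstracted into a hypothetical put_settings callable that you would supply, for example a wrapper around PUT /<index>/_settings.

```python
from typing import Callable

PutSettings = Callable[[str, dict], None]


def with_zero_replicas(put_settings: PutSettings, index: str,
                       restore_to: int, work: Callable[[], None]) -> None:
    """Drop an index to zero replicas while `work` runs, then restore them."""
    put_settings(index, {"index": {"number_of_replicas": 0}})
    try:
        work()
    finally:
        # Runs even if work() raises, so replicas are never left at 0.
        put_settings(index, {"index": {"number_of_replicas": restore_to}})


# Usage with a stand-in that records calls instead of sending HTTP requests.
calls = []
with_zero_replicas(lambda idx, body: calls.append((idx, body)), "my-index", 1, lambda: None)
print(calls)
```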
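5. Temporarily raise the disk watermarks
Elasticsearch's defaults are 85% (low), 90% (high), and 95% (flood stage). Raising them lets shards allocate to nodes that are over the defaults while you free space. Set them back to null once disk usage is under control. Transient settings are deprecated, so use persistent ones.
PUT /_cluster/settings
{
"persistent": {
"cluster.routing.allocation.disk.watermark.low": "90%",
"cluster.routing.allocation.disk.watermark.high": "95%",
"cluster.routing.allocation.disk.watermark.flood_stage": "98%"
}
}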
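The three watermarks act as escalating thresholds on a node's used disk. The following is a minimal Python sketch that reports which one a node has crossed. It assumes percentage-based watermarks; Elasticsearch also accepts absolute byte values, which this ignores. The defaults are Elasticsearch's 85%, 90%, and 95%.

```python
def watermark_stage(used_percent: float, low: float = 85.0,
                    high: float = 90.0, flood: float = 95.0) -> str:
    """Return the highest disk watermark that a node's usage exceeds."""
    if used_percent > flood:
        # Indices with a shard on this node get a read-only-allow-delete block.
        return "flood_stage"
    if used_percent > high:
        # Elasticsearch tries to relocate shards away from this node.
        return "high"
    if used_percent > low:
        # No shards are allocated here, except primaries of newly created indices.
        return "low"
    return "ok"


print(watermark_stage(92))                              # default thresholds
print(watermark_stage(92, low=90, high=95, flood=98))   # raised thresholds from step 5
```

At 92% used, the same node moves from the high watermark to the low one once the thresholds are raised. This is why the change buys time but is not a fix.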
Once every shard is assigned, cluster health should report green:
GET /_cluster/health
{
"status": "green",
"unassigned_shards": 0
}
Prevention Tips
- Keep at least 20% free disk space on each data node
- Configure cross-cluster replication for critical indices
- Set index.number_of_replicas: 2 for important data
- Monitor disk watermark warnings before they reach flood stage
- Use hot-warm-cold architecture to move old data to cheaper storage
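Replicas only help if they can be placed. Elasticsearch never assigns two copies of the same shard to one node, so a shard with N replicas needs N + 1 data nodes. The following is a minimal Python sketch of that arithmetic. It assumes no allocation filtering or awareness rules further restrict placement. With the three data nodes from the health output above, two replicas is the most that can be fully assigned.

```python
def unplaceable_replicas(number_of_replicas: int, data_nodes: int) -> int:
    """Replica copies per shard that cannot be assigned.

    Any nonzero result leaves the cluster yellow. Assumes every data node
    is eligible to hold the index.
    """
    copies_needed = number_of_replicas + 1  # primary plus replicas
    return max(0, copies_needed - data_nodes)


print(unplaceable_replicas(2, 3))  # three data nodes: every copy fits
print(unplaceable_replicas(1, 1))  # single-node cluster: the replica stays unassigned
```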
Common Mistakes When Fixing a RED Cluster
- Running allocate_stale_primary or allocate_empty_primary before reading the allocation explain output; both accept data loss and cannot be undone
- Setting number_of_replicas to 0 during recovery and forgetting to restore it, which leaves the index with no redundancy
- Leaving raised disk watermarks in place after the emergency, so nodes can fill up with little warning
Practice Exercise
On a test cluster, create an index with number_of_replicas: 0, stop the data node that holds one of its primaries, and confirm that cluster health turns red. Then run GET /_cluster/allocation/explain to see the reason Elasticsearch reports. Finally, restart the node and watch the cluster return to green.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro