Elasticsearch Cluster Health RED Fix

DodaTech Updated 2026-06-24 2 min read

This guide explains what a RED cluster health status means in Elasticsearch and walks through the API calls used to diagnose and fix it.

Your Elasticsearch cluster health shows RED: one or more primary shards are not assigned. The cluster cannot serve all of its data, and search results may be incomplete.

The Problem

GET /_cluster/health

{
  "cluster_name": "production",
  "status": "red",
  "number_of_nodes": 5,
  "number_of_data_nodes": 3,
  "active_primary_shards": 120,
  "active_shards": 240,
  "unassigned_shards": 3
}

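Three shard copies are unassigned, and the RED status means at least one of them is a primary (unassigned_shards counts primaries and replicas together). Typically the cluster lost the node holding those primaries and no in-sync replica survives on another node, either because the index has no replicas or because the nodes holding them were lost too. Elasticsearch never places a replica on the same node as its primary, so a single node failure alone does not take out both copies.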
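
As a rough illustration of reading that response, the sketch below summarises a health payload in Python. The function name summarize_health is hypothetical; the field names are the ones shown in the response above.

```python
def summarize_health(health: dict) -> str:
    """Return a one-line summary of a _cluster/health response body."""
    status = health.get("status", "unknown")
    unassigned = health.get("unassigned_shards", 0)
    if status == "red":
        detail = "at least one primary shard is unassigned"
    elif status == "yellow":
        detail = "all primaries are assigned, some replicas are not"
    elif status == "green":
        detail = "all shards are assigned"
    else:
        detail = "status not recognised"
    return f"{status.upper()}: {detail} ({unassigned} unassigned shard copies)"


# Trimmed copy of the example response above.
sample = {"cluster_name": "production", "status": "red", "unassigned_shards": 3}
print(summarize_health(sample))
```

The status field, not the unassigned count, is what tells you a primary is missing.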

Step-by-Step Fix

1. Identify unassigned shards

GET /_cat/shards?v&state=UNASSIGNED

GET /_cluster/allocation/explain

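The allocation explain API tells you why a shard is unassigned, for example because a node is offline, a disk is full, or allocation filtering excludes every node. Called without a request body, it reports on one unassigned shard of its own choosing; to ask about a specific shard, pass index, shard, and primary in the body.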
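
One way to combine the two calls is sketched below: filter the JSON form of the shard listing (GET /_cat/shards?format=json) down to unassigned primaries, then build an allocation explain body for each. The helper names and the sample rows are illustrative assumptions, not output from a real cluster.

```python
import json


def unassigned_primaries(cat_shards_rows: list) -> list:
    """Pick unassigned primary shards out of _cat/shards?format=json rows."""
    return [
        (row["index"], int(row["shard"]))
        for row in cat_shards_rows
        if row.get("state") == "UNASSIGNED" and row.get("prirep") == "p"
    ]


def explain_body(index: str, shard: int, primary: bool = True) -> str:
    """Build the JSON body for GET /_cluster/allocation/explain."""
    return json.dumps({"index": index, "shard": shard, "primary": primary})


# Made-up rows in the shape the cat shards API returns with format=json.
rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "state": "UNASSIGNED"},
    {"index": "my-index", "shard": "0", "prirep": "r", "state": "UNASSIGNED"},
    {"index": "my-index", "shard": "1", "prirep": "p", "state": "STARTED"},
]
for index, shard in unassigned_primaries(rows):
    print(explain_body(index, shard))
```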

2. Check for node failures

GET /_cat/nodes?v

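If a node is missing, check its logs and try to bring it back first. If it is permanently gone and a surviving node still holds a stale copy of the shard, you can force that copy to become the primary. Any writes the stale copy missed are lost:

# Promote the stale shard copy on data-node-2 to primary (accepts data loss)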
POST /_cluster/reroute
{
  "commands": [{
    "allocate_stale_primary": {
      "index": "my-index",
      "shard": 0,
      "node": "data-node-2",
      "accept_data_loss": true
    }
  }]
}
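
Because this command discards data, it can help to build the request body through a small guard rather than by hand. The following is a minimal sketch; stale_primary_reroute is a hypothetical helper, and the index and node names are the placeholders used above.

```python
import json


def stale_primary_reroute(index: str, shard: int, node: str, accept_data_loss: bool) -> str:
    """Build a /_cluster/reroute body for allocate_stale_primary.

    Raises ValueError unless the caller explicitly accepts data loss.
    """
    if not accept_data_loss:
        raise ValueError("allocate_stale_primary requires accept_data_loss=True")
    command = {
        "allocate_stale_primary": {
            "index": index,
            "shard": shard,
            "node": node,
            "accept_data_loss": True,
        }
    }
    return json.dumps({"commands": [command]}, indent=2)


print(stale_primary_reroute("my-index", 0, "data-node-2", accept_data_loss=True))
```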

3. Free disk space on data nodes

# Check disk usage
GET /_cat/allocation?v

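# Force-merge old, read-only indices to reclaim space held by deleted documents
# (the merge itself needs temporary free disk space, so avoid it on a nearly full node)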
POST /my-index/_forcemerge?max_num_segments=1

# Delete old indices
DELETE /old-index-2024-01
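
To spot the nodes that need attention, you could filter the JSON form of the allocation listing (GET /_cat/allocation?format=json) by disk.percent. This sketch assumes an 80% threshold, which corresponds to keeping 20% free; the helper name and sample rows are made up for illustration.

```python
def nodes_above(cat_allocation_rows: list, threshold_percent: int = 80) -> list:
    """Return (node, disk.percent) pairs at or above the threshold, fullest first."""
    flagged = []
    for row in cat_allocation_rows:
        percent = row.get("disk.percent")
        if percent is None:  # the UNASSIGNED pseudo-row has no disk figures
            continue
        if int(percent) >= threshold_percent:
            flagged.append((row["node"], int(percent)))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up rows; with format=json the cat API returns numbers as strings.
rows = [
    {"node": "data-node-1", "disk.percent": "91"},
    {"node": "data-node-2", "disk.percent": "62"},
    {"node": "data-node-3", "disk.percent": "84"},
    {"node": "UNASSIGNED", "disk.percent": None},
]
print(nodes_above(rows))
```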

4. Temporarily drop replicas to free capacity

# Temporarily set replicas to 0 to free disk space and recovery capacity for primaries
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

# Then set them back
PUT /my-index/_settings
{
  "index": {
    "number_of_replicas": 1
  }
}

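5. Raise disk watermarks temporarily

The defaults are 85% (low), 90% (high), and 95% (flood stage). Raising them only buys time to free space; set them back to null once disk usage is under control.
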
PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "98%"
  }
}
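
A small sketch of generating both the temporary override and the later reset is shown below. The function names are hypothetical, and the ordering check is a local sanity check added for this example, not a statement about how Elasticsearch validates the values.

```python
import json

PREFIX = "cluster.routing.allocation.disk.watermark."


def watermark_settings(low: int, high: int, flood: int) -> str:
    """Build a /_cluster/settings body, rejecting out-of-order watermarks."""
    if not (0 < low < high < flood <= 100):
        raise ValueError("expected 0 < low < high < flood_stage <= 100")
    settings = {
        PREFIX + "low": f"{low}%",
        PREFIX + "high": f"{high}%",
        PREFIX + "flood_stage": f"{flood}%",
    }
    return json.dumps({"transient": settings}, indent=2)


def reset_watermarks() -> str:
    """Build a body that sets each watermark to null, restoring the defaults."""
    names = ("low", "high", "flood_stage")
    return json.dumps({"transient": {PREFIX + name: None for name in names}}, indent=2)


print(watermark_settings(90, 95, 98))
print(reset_watermarks())
```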

Expected output:

GET /_cluster/health
{
  "status": "green",
  "unassigned_shards": 0
}
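
Rather than re-running the health check by hand, you can ask the API to block until the target status is reached using the wait_for_status and timeout query parameters. The sketch below only builds the URL; the base address http://localhost:9200 is an assumption, so substitute your own endpoint.

```python
from urllib.parse import urlencode


def health_wait_url(base_url: str, status: str = "green", timeout: str = "30s") -> str:
    """Build a _cluster/health URL that waits for the given status."""
    if status not in ("green", "yellow", "red"):
        raise ValueError(f"unknown status: {status}")
    query = urlencode({"wait_for_status": status, "timeout": timeout})
    return f"{base_url.rstrip('/')}/_cluster/health?{query}"


print(health_wait_url("http://localhost:9200"))
```

If the status is not reached before the timeout, the response comes back with "timed_out": true, so check that field as well as "status".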

Prevention Tips

Keep at least 20% free disk space on each data node
Configure cross-cluster replication for critical indices
Set index.number_of_replicas: 2 for important data
Monitor disk watermark warnings before they reach flood stage
Use hot-warm-cold architecture to move old data to cheaper storage

FAQ

What's the difference between RED, YELLOW, and GREEN cluster health?

GREEN = all primary and replica shards are assigned. YELLOW = all primaries are assigned but some replicas are not (the cluster is fully functional but has reduced redundancy). RED = one or more primary shards are unassigned (the data in those shards is unavailable, and some queries may fail or return partial results).
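
The rule can be written as a few lines of Python. This is a simplified sketch: it treats shards in the STARTED or RELOCATING state as assigned, and the prirep and state field names follow the cat shards output; the function name is hypothetical.

```python
ACTIVE_STATES = {"STARTED", "RELOCATING"}


def health_from_shards(rows: list) -> str:
    """Derive green/yellow/red from shard rows with 'prirep' and 'state' keys."""
    inactive = [row["prirep"] for row in rows if row["state"] not in ACTIVE_STATES]
    if "p" in inactive:
        return "red"  # at least one primary is not assigned
    if "r" in inactive:
        return "yellow"  # primaries are fine, a replica is missing
    return "green"


print(health_from_shards([
    {"prirep": "p", "state": "STARTED"},
    {"prirep": "r", "state": "UNASSIGNED"},
]))
```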

How do I recover data from a red index?

Use the allocation explain API to understand why primaries are unassigned, and bring the missing node back if you can. If the node is permanently gone, restore the index from a snapshot when one exists. Otherwise, use allocate_stale_primary with accept_data_loss: true to promote a stale shard copy, accepting that any writes it missed are lost.

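What happens at the flood_stage disk watermark?

When disk usage on a node exceeds the flood_stage watermark (default 95%), Elasticsearch puts a read-only block (index.blocks.read_only_allow_delete) on every index that has a shard on that node. This protects the node from filling up completely. The same disk pressure also stops new shards from being allocated to the node, which can leave the cluster YELLOW or RED. Free disk space immediately to recover.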
