Grafana Tempo Ingester Out of Memory Fix

DodaTech Updated 2026-06-24 2 min read

This guide explains why the Grafana Tempo ingester runs out of memory and how to fix it by tuning block and flush settings, applying per-tenant limits, scaling ingesters, and monitoring the results.

Your Grafana Tempo ingester runs out of memory and crashes: either the process is killed by the OOM killer, or the ingester refuses new traces with "ingester full" errors. The ingester holds traces in memory before flushing them to the backend, so heavy traffic can exhaust the available memory.

The Problem

level=error msg="ingester full, refusing traces" component=ingester
FATAL: OOM killer terminated tempo-ingester (exit code 137)

The ingester cannot flush blocks to the backend fast enough, causing memory to accumulate until the process is killed.
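
The imbalance described above can be put into rough numbers. The sketch below is a simplified model, not a description of Tempo internals: it assumes constant ingest and flush rates, and the function name and sample figures are hypothetical.

```python
def seconds_until_limit(ingest_bps, flush_bps, memory_limit_bytes, baseline_bytes=0.0):
    """Estimate how long buffered data takes to reach the memory limit.

    Returns None when flushing keeps up with ingestion (no net growth).
    All rates are bytes per second and assumed constant (a simplification).
    """
    net_growth = ingest_bps - flush_bps
    if net_growth <= 0:
        return None  # the backlog never grows, so the limit is never reached
    headroom = memory_limit_bytes - baseline_bytes
    if headroom <= 0:
        return 0.0  # already at or over the limit
    return headroom / net_growth


# Hypothetical numbers: 50 MB/s in, 40 MB/s flushed, 2 GB limit, 500 MB in use
print(seconds_until_limit(50e6, 40e6, 2e9, 500e6))  # 150.0
```

Under these assumptions, a 10 MB/s shortfall in flush throughput fills 1.5 GB of headroom in about two and a half minutes, which is why the steps below focus on flushing sooner and faster.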

Step-by-Step Fix

1. Reduce max block duration

ingester:
  lifecycler: ...
  trace_idle_period: 5s        # Default: 10s — flush idle traces sooner
  max_block_duration: 30s      # Default: 30m; cut and flush blocks more frequently
  max_block_bytes: 500_000_000 # 500MB max block size

2. Limit ingester memory usage

ingester:
  max_block_duration: 1m
  max_block_bytes: 256_000_000  # 256MB per block
  concurrent_flushes: 16        # More concurrent flushes
  flush_check_period: 10s       # Check for flush every 10s

3. Configure overrides for tenant limits

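# Key names follow the Tempo overrides format with a "defaults" block;
# ingestion limits are expressed in bytes, not spans. Verify against your version.
overrides:
  defaults:
    ingestion:
      rate_limit_bytes: 15000000    # About 15MB/s per tenant
      burst_size_bytes: 20000000    # About 20MB burst
      max_traces_per_user: 10000    # Max live traces per tenant in each ingester
    global:
      max_bytes_per_trace: 5000000  # About 5MB cap on a single trace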
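
To see how a rate limit and a burst size interact, here is a minimal token bucket sketch. It is a generic model of rate limiting offered as an illustration, not Tempo's implementation; the class name and the numbers are hypothetical, and time is passed in explicitly so the behaviour is easy to follow.

```python
class TokenBucket:
    """Generic token bucket: refills at `rate` units/second, capped at `burst`."""

    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0             # timestamp of the previous call

    def allow(self, cost, now):
        """Return True and spend `cost` tokens if enough are available at `now`."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(rate=15000, burst=30000)
print(bucket.allow(30000, now=0.0))  # True: the full burst is available at start
print(bucket.allow(1000, now=0.0))   # False: the bucket is now empty
print(bucket.allow(15000, now=1.0))  # True: one second refills 15000 units
```

The burst absorbs short spikes, while the sustained rate bounds how much data a tenant can push into the ingesters over time.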

4. Scale ingesters horizontally

# Increase ingester replicas at the deployment level
# (for example in Helm chart values; replicas is not a tempo.yaml setting)
ingester:
  replicas: 5  # Distribute load across more ingesters
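
A back-of-the-envelope estimate shows what adding replicas buys. This sketch assumes traffic is spread evenly across ingesters and that each trace is written to `replication_factor` ingesters; the function name, the default of 3, and the traffic figure are assumptions for illustration.

```python
def per_ingester_rate(total_bps, ingesters, replication_factor=3):
    """Approximate bytes/second each ingester receives under even distribution."""
    if ingesters <= 0:
        raise ValueError("ingesters must be positive")
    return total_bps * replication_factor / ingesters


# Hypothetical cluster receiving 60 MB/s in total
print(per_ingester_rate(60e6, 3))  # 60000000.0 per ingester with 3 replicas
print(per_ingester_rate(60e6, 5))  # 36000000.0 per ingester with 5 replicas
```

Going from 3 to 5 ingesters cuts the per-ingester write rate by 40% in this model, which lowers both memory pressure and the flush work each pod must keep up with.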

5. Monitor ingester memory

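# Memory usage per ingester pod (adjust the container label to match your deployment)
sum(container_memory_usage_bytes{container="tempo-ingester"}) by (pod)

# Blocks waiting to be flushed (confirm the exact metric name on the ingester's /metrics endpoint)
tempo_ingester_blocks_flush_queue_length

# Bytes received per second, averaged over 5 minutes
rate(tempo_ingester_bytes_received_total[5m])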
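
These queries can also be checked from a script through the Prometheus HTTP API (`/api/v1/query`). The sketch below builds the request URL and parses an instant-query response to list pods above a memory threshold. The Prometheus address, helper names, and sample payload are hypothetical; no network call is made here.

```python
import json
from urllib.parse import urlencode


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def pods_over_threshold(response_body, threshold_bytes):
    """Return {pod: value} for every series whose value exceeds the threshold."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    over = {}
    for series in payload["data"]["result"]:
        pod = series["metric"].get("pod", "unknown")
        value = float(series["value"][1])  # value is [timestamp, "string"]
        if value > threshold_bytes:
            over[pod] = value
    return over


# Hypothetical response in the instant-query vector format
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"pod": "tempo-ingester-0"}, "value": [1700000000, "2147483648"]},
            {"metric": {"pod": "tempo-ingester-1"}, "value": [1700000000, "805306368"]},
        ],
    },
})

print(build_query_url("http://prometheus:9090", "up"))
print(pods_over_threshold(sample, 1.5e9))  # {'tempo-ingester-0': 2147483648.0}
```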

Example results after tuning (illustrative; actual values depend on your workload and backend):

Memory usage: 2GB → 800MB (ingester)
Blocks flushed per minute: 10 → 120
Ingester full errors: 0

Prevention Tips

  • Set max_block_duration to 30s-1m for high-throughput services
  • Monitor tempo_ingester_blocks_flush_queue_length — should stay near zero
  • Use concurrent_flushes: 16 for faster backend writes
  • Align max_block_bytes with available memory per ingester
  • Scale ingesters as traffic grows

FAQ

What causes the ingester to accumulate memory?

The ingester holds traces in memory until the block reaches max_block_duration or max_block_bytes. If the backend (S3/GCS) is slow to respond, blocks accumulate in the flush queue. Tune concurrent_flushes and reduce max_block_bytes to prevent accumulation.
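
The two thresholds can be read as a simple "whichever comes first" rule. The function below is a simplified model of that rule for illustration only; it is not Tempo's code, and the function name and exact comparison operators are assumptions.

```python
def should_cut_block(age_seconds, size_bytes, max_block_duration_s, max_block_bytes):
    """Return True when a block has hit either the age or the size threshold."""
    return age_seconds >= max_block_duration_s or size_bytes >= max_block_bytes


# With max_block_duration: 1m and max_block_bytes: 256MB
print(should_cut_block(20, 100_000_000, 60, 256_000_000))  # False: neither limit reached
print(should_cut_block(20, 300_000_000, 60, 256_000_000))  # True: size limit reached first
print(should_cut_block(75, 100_000_000, 60, 256_000_000))  # True: age limit reached first
```

Lowering either threshold makes blocks leave memory sooner, at the cost of producing more, smaller blocks for the backend to handle.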

How much memory does each ingester need?

As a rule of thumb: max_block_bytes * concurrent_flushes * 2 + 1GB overhead. For max_block_bytes: 256MB and concurrent_flushes: 16, allocate at least 9GB per ingester. Monitor actual usage and adjust.
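
The rule of thumb is easy to turn into a small calculator. This sketch only encodes the formula stated above; the function name and the choice of decimal gigabytes are assumptions made for the example.

```python
GB = 1_000_000_000  # decimal gigabyte, used here for simplicity


def estimate_ingester_memory(max_block_bytes, concurrent_flushes, overhead_bytes=GB):
    """Rule of thumb: max_block_bytes * concurrent_flushes * 2 + fixed overhead."""
    return max_block_bytes * concurrent_flushes * 2 + overhead_bytes


needed = estimate_ingester_memory(max_block_bytes=256_000_000, concurrent_flushes=16)
print(f"{needed / GB:.1f} GB")  # 9.2 GB
```

Halving `concurrent_flushes` to 8 in this formula drops the estimate to about 5.1 GB, which shows how strongly flush concurrency drives the memory budget.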

Can I use persistent disk for ingester blocks?

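Yes. The Tempo ingester writes incoming data to a write-ahead log (WAL) on disk before flushing blocks to the backend, and if the ingester crashes, the WAL is replayed on restart. Replay only works if the WAL directory survives the restart, so a persistent volume is the safer choice; ephemeral storage is acceptable only if you can tolerate losing unflushed data when a pod is rescheduled. Either way, provision enough space for the WAL (typically 10-50GB). Blocks are deleted from the WAL after a successful flush.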
