Hadoop YARN Memory Allocation Fix
This guide explains why YARN kills containers for exceeding memory limits and how to fix it, from per-job settings to cluster-wide configuration.
A MapReduce or Spark job fails with:
Container [pid=12345,containerID=container_123] is running beyond virtual memory limits.
Killing container.
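YARN's NodeManager kills a container when its process tree uses more memory than the container was allocated. This usually means the application requested less memory than it actually needs, or the cluster's memory configuration is too tight. YARN enforces two separate limits: a physical memory limit (the container size) and a virtual memory limit (the container size multiplied by yarn.nodemanager.vmem-pmem-ratio). The error above comes from the virtual limit.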
Step-by-Step Fix
1. Increase container memory
WRONG — default container memory is often too low (1GB):
# Defaults:
# yarn.scheduler.minimum-allocation-mb = 1024
# yarn.scheduler.maximum-allocation-mb = 8192
RIGHT — increase memory allocation:
# For the job:
-D mapreduce.map.memory.mb=2048
-D mapreduce.reduce.memory.mb=4096
-D mapreduce.map.java.opts="-Xmx1638m"
-D mapreduce.reduce.java.opts="-Xmx3276m"
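The heap sizes in the flags above are roughly 80% of the container size, which leaves headroom for non-heap JVM memory. A minimal sketch of that arithmetic follows; the function name and the 80% default are illustrative assumptions, not Hadoop settings.

```python
def java_opts_for_container(container_mb: int, heap_percent: int = 80) -> str:
    """Return a -Xmx flag sized as a percentage of the container's memory.

    heap_percent is an assumed rule of thumb (80), not a Hadoop default.
    """
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    if not 0 < heap_percent < 100:
        raise ValueError("heap_percent must be between 1 and 99")
    # Integer arithmetic avoids floating-point rounding surprises.
    heap_mb = container_mb * heap_percent // 100
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    print(java_opts_for_container(2048))  # map container    -> -Xmx1638m
    print(java_opts_for_container(4096))  # reduce container -> -Xmx3276m
```

If a job uses a lot of off-heap memory (native libraries, direct buffers), a lower percentage may be more appropriate.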
In yarn-site.xml:
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>16384</value>
</property>
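The two scheduler settings bound what a job can ask for. The sketch below assumes the Capacity Scheduler's behavior of rounding each request up to a multiple of yarn.scheduler.minimum-allocation-mb (the Fair Scheduler uses a separate increment setting) and treats a request above the maximum as an error. The function name is hypothetical.

```python
def normalize_request(requested_mb: int,
                      min_alloc_mb: int = 1024,
                      max_alloc_mb: int = 16384) -> int:
    """Estimate the container size YARN would grant for a request.

    Assumption: requests are rounded up to a multiple of min_alloc_mb,
    and requests above max_alloc_mb are rejected.
    """
    if requested_mb <= 0:
        raise ValueError("requested_mb must be positive")
    if requested_mb > max_alloc_mb:
        raise ValueError(
            f"{requested_mb} MB exceeds yarn.scheduler.maximum-allocation-mb "
            f"({max_alloc_mb} MB)"
        )
    # Ceiling division: number of min_alloc_mb units needed.
    units = -(-requested_mb // min_alloc_mb)
    return min(units * min_alloc_mb, max_alloc_mb)


if __name__ == "__main__":
    for mb in (100, 1500, 2048, 4096):
        print(mb, "->", normalize_request(mb))
```

With a 1024 MB minimum, a 1500 MB request occupies a 2048 MB container, so picking sizes that are already multiples of the minimum avoids wasted memory.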
2. Disable or adjust virtual memory check
WRONG — strict virtual memory check kills valid containers:
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>true</value> <!-- Default -->
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value> <!-- Default ratio -->
</property>
RIGHT — increase the ratio or disable for development:
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>4</value> <!-- Allow 4x virtual memory vs physical -->
</property>
Or disable entirely (development only):
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
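To see why the default ratio is easy to exceed, it helps to compute the virtual memory limit directly: it is the container's physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio. A small sketch, with hypothetical function names:

```python
def vmem_limit_mb(container_mb: float, vmem_pmem_ratio: float = 2.1) -> float:
    """Virtual memory limit for a container, in MB."""
    return container_mb * vmem_pmem_ratio


def exceeds_vmem(vmem_used_mb: float, container_mb: float,
                 vmem_pmem_ratio: float = 2.1) -> bool:
    """True if the given virtual memory usage would trigger a kill."""
    return vmem_used_mb > vmem_limit_mb(container_mb, vmem_pmem_ratio)


if __name__ == "__main__":
    # A 1024 MB container under the default ratio and under a ratio of 4.
    print(f"{vmem_limit_mb(1024):.1f} MB")
    print(f"{vmem_limit_mb(1024, 4):.1f} MB")
    # A JVM mapping 2500 MB of virtual memory in that container:
    print(exceeds_vmem(2500, 1024))      # killed under the default ratio
    print(exceeds_vmem(2500, 1024, 4))   # allowed with ratio 4
```

Virtual memory usage often runs well above physical usage for JVMs, which is why a valid container can trip this check.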
3. Calculate cluster memory capacity
# Total cluster memory = sum of all node memory
# Each node's YARN memory = total_ram - OS_overhead - non-YARN_services
# Example: 8 nodes, 32GB RAM each, 25% overhead
# Available: 32 * 0.75 = 24GB per node
# Total cluster: 8 * 24 = 192GB
<!-- Per node -->
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24576</value> <!-- 24GB per node -->
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>8</value>
</property>
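The capacity arithmetic above can be written as a short script. This is a sketch using the example's numbers (8 nodes, 32GB each, 25% overhead); the function names are illustrative, and the 2048 MB container size is taken from step 1.

```python
def node_yarn_memory_mb(total_ram_gb: int, overhead_fraction: float = 0.25) -> int:
    """Memory to give YARN on one node, in MB (value for
    yarn.nodemanager.resource.memory-mb)."""
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return int(total_ram_gb * 1024 * (1 - overhead_fraction))


def cluster_memory_gb(nodes: int, total_ram_gb: int,
                      overhead_fraction: float = 0.25) -> int:
    """Total memory available to YARN across the cluster, in GB."""
    return nodes * node_yarn_memory_mb(total_ram_gb, overhead_fraction) // 1024


def containers_per_node(node_mb: int, container_mb: int) -> int:
    """How many containers of a given size fit on one node."""
    return node_mb // container_mb


if __name__ == "__main__":
    node_mb = node_yarn_memory_mb(32)
    print(node_mb)                             # 24576
    print(cluster_memory_gb(8, 32))            # 192
    print(containers_per_node(node_mb, 2048))  # 12
```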
4. Configure Spark executor memory
For Spark on YARN:
--conf spark.executor.memory=4g
--conf spark.executor.memoryOverhead=1g
--conf spark.driver.memory=4g
# On Spark versions before 2.3, use the older property name instead (value in MiB):
# --conf spark.yarn.executor.memoryOverhead=1024
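The container Spark requests from YARN is the executor memory plus the overhead, not the executor memory alone. The sketch below assumes Spark's documented default overhead of 10% of executor memory with a 384 MiB floor when no overhead is set explicitly; the function name is hypothetical.

```python
from typing import Optional


def spark_container_mb(executor_memory_mb: int,
                       overhead_mb: Optional[int] = None) -> int:
    """Memory Spark asks YARN for per executor, in MiB.

    Assumption: when overhead_mb is not given, Spark uses
    max(384, 10% of executor memory).
    """
    if executor_memory_mb <= 0:
        raise ValueError("executor_memory_mb must be positive")
    if overhead_mb is None:
        overhead_mb = max(384, executor_memory_mb // 10)
    return executor_memory_mb + overhead_mb


if __name__ == "__main__":
    print(spark_container_mb(4096, 1024))  # 4g + 1g explicit overhead -> 5120
    print(spark_container_mb(4096))        # 4g with default overhead  -> 4505
```

The result must still fit under yarn.scheduler.maximum-allocation-mb, and YARN may round it up further when granting the container.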
5. Check for memory leaks
WRONG: ignoring containers whose memory use climbs steadily until YARN kills them.
RIGHT: analyze the container logs:
# Check container logs for OOM errors
yarn logs -applicationId application_123_456 -containerId container_123
# Look for:
# GC overhead limit exceeded
# OutOfMemoryError: Java heap space
# Container killed on request. Exit code is 143
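Scanning for these messages by hand gets tedious on large logs. A minimal sketch of a scanner follows; it works on log text you have already saved (for example, the output of the yarn logs command above), and the labels and function name are illustrative. Exit code 143 is 128 + 15, meaning the process received SIGTERM.

```python
# Substrings to look for, mapped to a short label. The first three come
# from the list above; the last two are the NodeManager's limit messages.
MEMORY_SIGNATURES = {
    "GC overhead limit exceeded": "gc_overhead",
    "OutOfMemoryError: Java heap space": "heap_oom",
    "Exit code is 143": "killed_sigterm",
    "running beyond virtual memory limits": "vmem_limit",
    "running beyond physical memory limits": "pmem_limit",
}


def classify_log(text: str) -> dict:
    """Count how many lines contain each known memory-related message."""
    counts = {label: 0 for label in MEMORY_SIGNATURES.values()}
    for line in text.splitlines():
        for needle, label in MEMORY_SIGNATURES.items():
            if needle in line:
                counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Container [pid=12345,containerID=container_123] is running beyond "
        "virtual memory limits.\n"
        "Container killed on request. Exit code is 143\n"
    )
    for label, count in classify_log(sample).items():
        if count:
            print(label, count)
```

Heap errors point at the -Xmx setting, while the two limit messages point at the container size or the virtual memory ratio.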
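For memory leaks in MapReduce, clear any static state at the start of each task:
// Assumes the enclosing Mapper or Reducer declares a static collection, for example:
// private static final Map<String, String> cache = new HashMap<>();
@Override
protected void setup(Context context) {
    // Clear any static caches so stale entries do not accumulate
    cache.clear();
}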
6. Use fair or capacity scheduler for better allocation
# Submit the job to the high_mem queue
-D mapreduce.job.queuename=high_mem
In capacity-scheduler.xml:
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,high_mem</value>
</property>
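<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>50</value> <!-- Sibling queue capacities must sum to 100 -->
</property>
<property>
<name>yarn.scheduler.capacity.root.high_mem.capacity</name>
<value>50</value>
</property>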
<property>
<name>yarn.scheduler.capacity.root.high_mem.maximum-capacity</name>
<value>100</value>
</property>
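A common reason this configuration is rejected is that the capacities of sibling queues do not add up. The sketch below assumes the Capacity Scheduler's rule that percentage capacities under the same parent sum to 100; the function name is hypothetical, and the 50 for the default queue is an assumed value chosen to balance high_mem.

```python
from typing import Mapping


def capacities_valid(capacities: Mapping[str, float],
                     tolerance: float = 1e-6) -> bool:
    """True if sibling queue capacities (in percent) sum to 100."""
    if any(value < 0 for value in capacities.values()):
        return False
    return abs(sum(capacities.values()) - 100.0) <= tolerance


if __name__ == "__main__":
    # Queues under root, matching yarn.scheduler.capacity.root.queues.
    print(capacities_valid({"default": 50, "high_mem": 50}))   # True
    print(capacities_valid({"default": 100, "high_mem": 50}))  # False
```

Running a check like this before refreshing the queues can catch the error before the ResourceManager does.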
Expected output: containers run within memory limits without being killed.
Prevention
- Set realistic container memory based on job requirements.
- Monitor YARN UI (port 8088) for container kill reasons.
- Enable virtual memory checks with a reasonable ratio (3-4x).
- Reserve OS memory (25% of total RAM) for system processes.
- Use yarn.nodemanager.resource.memory-mb to cap per-node allocation.
Common Mistakes with YARN memory
- Setting -Xmx equal to the container size, which leaves no room for non-heap JVM memory and pushes the container past its physical limit
- Requesting a container larger than yarn.scheduler.maximum-allocation-mb, which the scheduler cannot grant
- Disabling yarn.nodemanager.vmem-check-enabled in production instead of raising yarn.nodemanager.vmem-pmem-ratio
These mistakes appear frequently in real-world Hadoop deployments.
Practice Exercise
Take a node with 64GB of RAM and 25% overhead. Work out the value for yarn.nodemanager.resource.memory-mb, then choose mapreduce.map.memory.mb and a matching -Xmx value so that the heap is about 80% of the container.
This exercise reinforces the concepts covered in this guide. Try working it out before checking online solutions.