Hadoop MapReduce Job Slow Fix
This tutorial shows how to diagnose and fix a slow Hadoop MapReduce job. It covers the common causes, a step-by-step fix with examples, and practices that prevent the problem.
A MapReduce job runs much slower than expected:
Map 100% reduce 0% (Stuck at shuffle phase)
Elapsed time: 45 minutes and counting
Slow MapReduce jobs are usually caused by data skew (one reducer gets most of the data), too few reducers, excessive spills to disk, or missing intermediate compression. On Hadoop 2 and later, the YARN ResourceManager web UI (default port 8088) shows the progress of each task.
Step-by-Step Fix
1. Check the ResourceManager UI
RIGHT — analyze the running job:
http://jobtracker-host:8088/cluster
Look at:
- Map phase duration: Are all mappers taking similar time?
- Reduce phase: Is one reducer at 99% while others are done?
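- Shuffle bytes: A few reducers receiving far more shuffle bytes than the rest indicates data skew
- Spilled records: A spill count well above the map output record count means records are written to disk more than once (inefficient memory use)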
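The shuffle-bytes check above can be turned into a number. The following is a minimal sketch, assuming you have copied the per-reducer shuffle bytes from the UI by hand; the class name SkewCheck and the sample figures are hypothetical, and the ratio is only a rough indicator, not a Hadoop metric.

```java
import java.util.Arrays;

final class SkewCheck {
    /** Ratio of the largest reducer input to the median reducer input. */
    static double skewRatio(long[] shuffleBytesPerReducer) {
        if (shuffleBytesPerReducer.length == 0) {
            throw new IllegalArgumentException("need at least one reducer");
        }
        long[] sorted = shuffleBytesPerReducer.clone();
        Arrays.sort(sorted);
        long max = sorted[sorted.length - 1];
        long median = sorted[sorted.length / 2];
        if (median == 0) {
            // Avoid dividing by zero when most reducers received nothing.
            return max == 0 ? 1.0 : Double.POSITIVE_INFINITY;
        }
        return (double) max / median;
    }

    public static void main(String[] args) {
        // Hypothetical per-reducer shuffle bytes read from the job UI.
        long[] bytes = {120_000_000L, 110_000_000L, 4_800_000_000L, 130_000_000L};
        System.out.printf("skew ratio: %.1f%n", skewRatio(bytes));
    }
}
```

A ratio close to 1 means the reducers received similar amounts of data; a ratio far above 1 points to one or a few overloaded reducers.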
2. Fix data skew with custom partitioner
WRONG — using default hash partitioner on a skewed key:
// Default: job.setPartitionerClass(HashPartitioner.class);
RIGHT — implement a custom partitioner:
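import java.util.concurrent.ThreadLocalRandom;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class SkewAwarePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.toString().startsWith("hot_key")) {
            // Spread hot keys across all partitions. Each reducer then emits a
            // partial result for the key, which a follow-up job must merge.
            return ThreadLocalRandom.current().nextInt(numPartitions);
        }
        // Partitioner is abstract, so apply HashPartitioner's formula directly
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}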
job.setPartitionerClass(SkewAwarePartitioner.class);
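To see why a hot key needs more than plain hashing, here is a small simulation that runs without Hadoop. It is a sketch under stated assumptions: the class SaltingDemo is hypothetical, String.hashCode stands in for Text.hashCode, and a random partition choice is one possible salting scheme rather than the only one.

```java
import java.util.Random;

final class SaltingDemo {
    /** Same formula HashPartitioner uses; String stands in for Text here. */
    static int hashPartition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    /** Hot keys get a random partition; every other key uses the hash. */
    static int saltedPartition(String key, int numPartitions, Random rnd) {
        if (key.startsWith("hot_key")) {
            return rnd.nextInt(numPartitions);
        }
        return hashPartition(key, numPartitions);
    }

    /** Counts how many records land in each partition. */
    static int[] distribute(String[] keys, int numPartitions, boolean salted, Random rnd) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            int p = salted
                    ? saltedPartition(key, numPartitions, rnd)
                    : hashPartition(key, numPartitions);
            counts[p]++;
        }
        return counts;
    }

    /** Size of the most heavily loaded partition. */
    static int max(int[] counts) {
        int best = 0;
        for (int c : counts) best = Math.max(best, c);
        return best;
    }

    public static void main(String[] args) {
        String[] keys = new String[10_000];
        java.util.Arrays.fill(keys, "hot_key_1");
        Random rnd = new Random(42);
        System.out.println("hash only, largest partition: " + max(distribute(keys, 10, false, rnd)));
        System.out.println("salted, largest partition:    " + max(distribute(keys, 10, true, rnd)));
    }
}
```

With hashing alone, every record for the same key lands in one partition, so the largest partition holds all of them. With salting the records spread out, at the cost of a later step that merges the partial results per key.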
3. Increase reducers for large data
WRONG — default 1 reducer:
// Default: job.setNumReduceTasks(1);
RIGHT — set appropriate reducer count:
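# General formula: 0.95 * (node_count * max_reducers_per_node)
# -D options take effect only if the driver parses generic options (for example, by running through ToolRunner)
hadoop jar myjob.jar MyJob -D mapreduce.job.reduces=50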
Or in Java:
job.setNumReduceTasks(50);
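The 0.95 formula can be computed directly. This is a minimal sketch; the class ReducerCount and its parameter names are hypothetical, and the node and slot counts in main are example values, not recommendations.

```java
final class ReducerCount {
    /** Applies 0.95 * (nodeCount * maxReducersPerNode), rounded down, with a floor of 1. */
    static int recommend(int nodeCount, int maxReducersPerNode) {
        if (nodeCount <= 0 || maxReducersPerNode <= 0) {
            throw new IllegalArgumentException("counts must be positive");
        }
        // Integer arithmetic avoids floating-point rounding surprises.
        long reducers = 95L * nodeCount * maxReducersPerNode / 100;
        return (int) Math.max(1L, reducers);
    }

    public static void main(String[] args) {
        // Example cluster: 10 nodes, 5 reducer containers per node.
        System.out.println(recommend(10, 5)); // prints 47
    }
}
```

Staying just under the cluster's full capacity lets all reducers start in a single wave while leaving a little headroom for failed or re-run tasks.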
4. Enable compression
WRONG — no intermediate compression causes heavy disk I/O:
RIGHT — enable map output compression:
-D mapreduce.map.output.compress=true
-D mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.SnappyCodec
And job output compression (gzip files are not splittable, so a downstream job reads each output file with a single mapper):
-D mapreduce.output.fileoutputformat.compress=true
-D mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.GzipCodec
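To get a feel for how much repetitive key/value records can shrink, the sketch below compresses sample data with the JDK's built-in gzip. It runs without Hadoop. The class CompressionDemo and the record layout are hypothetical, and real savings depend on your data and on the codec (Snappy trades a lower ratio for speed).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class CompressionDemo {
    /** Returns the gzip-compressed size of the given bytes. */
    static int gzipSize(byte[] data) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    /** Builds tab-separated "key<TAB>1" records with 100 distinct keys. */
    static byte[] sampleRecords(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("user_").append(i % 100).append("\t1\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRecords(100_000);
        System.out.println(raw.length + " bytes raw, " + gzipSize(raw) + " bytes gzipped");
    }
}
```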
5. Tune memory and JVM settings
WRONG — default 1GB memory per mapper:
# Increase mapper/reducer memory; keep each JVM heap (-Xmx) at about 80% of its container size
-D mapreduce.map.memory.mb=2048
-D mapreduce.reduce.memory.mb=4096
-D mapreduce.map.java.opts="-Xmx1638m"
-D mapreduce.reduce.java.opts="-Xmx3276m"
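The heap values above are roughly 80% of the container sizes, which leaves room for non-heap JVM memory. A minimal sketch of that calculation follows; the class HeapSizing is hypothetical, and the 80% ratio is inferred from the settings above rather than mandated by Hadoop.

```java
final class HeapSizing {
    /** Heap size in MB at 80% of the container size, rounded down. */
    static int heapMb(int containerMb) {
        if (containerMb <= 0) {
            throw new IllegalArgumentException("container size must be positive");
        }
        return (int) (containerMb * 8L / 10);
    }

    /** Formats the value for mapreduce.map.java.opts or mapreduce.reduce.java.opts. */
    static String javaOpts(int containerMb) {
        return "-Xmx" + heapMb(containerMb) + "m";
    }

    public static void main(String[] args) {
        System.out.println(javaOpts(2048)); // prints -Xmx1638m
        System.out.println(javaOpts(4096)); // prints -Xmx3276m
    }
}
```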
6. Use a Combiner for map-side aggregation
WRONG — all aggregation done in reducer:
// No combiner set — all data goes to shuffle
RIGHT — add a combiner:
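job.setCombinerClass(MyReducer.class);
// Reusing the reducer as the combiner runs it on the map side for local
// aggregation, reducing shuffle data. This is safe only when the operation
// is commutative and associative (sum, min, max), because Hadoop may run
// the combiner zero, one, or several times.
// Or write a dedicated combiner (needs import java.io.IOException):
public class MyCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        context.write(key, new IntWritable(sum));
    }
}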
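The effect of a sum combiner on shuffle volume can be illustrated without Hadoop. The sketch below aggregates one map task's output locally; the class CombinerDemo and the sample records are hypothetical, and a plain map stands in for Hadoop's grouping of values by key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CombinerDemo {
    /** Sums the values per key within one map task's output, as a sum combiner would. */
    static Map<String, Integer> combine(String[] keys, int[] values) {
        if (keys.length != values.length) {
            throw new IllegalArgumentException("keys and values must have the same length");
        }
        Map<String, Integer> combined = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            combined.merge(keys[i], values[i], Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        // Hypothetical output of a single map task: five (key, 1) records.
        String[] keys = {"error", "warn", "error", "error", "warn"};
        int[] values = {1, 1, 1, 1, 1};
        Map<String, Integer> combined = combine(keys, values);
        System.out.println(keys.length + " records before, " + combined.size() + " after");
        // prints: 5 records before, 2 after
    }
}
```

The fewer distinct keys a map task emits relative to its record count, the more a combiner cuts the data sent through the shuffle.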
Expected result: the job no longer stalls at "reduce 0%", reducers finish at similar times, and the job completes in significantly less time.
Prevention
- Analyze data distribution before running jobs.
- Use custom partitioners for skewed data.
- Always enable intermediate compression for production jobs.
- Set an appropriate reducer count (0.95 * node_count * max_reducers_per_node).
- Use Combiners to reduce shuffle data volume.