Hadoop HDFS Disk Space Full Fix

DodaTech Updated 2026-06-24 3 min read

In this tutorial, you'll learn how to diagnose and fix HDFS write failures caused by full DataNode disks. The steps cover checking usage, freeing and rebalancing space, and preventing a recurrence.

HDFS operations fail with:

java.io.IOException: No space left on device

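The DataNode disks are full, or the cluster has run out of DFS capacity. "No space left on device" is the operating system's message for a full local filesystem, so a real disk is full, whether a DataNode volume or another local directory the process writes to. Space reserved for non-DFS usage (MapReduce intermediate data, logs) through dfs.datanode.du.reserved is not available to HDFS blocks. As DFS Remaining approaches zero, the NameNode can no longer find DataNodes with room for new blocks, so writes fail. Separately, if the disk holding the NameNode's own metadata drops below dfs.namenode.resource.du.reserved, the NameNode enters safe mode and rejects all namespace changes.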

Step-by-Step Fix

1. Check HDFS disk usage

Don't judge by a single metric; get the full cluster report:

hdfs dfsadmin -report

Example output (abridged):

Configured Capacity: 10 TB
DFS Used: 9.5 TB (95%)
Non DFS Used: 0.3 TB (3%)
DFS Remaining: 0.2 TB (2%)

If DFS Used > 90%, take action.
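If you check capacity from a monitoring script, you can compute the same percentage from the report text. The sketch below is a minimal example, assuming the cluster summary lines print the raw byte count right after the label (for example "DFS Used: 10445360463872 (9.50 TB)"), as recent Hadoop 2.x/3.x releases do; the sample above shows simplified values. The function names are hypothetical helpers, not part of any Hadoop API.

```python
import re

# Cluster summary lines look like "DFS Used: 10445360463872 (9.50 TB)".
# Assumption: the raw byte count follows the label; adjust for your Hadoop version.
SUMMARY_LINE = re.compile(
    r"^(Configured Capacity|DFS Used|Non DFS Used|DFS Remaining):\s*(\d+)"
)


def parse_report(text: str) -> dict[str, int]:
    """Return the cluster-wide value of each summary field, in bytes."""
    values: dict[str, int] = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line.strip())
        # Per-DataNode sections repeat these labels; keep the first (cluster) value.
        if match and match.group(1) not in values:
            values[match.group(1)] = int(match.group(2))
    return values


def dfs_used_percent(values: dict[str, int]) -> float:
    """DFS Used as a percentage of Configured Capacity."""
    return 100.0 * values["DFS Used"] / values["Configured Capacity"]


def needs_action(report_text: str, threshold: float = 90.0) -> bool:
    """True when DFS Used is above the threshold (90% by default)."""
    return dfs_used_percent(parse_report(report_text)) > threshold


sample = """Configured Capacity: 10995116277760 (10 TB)
DFS Used: 10445360463872 (9.50 TB)
DFS Used%: 95.00%
"""
print(f"{dfs_used_percent(parse_report(sample)):.1f}%")  # 95.0%
print(needs_action(sample))  # True
```

The "DFS Used%" line is skipped on purpose: computing the ratio from byte counts avoids depending on how the percentage is rounded.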

2. Find and delete large files

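# Find the largest directories (sort raw bytes; -h output such as "1.2 G"
# has a space before the unit, which breaks sort -h)
hdfs dfs -du / | sort -rn | head -10

# Find the largest files under /data (field 5 of ls -R output is the size)
hdfs dfs -ls -R /data | sort -k5,5 -rn | head -10

# Check trash size (deleted files keep using space until the trash is purged)
hdfs dfs -du -s -h /user/$USER/.Trash

# Delete trash checkpoints older than fs.trash.interval and checkpoint the rest
hdfs dfs -expunge

# Newer Hadoop 3.x releases: empty your trash now (check hdfs dfs -usage expunge)
hdfs dfs -expunge -immediate

# Delete data without moving it to trash (frees space now, not recoverable)
hdfs dfs -rm -r -skipTrash /path/to/unneeded/data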
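To keep a sorted listing human-readable, you can post-process the raw byte counts instead of relying on -h. A minimal sketch, assuming the three-column -du format printed by current Hadoop releases (size, disk space consumed with all replicas, path); the helper names and sample paths are hypothetical:

```python
def human(n: int) -> str:
    """Format a byte count with 1024-based units, e.g. 1536 -> '1.5 KiB'."""
    size = float(n)
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    i = 0
    while size >= 1024 and i < len(units) - 1:
        size /= 1024
        i += 1
    return f"{size:.1f} {units[i]}"


def top_consumers(du_output: str, n: int = 10) -> list[tuple[int, int, str]]:
    """Parse raw 'hdfs dfs -du' lines and return the n largest entries.

    Assumes three columns: size, space consumed with all replicas, path.
    """
    rows = []
    for line in du_output.splitlines():
        parts = line.split(maxsplit=2)  # maxsplit keeps paths with spaces intact
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            rows.append((int(parts[0]), int(parts[1]), parts[2]))
    # Sort on raw byte counts so unit suffixes can never cause mis-ordering.
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows[:n]


sample = """1288490188  3865470564  /data/logs
536870912  1610612736  /data/raw
2147483648  6442450944  /data/warehouse"""
for size, consumed, path in top_consumers(sample, n=2):
    print(f"{human(size):>9}  {human(consumed):>9}  {path}")
```

The second column is usually about three times the first because it counts every replica, which is the number that matters when the cluster is full.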

3. Adjust disk space thresholds

Don't wait until disks are completely full. In hdfs-site.xml, reserve space on every DataNode volume for non-DFS use, then restart the DataNodes:

<property>
    <name>dfs.datanode.du.reserved</name>
    <value>10737418240</value>  <!-- Reserve 10 GiB per volume, in bytes -->
</property>
<property>
    <name>dfs.datanode.failed.volumes.tolerated</name>
    <value>1</value>  <!-- Keep the DataNode running if 1 volume fails -->
</property>
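Because the reserve applies to each volume, its total effect grows with the number of disks. A rough sketch of the arithmetic, assuming a hypothetical DataNode with twelve 4 TiB disks; real capacity also depends on filesystem overhead, and dfs_capacity is an illustrative helper, not a Hadoop function:

```python
GIB = 1024 ** 3
TIB = 1024 ** 4


def dfs_capacity(volume_sizes: list[int], reserved_per_volume: int) -> int:
    """Rough bytes available to HDFS on one DataNode.

    dfs.datanode.du.reserved applies per volume, so the reserve is subtracted
    from every disk listed in dfs.datanode.data.dir.
    """
    return sum(max(size - reserved_per_volume, 0) for size in volume_sizes)


# The dfs.datanode.du.reserved value in the snippet is 10 GiB expressed in bytes.
assert 10 * GIB == 10737418240

# Hypothetical DataNode: twelve 4 TiB disks with a 10 GiB reserve on each.
print(dfs_capacity([4 * TIB] * 12, 10 * GIB) // GIB)  # 49032
```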

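Also set the minimum free space the NameNode requires on its own metadata volumes (the default is 100 MB). Below this, the NameNode enters safe mode:

<property>
    <name>dfs.namenode.resource.du.reserved</name>
    <value>1073741824</value>  <!-- Require 1 GiB free, in bytes -->
</property>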

4. Balance disk usage across DataNodes

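# Run the balancer (moves blocks from over-utilized to under-utilized DataNodes).
# Run this way, it stays in the foreground and logs progress after each iteration.
hdfs balancer -threshold 10

# Hadoop 3.x: run it as a background daemon instead, then check or stop it
hdfs --daemon start balancer -threshold 10
hdfs --daemon status balancer
hdfs --daemon stop balancer

The balancer moves blocks between DataNodes until each node's utilization is within the threshold (in percentage points) of the cluster average. It evens out usage but does not free any space, so combine it with the other steps when the whole cluster is full.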
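To make the threshold concrete, here is a simplified sketch of how nodes compare with the cluster average. It assumes utilization is DFS used divided by capacity, with the average weighted by capacity; the real balancer also works per storage type and pairs nodes that are only slightly above or below average, so treat this as an illustration. Node names and the classify helper are hypothetical.

```python
def classify(nodes: dict[str, tuple[int, int]], threshold: float) -> dict[str, str]:
    """nodes maps name -> (dfs_used_bytes, capacity_bytes).

    threshold is in percentage points, like 'hdfs balancer -threshold'.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_cap = sum(cap for _, cap in nodes.values())
    avg = 100.0 * total_used / total_cap  # cluster utilization, weighted by capacity
    result = {}
    for name, (used, cap) in nodes.items():
        util = 100.0 * used / cap
        if util > avg + threshold:
            result[name] = "over-utilized"   # blocks move away from this node
        elif util < avg - threshold:
            result[name] = "under-utilized"  # blocks move onto this node
        else:
            result[name] = "within threshold"
    return result


# Average here is 185/300, about 61.7%, so the band is roughly 51.7% to 71.7%.
print(classify({"dn1": (90, 100), "dn2": (55, 100), "dn3": (40, 100)}, threshold=10))
```

A smaller threshold gives a more even cluster but makes the balancer run longer and move more data.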

5. Archive or compress old data

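# Compress infrequently accessed data by streaming it through gzip
# (avoids staging it on a local disk that may also be full)
hdfs dfs -cat /data/old_data.csv | gzip | hdfs dfs -put - /archive/old_data.csv.gz

# Verify the archive, then delete the original without going through trash
hdfs dfs -text /archive/old_data.csv.gz | head -5
hdfs dfs -rm -skipTrash /data/old_data.csv

# Lower the replication factor of archived files (with 1 replica,
# a single disk failure loses the data; 2 is a safer compromise)
hdfs dfs -setrep -w 1 /archive/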
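Replication multiplies raw usage, so lowering it often frees more space than compression does. A back-of-the-envelope sketch; the 1 TiB file and its 200 GiB compressed size are hypothetical, since gzip's ratio depends on the data, and the helper names are illustrative:

```python
from typing import Optional

GIB = 1024 ** 3


def raw_bytes(logical: int, replication: int) -> int:
    """HDFS stores each block `replication` times, so raw usage scales linearly."""
    return logical * replication


def bytes_freed(logical: int, old_rep: int, new_rep: int,
                compressed: Optional[int] = None) -> int:
    """Raw bytes freed by optionally compressing data and changing its replication."""
    after = logical if compressed is None else compressed
    return raw_bytes(logical, old_rep) - raw_bytes(after, new_rep)


# Hypothetical: 1 TiB of CSV at replication 3, gzipped to 200 GiB at replication 1.
print(bytes_freed(1024 * GIB, 3, 1, compressed=200 * GIB) // GIB)  # 2872
# Lowering replication alone (3 -> 2), without compression.
print(bytes_freed(1024 * GIB, 3, 2) // GIB)  # 1024
```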

6. Configure disk space quotas

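# Set a space quota on a directory (it counts every replica: with
# replication 3, 1T holds about 341 GiB of file data)
hdfs dfsadmin -setSpaceQuota 1T /user/project

# Set a name quota (files plus directories, including the directory itself)
hdfs dfsadmin -setQuota 10000 /user/project

# Report quota usage (-v prints the column headers)
hdfs dfs -count -q -v /user/project

# Remove the space quota if it is no longer needed
hdfs dfsadmin -clrSpaceQuota /user/project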

Output (assuming the files use the default replication factor of 3):

       QUOTA       REM_QUOTA     SPACE_QUOTA REM_SPACE_QUOTA    DIR_COUNT   FILE_COUNT       CONTENT_SIZE PATHNAME
       10000            9994   1099511627776   1099511624704            1            5               1024 /user/project
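Because the space quota counts every replica, the remaining space quota overstates how much file data still fits. A minimal sketch that parses one data line of hdfs dfs -count -q output (not the -v header line) and converts the remaining space quota into logical bytes, assuming a uniform replication factor; the names and the /user/example line are hypothetical:

```python
from typing import NamedTuple, Optional


class QuotaUsage(NamedTuple):
    name_quota: Optional[int]       # None when no quota is set ("none")
    rem_name_quota: Optional[int]   # None when unlimited ("inf")
    space_quota: Optional[int]
    rem_space_quota: Optional[int]
    dir_count: int
    file_count: int
    content_size: int
    path: str


def _quota_field(field: str) -> Optional[int]:
    """Quota columns print 'none' or 'inf' when no limit is set."""
    return None if field in ("none", "inf") else int(field)


def parse_count_q(line: str) -> QuotaUsage:
    """Parse one data line of 'hdfs dfs -count -q' (8 columns, path last)."""
    f = line.split(maxsplit=7)
    return QuotaUsage(
        _quota_field(f[0]), _quota_field(f[1]),
        _quota_field(f[2]), _quota_field(f[3]),
        int(f[4]), int(f[5]), int(f[6]), f[7],
    )


def logical_room(q: QuotaUsage, replication: int = 3) -> Optional[int]:
    """Approximate bytes of file data that still fit under the space quota."""
    if q.rem_space_quota is None:
        return None
    return q.rem_space_quota // replication


q = parse_count_q("none inf 3000000 2700000 1 4 100000 /user/example")
print(q.name_quota, logical_room(q))  # None 900000
```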

Once enough space is freed, rerun hdfs dfsadmin -report to confirm that DFS Remaining has recovered; new writes should then succeed.

Prevention

  • Monitor HDFS disk usage with alerts at 75%, 85%, and 95%.
  • Set disk quotas on user directories.
  • Configure dfs.datanode.du.reserved to prevent 100% disk usage.
  • Run hdfs balancer weekly to keep usage balanced.
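  • Enable HDFS trash (fs.trash.interval, in minutes) so deletions are recoverable, and keep the interval short enough that trash doesn't hold significant capacity.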
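The tiered alerts in the checklist map naturally onto a small check that a monitoring job could run against both the cluster total and each DataNode, since individual nodes can fill up before the cluster total does. A minimal sketch; the level names, helper functions, and node names are hypothetical, and the percentages would come from hdfs dfsadmin -report:

```python
from typing import Optional

# Thresholds from the prevention checklist, highest first (percent of DFS capacity used).
LEVELS = ((95.0, "critical"), (85.0, "warning"), (75.0, "notice"))


def alert_level(used_pct: float) -> Optional[str]:
    """Return the most severe level whose threshold is reached, or None."""
    for limit, name in LEVELS:
        if used_pct >= limit:
            return name
    return None


def alerts(cluster_pct: float, node_pcts: dict[str, float]) -> list[str]:
    """Check the cluster total and every DataNode against the thresholds."""
    messages = []
    level = alert_level(cluster_pct)
    if level:
        messages.append(f"{level}: cluster DFS used {cluster_pct:.1f}%")
    for node, pct in sorted(node_pcts.items()):
        level = alert_level(pct)
        if level:
            messages.append(f"{level}: {node} DFS used {pct:.1f}%")
    return messages


print(alerts(80.0, {"dn1": 96.5, "dn2": 70.0}))
# ['notice: cluster DFS used 80.0%', 'critical: dn1 DFS used 96.5%']
```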

FAQ

What happens when HDFS runs out of space?

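Write operations fail, either with "No space left on device" or with errors reporting that a block could not be placed on enough DataNodes. Reads still work. The NameNode stops choosing full DataNodes as targets for new blocks. Deleting files is the immediate fix (bypass or purge the trash, or the space is not released); compressing or archiving cold data and lowering its replication frees more.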

How do I find the biggest space consumers?

Run hdfs dfs -du / and sort numerically by the first column, then drill down into the largest directories. For per-node usage, check the Datanodes tab of the NameNode web UI or the per-DataNode sections of hdfs dfsadmin -report, and look for DataNodes that are much fuller than others.

Can I add more disks to a DataNode without restarting?

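Yes, but new volumes are not picked up automatically. Add the new path to dfs.datanode.data.dir in that DataNode's hdfs-site.xml, then run hdfs dfsadmin -reconfig datanode <host>:<ipc_port> start (the DataNode IPC port defaults to 50020 in Hadoop 2.x and 9867 in Hadoop 3.x). Check progress with hdfs dfsadmin -reconfig datanode <host>:<ipc_port> status. The DataNode adds the new volume without a restart.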
