Hive Partition Pruning Not Working Fix
This guide explains why Hive may scan every partition of a table even though the WHERE clause filters on a partition column, and how to fix it step by step.
A query on a partitioned Hive table filters on a single date:
SELECT * FROM orders
WHERE order_date = '2024-06-24';
But the query still scans every partition.
Hive prunes partitions only when the WHERE clause filters on partition columns with predicates the planner can evaluate against partition values, such as equality, IN, BETWEEN, and range comparisons. Functions on partition columns, implicit casts, or joins that don't push down the partition filter can prevent pruning. The EXPLAIN output (EXPLAIN DEPENDENCY or EXPLAIN EXTENDED in particular) shows which partitions a query will read.
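A quick first check is whether order_date really is a partition column and which partitions exist. A minimal sketch, assuming the orders table from the example above; the two dates in the last query are placeholder values:

```sql
-- Prints the table DDL; order_date should appear in the PARTITIONED BY clause,
-- not in the regular column list.
SHOW CREATE TABLE orders;

-- Lists the table's partitions, one per line (for example order_date=2024-06-24).
SHOW PARTITIONS orders;

-- A simple predicate on the bare partition column, which the pruner can use.
SELECT * FROM orders
WHERE order_date IN ('2024-06-23', '2024-06-24');
```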
Step-by-Step Fix
1. Check with EXPLAIN
WRONG — assuming pruning is happening without checking the plan.
RIGHT — verify with EXPLAIN:
EXPLAIN SELECT * FROM orders WHERE order_date = '2024-06-24';
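Look for the partitions the plan actually reads. The exact text depends on the Hive version and execution engine, so treat the lines below as a guide rather than literal output:
Only order_date=2024-06-24 is listed as an input partition → Pruning is working
Every partition of orders is listed as input → Pruning is NOT working
If plain EXPLAIN does not list partitions, EXPLAIN DEPENDENCY (input_partitions) and EXPLAIN EXTENDED (partition paths) do.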
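To see the partition list directly, the same query can be run through the other EXPLAIN variants. A sketch, assuming the orders table from the example; the shape of the output differs between Hive versions:

```sql
-- Prints JSON with input_tables and input_partitions;
-- with pruning, only the matching partition should appear.
EXPLAIN DEPENDENCY
SELECT * FROM orders WHERE order_date = '2024-06-24';

-- Prints the full plan, including the paths of the partitions to be read.
EXPLAIN EXTENDED
SELECT * FROM orders WHERE order_date = '2024-06-24';
```

Comparing the partitions listed here against SHOW PARTITIONS orders makes it clear whether the filter narrowed the scan.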
2. Avoid functions on partition columns
WRONG — wrapping the partition column in a function:
SELECT * FROM orders
WHERE YEAR(order_date) = 2024; -- Function on partition column
RIGHT — compare directly:
SELECT * FROM orders
WHERE order_date >= '2024-01-01'
AND order_date < '2025-01-01'; -- Direct comparison
Or apply date_add/date_sub to the constant side for date ranges, leaving the partition column bare:
SELECT * FROM orders
WHERE order_date >= date_add('2024-06-24', -7)
AND order_date <= '2024-06-24';
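The same rewrite applies to other functions. A sketch, assuming order_date is a STRING partition column in yyyy-MM-dd format (an assumption based on the example value, not stated in the original):

```sql
-- Function on the partition column: harder for the planner to prune.
SELECT * FROM orders
WHERE substr(order_date, 1, 7) = '2024-06';

-- Same rows, expressed as a half-open range on the bare column.
SELECT * FROM orders
WHERE order_date >= '2024-06-01'
  AND order_date <  '2024-07-01';
```

The range form works because yyyy-MM-dd strings sort in the same order as the dates they represent.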
3. Fix type mismatches
WRONG — partition column is a string but the filter uses a number:
-- Partition column is STRING type
SELECT * FROM orders WHERE year = 2024; -- Int vs String mismatch
RIGHT — match the partition column type:
SELECT * FROM orders WHERE year = '2024'; -- String match
Or cast the value:
SELECT * FROM orders WHERE year = CAST(2024 AS STRING);
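Which literal is right depends on how the partition column was declared. A sketch of checking the declared type and matching the literal to it; the year and order_date columns follow the examples above, and the alternative declarations named in the comments are assumptions:

```sql
-- The "# Partition Information" section lists each partition column and its type.
DESCRIBE FORMATTED orders;

-- If year is declared STRING: compare with a string literal.
SELECT * FROM orders WHERE year = '2024';

-- If year is declared INT: compare with a numeric literal.
SELECT * FROM orders WHERE year = 2024;

-- If order_date is declared DATE: use a DATE literal.
SELECT * FROM orders WHERE order_date = DATE '2024-06-24';
```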
4. Rewrite subquery filters so the partition filter can be pushed down
WRONG — subqueries may not push down:
SELECT * FROM orders
WHERE order_date IN (SELECT DISTINCT `date` FROM active_dates);
RIGHT — rewrite as a join on the partition column:
SELECT o.* FROM orders o
JOIN (SELECT DISTINCT `date` FROM active_dates) a -- DISTINCT avoids duplicate rows
ON o.order_date = a.`date`; -- date is a reserved word in recent Hive versions, hence the backticks
Or use a correlated EXISTS:
SELECT o.* FROM orders o
WHERE EXISTS (
SELECT 1 FROM active_dates a
WHERE a.`date` = o.order_date
);
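When the set of dates is small, another option is to resolve it first and pass the values as literals, so pruning happens at compile time. A sketch; the literal dates are placeholders, and the dynamic partition pruning setting applies only when Hive runs on Tez:

```sql
-- Step 1: look up the dates (run separately, for example from a script).
SELECT DISTINCT `date` FROM active_dates;

-- Step 2: substitute the result as literals (placeholder values shown).
SELECT * FROM orders
WHERE order_date IN ('2024-06-22', '2024-06-23', '2024-06-24');

-- On Tez, dynamic partition pruning can prune at run time from the join side.
SET hive.tez.dynamic.partition.pruning=true;
```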
5. Enable CBO (Cost-Based Optimization)
WRONG — leaving CBO disabled (the default in older Hive versions).
RIGHT — enable CBO and let it use statistics:
SET hive.cbo.enable=true; -- Enable cost-based optimizer
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;
The CBO can push partition filters through complex query plans.
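These optimizer settings rely on statistics being present. A sketch of collecting them for one partition of the example table; the partition value is taken from the earlier query:

```sql
-- Basic statistics (row count, size) for a single partition.
ANALYZE TABLE orders PARTITION (order_date = '2024-06-24')
COMPUTE STATISTICS;

-- Column-level statistics for the same partition.
ANALYZE TABLE orders PARTITION (order_date = '2024-06-24')
COMPUTE STATISTICS FOR COLUMNS;

-- The collected statistics appear among the partition's parameters.
DESCRIBE FORMATTED orders PARTITION (order_date = '2024-06-24');
```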
6. Partition by the right granularity
WRONG — too many or too few partitions:
-- Partitioned by hour: 365*24 = 8760 partitions per year
PARTITIONED BY (order_hour STRING)
-- Too many small partitions hurts performance
RIGHT — choose the right granularity:
-- Partitioned by month: 12 partitions per year
PARTITIONED BY (order_month STRING)
-- Or by date: 365 partitions per year
PARTITIONED BY (order_date STRING)
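In context, the PARTITIONED BY clause is part of the table definition. A sketch of a date-partitioned table loaded from an unpartitioned staging table; orders_by_date, orders_staging, and the data columns are hypothetical names used only for illustration:

```sql
-- Hypothetical date-partitioned table.
CREATE TABLE IF NOT EXISTS orders_by_date (
  order_id BIGINT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

-- Allow partition values to come from the SELECT.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition column must be the last column in the SELECT list.
INSERT OVERWRITE TABLE orders_by_date PARTITION (order_date)
SELECT order_id, amount, order_date
FROM orders_staging;
```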
Expected output: the query scans only the relevant partitions.
Prevention
- Use EXPLAIN before running expensive queries to verify partition pruning.
- Avoid functions and type casts on partition columns in WHERE clauses.
- Partition at a granularity that matches common query patterns.
- Keep table and partition statistics up to date with ANALYZE TABLE.
- Enable CBO for better query optimization.