
Hive Partition Pruning Not Working: How to Fix It

DodaTech Updated 2026-06-24 3 min read

In this tutorial, you'll learn how to diagnose and fix partition pruning that is not working in Hive. We cover the key concepts, practical examples, and best practices.

A query on a partitioned Hive table filters on what should be the partition column:

SELECT * FROM orders
WHERE order_date = '2024-06-24';

But the query still scans every partition.

Hive prunes partitions most reliably when the WHERE clause compares partition columns directly against constants using simple predicates (equality, IN, BETWEEN, range comparisons). Functions or casts applied to the partition column, implicit type conversions, and joins or subqueries that do not push the filter down to the partitioned table can all prevent pruning. The query plan from EXPLAIN is the way to confirm whether pruning was applied.

Step-by-Step Fix

1. Check with EXPLAIN

WRONG - assuming pruning is happening without looking at the query plan.

RIGHT — verify with EXPLAIN:

EXPLAIN SELECT * FROM orders WHERE order_date = '2024-06-24';

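Look for evidence that only the matching partition is read. The plan layout differs between Hive versions and execution engines, so treat the lines below as what to look for rather than as exact output:

  • Pruning is working: the partition paths (EXPLAIN EXTENDED) or the input_partitions list (EXPLAIN DEPENDENCY) name only order_date=2024-06-24.
  • Pruning is NOT working: every partition of orders is listed as an input.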
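As a rough sketch of that check, the statements below use EXPLAIN DEPENDENCY and EXPLAIN EXTENDED on the same query. The database name default and the partition name shown in the comments are assumptions for illustration; the real output depends on your schema and Hive version.

```sql
-- EXPLAIN DEPENDENCY prints JSON describing the tables and partitions
-- the query will read, without running the query.
EXPLAIN DEPENDENCY
SELECT * FROM orders WHERE order_date = '2024-06-24';

-- Pruned (assumed database "default"): input_partitions holds one entry,
-- roughly of the form
--   {"partitionName":"default@orders@order_date=2024-06-24"}
-- Not pruned: input_partitions lists every partition of orders.

-- EXPLAIN EXTENDED produces a longer plan that also lists the
-- partition paths the scan will touch.
EXPLAIN EXTENDED
SELECT * FROM orders WHERE order_date = '2024-06-24';
```

Counting the entries in input_partitions before and after a rewrite is a quick way to confirm that the rewrite actually helped.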

2. Avoid functions on partition columns

WRONG — wrapping the partition column in a function:

SELECT * FROM orders
WHERE YEAR(order_date) = 2024;  -- Function on partition column

RIGHT — compare directly:

SELECT * FROM orders
WHERE order_date >= '2024-01-01'
  AND order_date < '2025-01-01';  -- Direct comparison

Or compute a range boundary with date_add/date_sub, applied to a constant rather than to the partition column:

SELECT * FROM orders
WHERE order_date >= date_add('2024-06-24', -7)
  AND order_date <= '2024-06-24';
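The same idea applies to string functions. The sketch below assumes order_date is a STRING partition column holding valid yyyy-MM-dd values; the month chosen is arbitrary.

```sql
-- Pattern to avoid: substr() wraps the partition column
SELECT * FROM orders
WHERE substr(order_date, 1, 7) = '2024-06';

-- Equivalent filter for valid yyyy-MM-dd values,
-- with the partition column left bare on one side of the comparison
SELECT * FROM orders
WHERE order_date >= '2024-06-01'
  AND order_date < '2024-07-01';
```

Running EXPLAIN on both versions shows whether the rewrite changes the set of partitions read on your Hive version.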

3. Fix type mismatches

WRONG - partition column is a string but the filter uses a number:

-- Partition column is STRING type
SELECT * FROM orders WHERE year = 2024;  -- Int vs String mismatch

RIGHT — match the partition column type:

SELECT * FROM orders WHERE year = '2024';  -- String match

Or cast the value:

SELECT * FROM orders WHERE year = CAST(2024 AS STRING);
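Before choosing between a quoted and an unquoted literal, it helps to confirm how the partition column is actually declared. A minimal way to check, using the orders table from this guide:

```sql
-- Lists the regular columns, followed by a "# Partition Information"
-- section that shows each partition column and its declared type.
DESCRIBE orders;

-- Longer output that also includes storage details and table parameters.
DESCRIBE FORMATTED orders;
```

If year is declared as STRING, filter with year = '2024'; if it is declared as INT, filter with year = 2024.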

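4. Rewrite subquery filters so pruning can still apply

WRONG - an IN subquery may not be turned into a partition filter:

SELECT * FROM orders
WHERE order_date IN (SELECT DISTINCT `date` FROM active_dates);

RIGHT - rewrite as a semi join on the partition column:

SELECT o.* FROM orders o
LEFT SEMI JOIN active_dates a ON o.order_date = a.`date`;

LEFT SEMI JOIN returns each order once even if active_dates repeats a date, which matches the IN version; a plain JOIN would duplicate rows in that case. The column name `date` is backquoted because DATE is a reserved word in recent Hive versions.

Or use a correlated EXISTS:

SELECT o.* FROM orders o
WHERE EXISTS (
    SELECT 1 FROM active_dates a
    WHERE a.`date` = o.order_date
);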
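When the list of dates is small, another option is to make the filter fully static by looking up the values first and passing them as literals. This is only a sketch; the three dates below are hypothetical stand-ins for whatever the first query returns.

```sql
-- Step 1: fetch the dates from the small lookup table.
SELECT DISTINCT `date` FROM active_dates;

-- Step 2: use the returned values as literals (hypothetical values shown),
-- so the filter on the partition column is known at compile time.
SELECT * FROM orders
WHERE order_date IN ('2024-06-22', '2024-06-23', '2024-06-24');
```

This trades one extra round trip for a filter that does not depend on how the optimizer handles the subquery.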

5. Enable CBO (Cost-Based Optimization)

WRONG - leaving CBO disabled (the default in older Hive versions).

RIGHT - enable CBO and let it use statistics:

SET hive.cbo.enable=true;  -- Enable cost-based optimizer
SET hive.compute.query.using.stats=true;
SET hive.stats.fetch.column.stats=true;

The CBO can push partition filters through complex query plans.
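These settings only help if statistics exist for the table. A minimal sketch of gathering them for the orders table, assuming it is partitioned by order_date:

```sql
-- Basic statistics (row counts, sizes) for every partition.
ANALYZE TABLE orders PARTITION (order_date) COMPUTE STATISTICS;

-- Column-level statistics, which hive.stats.fetch.column.stats=true relies on.
ANALYZE TABLE orders PARTITION (order_date) COMPUTE STATISTICS FOR COLUMNS;

-- Refresh a single partition after loading new data into it.
ANALYZE TABLE orders PARTITION (order_date = '2024-06-24') COMPUTE STATISTICS;
```

On large tables the full-table variants can be expensive, so refreshing only newly loaded partitions is often the more practical routine.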

6. Partition by the right granularity

WRONG — too many or too few partitions:

-- Partitioned by hour: 365*24 = 8760 partitions per year
PARTITIONED BY (order_hour STRING)
-- Too many small partitions hurts performance

RIGHT — choose the right granularity:

-- Partitioned by month: 12 partitions per year
PARTITIONED BY (order_month STRING)

-- Or by date: 365 partitions per year
PARTITIONED BY (order_date STRING)
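For context, here is how a daily-partitioned table might be declared and inspected. The non-partition columns (order_id, customer_id, amount) are hypothetical and only there to make the statement complete.

```sql
-- Hypothetical schema; only order_date comes from this guide.
CREATE TABLE orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(10,2)
)
PARTITIONED BY (order_date STRING)
STORED AS ORC;

-- Show how many partitions exist and how they are named.
SHOW PARTITIONS orders;

-- A monthly report can still prune on daily partitions by using a range.
SELECT count(*) FROM orders
WHERE order_date >= '2024-06-01'
  AND order_date < '2024-07-01';
```

Daily partitions keep single-day queries cheap while month-level queries remain prunable through a range filter.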

Expected output: the query scans only the relevant partitions.

Prevention

  • Use EXPLAIN before running expensive queries to verify partition pruning.
  • Avoid functions and type casts on partition columns in WHERE clauses.
  • Partition at a granularity that matches common query patterns.
  • Keep table and partition statistics up to date with ANALYZE TABLE.
  • Enable CBO for better query optimization.

Common Mistakes with Partition Pruning

  1. Wrapping the partition column in a function or cast (for example YEAR(order_date)) instead of comparing it directly
  2. Filtering a STRING partition column with a numeric literal, which forces an implicit type conversion
  3. Assuming pruning is happening without checking the query plan with EXPLAIN

These mistakes appear frequently in real-world Hive code. DodaTech's contributors have identified these patterns through analysis of open-source projects and production systems.

Practice Exercise

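Take a query that filters a partitioned table through a function, such as WHERE YEAR(order_date) = 2024, and rewrite it as a direct range comparison. Run EXPLAIN on both versions and compare which partitions each one reads, then repeat the comparison with a type mismatch such as year = 2024 against a STRING partition column.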

This exercise reinforces the concepts covered in this guide. Try implementing it before checking online solutions.

FAQ

What partition pruning methods does Hive support?

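Hive supports static pruning (a WHERE clause on partition columns, resolved at compile time) and dynamic pruning (partition values discovered at run time from a join or subquery on the partition column). Dynamic pruning depends on the execution engine and its settings (on Tez it is controlled by hive.tez.dynamic.partition.pruning), benefits from CBO and current statistics, and applies when the partitioned table is the side being filtered by the other, usually smaller, table.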
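A minimal sketch of a query shape where dynamic pruning can apply on Tez, reusing the orders and active_dates tables from this guide. Whether the optimizer actually uses it depends on your Hive version, table sizes, and statistics, so verify with EXPLAIN rather than assuming.

```sql
-- Assumes the Tez execution engine is available on the cluster.
SET hive.execution.engine=tez;
SET hive.tez.dynamic.partition.pruning=true;

-- orders is the large partitioned table; active_dates is the small table
-- whose values decide, at run time, which order_date partitions are read.
SELECT o.*
FROM orders o
JOIN active_dates a
  ON o.order_date = a.`date`;
```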

How do I know how many partitions a query scanned?

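The most dependable check is to run EXPLAIN DEPENDENCY (or EXPLAIN EXTENDED) on the query and count the partitions it lists as inputs before you run it. Job counters in the Tez or MapReduce UI can also show how much input was read, but counter names differ between Hive versions and engines, so confirm them on your own cluster rather than relying on a specific name.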

Does partition pruning work with ORC file format?

Yes. ORC further optimizes with predicate pushdown (PPD). Even within a single partition, ORC's min/max indexes skip reading stripes that don't match the filter. Combine partition pruning with ORC for the best performance.
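As a sketch of combining the two, the query below prunes on the partition column and adds a filter on a regular column that ORC can use to skip stripes. The amount column is hypothetical, and hive.optimize.index.filter may already be enabled by default on your version.

```sql
-- Lets Hive push filters down to the ORC reader.
SET hive.optimize.index.filter=true;

-- order_date limits the partitions read;
-- amount (hypothetical column) lets ORC skip stripes via min/max indexes.
SELECT * FROM orders
WHERE order_date = '2024-06-24'
  AND amount > 1000;
```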
