Database Query Optimization for Web Applications
In this tutorial, you will learn how to optimize database queries to reduce web application response times, focusing on indexing strategies, connection pooling, query analysis, and result caching. Database queries are often the slowest part of a web request, accounting for 60 to 80 percent of server-side processing time. DodaTech reduced API response times by 70 percent through systematic database optimization.
What You Will Learn
- How to identify slow queries using database logs and EXPLAIN ANALYZE
- How to design effective indexes for common query patterns
- How to implement connection pooling for reduced latency
- How to cache query results with Redis to avoid repeated database hits
Why It Matters
Every web application that persists data relies on database queries. A single unoptimized query can hold up the entire request, increasing TTFB and degrading the user experience. DodaTech observed that optimizing the top 5 slowest queries reduced p95 API latency from 1.8 seconds to 450 milliseconds.
Real-World Use Case
DodaZIP analytics dashboard loaded a chart of downloads over time. The original query scanned 2 million rows and took 12 seconds. After adding a Composite index on (product_id, date) and using a materialized view for pre-aggregated data, the same query ran in 80 milliseconds.
Prerequisites
You should understand basic SQL syntax and how HTTP request-response flow works. Familiarity with PostgreSQL or MySQL is helpful.
Step-by-Step Tutorial
Step 1: Identify Slow Queries
Enable slow query logging in your database and use EXPLAIN ANALYZE to understand query execution.
-- PostgreSQL: enable slow query logging
SET log_min_duration_statement = 200; -- Log queries over 200ms
-- Analyze a specific query
EXPLAIN ANALYZE
SELECT p.name, COUNT(o.id) as order_count
FROM products p
LEFT JOIN orders o ON p.id = o.product_id
WHERE p.created_at > '2026-01-01'
GROUP BY p.name
ORDER BY order_count DESC
LIMIT 20;
Expected output: A query plan showing each step (sequential scan, index scan, hash join) with actual time and row counts. Look for sequential scans on large tables as the primary optimization target.
Step 2: Add Appropriate Indexes
Indexes speed up WHERE clauses, JOIN conditions, and ORDER BY operations. Analyze query patterns before creating indexes.
-- Before: sequential scan on 2 million rows
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Seq Scan on orders (cost=0.00..35000.00 rows=50 width=100)
-- After: add an index on the filtered column
CREATE INDEX idx_orders_user_id ON orders (user_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 12345;
-- Index Scan using idx_orders_user_id (cost=0.28..8.75 rows=50 width=100)
Expected improvement: The query changes from a sequential scan (35000 cost) to an index scan (8.75 cost), reducing execution time from approximately 500ms to under 5ms.
Step 3: Design Composite Indexes for Multi-Column Queries
When queries filter on multiple columns, a Composite index is more efficient than separate single-column indexes.
-- Query that filters on user_id and date
SELECT * FROM orders
WHERE user_id = 12345
AND created_at BETWEEN '2026-01-01' AND '2026-06-01';
-- Composite index covering both columns
CREATE INDEX idx_orders_user_date ON orders (user_id, created_at);
-- Columns order matters: put equality conditions first, range conditions second
Step 4: Optimize JOIN Queries
Ensure JOIN columns are indexed on both sides of the join.
-- Add indexes on foreign key columns
CREATE INDEX idx_orders_user_id ON orders (user_id);
CREATE INDEX idx_orders_product_id ON orders (product_id);
-- Then the JOIN uses indexed lookups instead of nested sequential scans
SELECT u.name, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.name;
Step 5: Use Connection Pooling
Creating a new database connection for every request adds 50-100ms of overhead. Connection pooling reuses connections.
// Before: direct connection per request (slow)
const { Client } = require('pg');
app.get('/api/data', async (req, res) => {
const client = new Client(DB_URL);
await client.connect(); // 50-100ms overhead
const result = await client.query('SELECT * FROM data');
await client.end();
res.json(result.rows);
});
// After: connection pool (fast)
const { Pool } = require('pg');
const pool = new Pool({
connectionString: DB_URL,
max: 20,
idleTimeoutMillis: 30000,
connectionTimeoutMillis: 2000
});
app.get('/api/data', async (req, res) => {
const result = await pool.query('SELECT * FROM data');
res.json(result.rows);
});
Expected improvement: First request with pool initialization may take 60ms, but subsequent queries take under 2ms for connection acquisition.
Step 6: Cache Query Results
Cache frequently accessed, rarely changing data in Redis to avoid database hits entirely.
const redis = require('redis');
const cache = redis.createClient();
async function getProductList() {
const cacheKey = 'products:all';
const cached = await cache.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
const result = await pool.query(
'SELECT id, name, price FROM products WHERE active = true'
);
await cache.setEx(cacheKey, 300, JSON.stringify(result.rows));
return result.rows;
}
app.get('/api/products', async (req, res) => {
const products = await getProductList();
res.json(products);
});
Expected behavior: The first request hits the database (200ms). Subsequent requests within 5 minutes hit Redis (2ms).
Step 7: Use Pagination Instead of Large Offsets
Large OFFSET values force the database to scan and discard rows. Use keyset pagination instead.
-- Bad: large OFFSET (database scans all 100k rows)
SELECT * FROM orders ORDER BY id LIMIT 20 OFFSET 100000;
-- Good: keyset pagination (database scans only the next 20 rows)
SELECT * FROM orders
WHERE id > 100000
ORDER BY id
LIMIT 20;
Step 8: Materialize Expensive Aggregations
For dashboards and reports that run the same aggregations repeatedly, use materialized views or summary tables.
-- Create a materialized view for pre-aggregated data
CREATE MATERIALIZED VIEW daily_sales AS
SELECT
product_id,
DATE(created_at) as sale_date,
COUNT(*) as order_count,
SUM(amount) as total_revenue
FROM orders
GROUP BY product_id, DATE(created_at)
WITH DATA;
-- Refresh it periodically
REFRESH MATERIALIZED VIEW daily_sales;
-- Queries are now instant
SELECT * FROM daily_sales WHERE sale_date > CURRENT_DATE - 7;
Learning Path
flowchart LR A[Reduce TTFB] --> B[Database Query Optimization] B --> C[Caching Strategies] B --> D[APM and Monitoring] C --> E[Performance Testing] style B fill:#4f46e5,color:#fff style A fill:#6366f1,color:#fff style C fill:#6366f1,color:#fff
Common Errors
Adding indexes without analyzing query patterns: Indexes that are never used waste disk space and slow down write operations. Always verify index usage with EXPLAIN ANALYZE.
Using SELECT * in production queries: Selecting unnecessary columns wastes database and network resources. Always specify only the columns you need.
Creating too many indexes on frequently updated tables: Each index adds overhead to INSERT, UPDATE, and DELETE operations. Aim for 3-5 indexes per table maximum.
N+1 query problem in ORMs: An ORM query that loads a list of items and then queries the database for each items relation creates N+1 queries. Use eager loading or JOINs.
Not setting a statement timeout: A runaway query can block database resources indefinitely. Set
statement_timeoutin PostgreSQL to 10-30 seconds.Caching without invalidation Strategy: Cached data becomes stale. Implement TTL-based expiration or event-driven cache invalidation to ensure data freshness.
Practice Questions
- What is the difference between a sequential scan and an index scan?
- How does connection pooling reduce response time?
- What is the N+1 query problem and how do you fix it?
- Why is keyset pagination more efficient than offset-based pagination?
- When should you use a materialized view?
Answers: 1. Sequential scan reads every row in the table; index scan uses an index structure to find relevant rows directly. 2. Connection pooling reuses existing database connections, eliminating the 50-100ms overhead of creating a new connection per request. 3. N+1 occurs when an ORM loads a parent entity and then executes a separate query for each child; fix it by using JOINs or eager loading. 4. Keyset pagination uses a WHERE clause on an indexed column to filter rows, avoiding the need to scan and discard rows as OFFSET does. 5. When expensive aggregations run frequently on slowly changing data; the materialized view precomputes and stores the results.
Challenge
Set up a PostgreSQL database with a table containing 1 million sample orders. Write a slow query that takes over 5 seconds, use EXPLAIN ANALYZE to identify the bottleneck, add the appropriate index, and verify the query now runs under 50ms. Then implement Redis caching for that query result and compare response times.
FAQ
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro