Database Performance Troubleshooting -- Slow Queries, Connection Pooling & Indexing
Database performance degradation hits applications hard -- slow queries, exhausted connection pools, and replication lag turn fast APIs into unusable services -- this guide shows you how to diagnose, optimize, and prevent these issues in MySQL and PostgreSQL.
What You'll Learn
Why It Matters
A single slow query can block hundreds of others, cascade across services, and trigger downtime alerts at 3 AM. Knowing how to identify the bottleneck using query analysis, system metrics, and tuning parameters is essential for any backend developer or DBA.
Real-World Use
When your API response time jumps from 50ms to 5 seconds, your application reports "too many connections", or your read replicas fall minutes behind the primary during peak traffic, these optimization techniques restore performance.
Common Database Performance Issues Table
| Issue | Symptom | Cause | Diagnostic Tool |
|---|---|---|---|
| Slow query | High response time, CPU spike | Missing index, bad join, full table scan | EXPLAIN ANALYZE, slow query log |
| Connection pool exhaustion | "Too many connections" error | Connections not released, pool too small | SHOW PROCESSLIST, pg_stat_activity |
| Lock contention | Queries stuck in "Waiting" | Long-running transaction holding locks | SHOW ENGINE INNODB STATUS, pg_locks |
| Replication lag | Replica behind by minutes | Large write volume, slow replica CPU | SHOW SLAVE STATUS, pg_stat_Replication |
| Table bloat | Table size much larger than data | Dead tuples (PostgreSQL) or fragmented pages (MySQL) | VACUUM VERBOSE, OPTIMIZE TABLE |
| Buffer pool miss | High disk I/O, slow reads | InnoDB buffer pool too small | SHOW ENGINE INNODB STATUS buffer stats |
Step-by-Step Fixes
Fix 1: Find and Fix Slow Queries
-- MySQL: Enable slow query log
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;
-- Find top queries by total execution time
SELECT * FROM sys.statement_analysis
ORDER BY total_latency DESC
LIMIT 5;
-- Analyze query execution plan
EXPLAIN ANALYZE
SELECT o.id, o.total, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.created_at > '2026-01-01'
ORDER BY o.total DESC;
-- Add a composite index to speed it up
CREATE INDEX idx_orders_created_total
ON orders(created_at, total DESC);
Expected output:
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------------+
| 1 | SIMPLE | orders| NULL | ALL | NULL | NULL | 500000 | 33.33 | Using where |
+----+-------------+-------+------------+------+---------------+------+---------+------+----------+----------------+
Fix 2: Resolve Connection Pool Exhaustion
# MySQL: Check current connections
mysql -e "SHOW STATUS LIKE 'Threads_connected';"
mysql -e "SHOW VARIABLES LIKE 'max_connections';"
# PostgreSQL: Check current connections
psql -c "SELECT count(*) FROM pg_stat_activity;"
psql -c "SHOW max_connections;"
# Kill idle connections in MySQL
mysql -e "SELECT CONCAT('KILL ', id, ';') FROM information_schema.PROCESSLIST WHERE COMMAND = 'Sleep' AND TIME > 300;" | tail -n +2 | mysql
# Python: Configure connection pooling properly
from sqlalchemy import create_engine
# Good pool settings for a web app
engine = create_engine(
"mysql://user:pass@host/db",
pool_size=10, # Base connections
max_overflow=10, # Extra connections allowed
pool_timeout=30, # Seconds to wait for a connection
pool_recycle=1800, # Recycle connections after 30 min
pool_pre_ping=True # Check connection before using
)
Expected output:
+-----------------+-------+
| Variable_name | Value |
+-----------------+-------+
| Threads_connected | 142 |
| max_connections | 200 |
+-----------------+-------+
Fix 3: Investigate Lock Contention
-- MySQL: Check current locks
SELECT * FROM performance_schema.data_locks\G
-- Find blocking transactions
SELECT r.trx_id AS waiting_trx,
r.trx_mysql_thread_id AS waiting_thread,
b.trx_mysql_thread_id AS blocking_thread
FROM performance_schema.data_lock_waits w
JOIN information_schema.INNODB_TRX r ON w.requesting_trx_id = r.trx_id
JOIN information_schema.INNODB_TRX b ON w.blocking_trx_id = b.trx_id;
-- PostgreSQL: Check blocked queries
SELECT blocked.pid AS blocked_pid,
blocked.query AS blocked_query,
blocking.pid AS blocking_pid,
blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking ON blocking.pid = ANY(pg_blocking_pids(blocked.pid));
Expected output:
waiting_trx | waiting_thread | blocking_thread
12345 | 45 | 42
Fix 4: Diagnose Replication Lag
-- MySQL: Check replication status
SHOW SLAVE STATUS\G
-- PostgreSQL: Check replication status
SELECT application_name, state, sync_state,
pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;
# Monitor replication lag over time
watch -n 5 "mysql -e 'SHOW SLAVE STATUS\G' | grep -E 'Seconds_Behind_Master|Read_Master_Log_Pos'"
# Force sync if lag is critical (MySQL)
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
Expected output:
Seconds_Behind_Master: 125
Master_Log_File: mysql-bin.000042
Read_Master_Log_Pos: 123456789
Fix 5: Reduce Table and Index Bloat
-- PostgreSQL: Vacuum and analyze
VACUUM VERBOSE ANALYZE orders;
-- Check table bloat percentage
SELECT schemaname, tablename,
n_dead_tup,
n_live_tup,
round(n_dead_tup * 100.0 / GREATEST(n_live_tup + n_dead_tup, 1), 2) AS dead_pct
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY dead_pct DESC;
-- MySQL: Optimize table (rebuilds and defragments)
OPTIMIZE TABLE orders;
-- Check table fragmentation
SELECT table_name, ROUND(data_length / 1024 / 1024, 2) AS data_mb,
ROUND(index_length / 1024 / 1024, 2) AS index_mb,
ROUND(data_free / 1024 / 1024, 2) AS free_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb' AND data_free > 1048576;
Expected output:
INFO: vacuuming "public.orders"
INFO: "orders": removed 50000 dead row versions
Database Performance Flowchart
flowchart TD
A[Database Performance Issue] --> B{Check metric}
B -->|Slow queries| C[Enable slow query log]
C --> D[Run EXPLAIN ANALYZE]
D --> E[Add missing index or rewrite query]
B -->|Connection errors| F[Check thread count]
F --> G[Increase max_connections or pool size]
G --> H[Fix connection leak in app code]
B -->|High lock waits| I[Find blocking transactions]
I --> J[Optimize transaction duration]
J --> K[Kill long-running blocking queries]
B -->|Replication lag| L[Check slave status]
L --> M[Upgrade replica or reduce write load]
B -->|Disk I/O high| N[Check buffer pool size]
N --> O[Increase innodb_buffer_pool_size]
E --> P[Performance Restored]
H --> P
K --> P
M --> P
O --> P
Prevention Tips
- Set
long_query_time = 1in production to capture all queries taking over 1 second - Use connection pooling in your application layer with
pool_pre_ping=Trueto avoid stale connections - Schedule
VACUUM(PostgreSQL) orOPTIMIZE TABLEduring low-traffic windows for write-heavy tables - Monitor replication lag with Prometheus and alert if it exceeds 60 seconds
- Set
innodb_buffer_pool_sizeto 70-80% of available RAM for MySQL dedicated servers - Use
pt-query-digest(Percona Toolkit) to analyze slow query logs offline
Practice Questions
How do you find which queries are consuming the most database time in MySQL? Answer: Enable the slow query log with
SET GLOBAL slow_query_log = ONandlong_query_time = 1. Then usept-query-digestor querymysql.slow_logdirectly. For real-time analysis, usesys.statement_analysis.What is the difference between
max_connectionsand connection pooling? Answer:max_connectionsis the hard limit set on the database server -- no more connections than this can be established. Connection pooling reuses a limited set of database connections across multiple application requests, allowing hundreds of concurrent users without hittingmax_connections.How do you determine whether replication lag is caused by network latency or slow query execution on the replica? Answer: Check
Seconds_Behind_Master(MySQL) or lag bytes (PostgreSQL). If the replica's CPU is high during lag, queries are the bottleneck. If CPU is low but lag exists, network bandwidth or disk I/O on the replica may be the issue.Challenge: Write a SQL query that identifies the top 10 most-blocked queries in PostgreSQL by total wait time. Answer:
SELECT blocked.pid, blocked.query, blocked.wait_event_type, blocked.state, COUNT(blocking.pid) AS blockers, blocked.query_start FROM pg_stat_activity blocked LEFT JOIN LATERAL ( SELECT unnest(pg_blocking_pids(blocked.pid)) AS blocking_pid ) blockers ON TRUE WHERE blocked.wait_event_type = 'Lock' GROUP BY blocked.pid, blocked.query, blocked.wait_event_type, blocked.state, blocked.query_start ORDER BY COUNT(blocking.pid) DESC LIMIT 10;
Quick Reference
| Issue | Diagnostic | Resolution |
|---|---|---|
| Slow query | EXPLAIN ANALYZE <query> |
Add index, optimize joins |
| Connection pool full | SHOW PROCESSLIST |
Increase pool, fix leaks |
| Lock contention | SHOW ENGINE INNODB STATUS |
Kill blocker, shorten transactions |
| Replication lag | SHOW SLAVE STATUS |
Reduce writes, upgrade replica |
| Table bloat | pg_stat_user_tables |
VACUUM ANALYZE or OPTIMIZE TABLE |
FAQ
What is the ideal size for InnoDB buffer pool on a dedicated database server?
Set innodb_buffer_pool_size to 70-80% of total RAM on a dedicated MySQL server, but leave enough memory for OS cache, connections, and internal buffers. For servers with 64GB RAM, start with 48GB. Monitor buffer pool hit rate -- if it drops below 95%, consider increasing the size.
How do you handle connection surges that temporarily exceed max_connections?
Use a connection pooler like PgBouncer (PostgreSQL) or ProxySQL (MySQL) that sits between your application and the database. It maintains a persistent pool of database connections and queues application requests without exceeding the backend limit. Set max_connections 20-30% higher than your average peak to handle bursts.
Why does a query perform well in development but slow down in production?
The most common reasons are production data volume (full table scans on millions of rows vs thousands), production hardware configuration (different buffer pool, disk type, or memory), and concurrent query load (lock contention under real traffic). Always run EXPLAIN ANALYZE on production-like data volumes during development.
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro