Slow database queries are often the biggest bottleneck in application performance. A single poorly written query can bring down your entire system. This guide covers practical SQL optimization techniques that will help you write faster queries and build more responsive applications.
## Key Takeaways

1. Use EXPLAIN ANALYZE to understand query execution plans
2. Index columns in WHERE, JOIN, and ORDER BY clauses
3. Avoid functions on indexed columns in WHERE clauses
4. Use keyset pagination instead of OFFSET for large datasets
5. Batch large updates and deletes to avoid locks

## 1. Understanding Query Execution with EXPLAIN
EXPLAIN shows how the database plans to execute your query. EXPLAIN ANALYZE actually runs it and shows real timing. This is your primary tool for query optimization.
```sql
-- Basic EXPLAIN (shows the plan without executing)
EXPLAIN SELECT * FROM users WHERE email = 'user@example.com';

-- EXPLAIN ANALYZE (executes and shows actual timing)
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;

-- PostgreSQL: more detailed output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE created_at > '2024-01-01';

-- MySQL: extended information
EXPLAIN FORMAT=JSON SELECT * FROM products WHERE category_id = 5;
```

**Key Metrics to Watch:**
- Seq Scan vs Index Scan - a sequential scan reads the entire table; an index scan jumps straight to matching rows
- Cost - estimated expense (lower is better); the first number is startup cost, the second is total cost
- Rows - estimated rows returned; a large gap between estimated and actual rows indicates outdated statistics
- Actual Time - real execution time (only shown with ANALYZE)
- Buffers - pages read from cache (shared hit) vs disk (read)
```
-- Example EXPLAIN output interpretation

-- BAD: sequential scan on a large table
Seq Scan on orders  (cost=0.00..15420.00 rows=500000)
  Filter: (status = 'pending')
  Rows Removed by Filter: 450000

-- GOOD: index scan
Index Scan using idx_orders_status on orders  (cost=0.42..85.50 rows=50000)
  Index Cond: (status = 'pending')
```

Run `ANALYZE table_name` periodically to update statistics. The query planner uses these statistics to choose optimal execution plans.
## 2. Indexing Strategies
Indexes are the most important tool for query optimization. They allow the database to find rows without scanning the entire table.
```sql
-- Single-column index
CREATE INDEX idx_users_email ON users(email);

-- Composite index (order matters!)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Partial index (PostgreSQL): index only the matching rows
CREATE INDEX idx_orders_pending ON orders(created_at)
WHERE status = 'pending';

-- Covering index: INCLUDE the columns the SELECT needs
CREATE INDEX idx_users_search ON users(email) INCLUDE (name, created_at);

-- Unique index
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Expression index
CREATE INDEX idx_users_lower_email ON users(LOWER(email));

-- Full-text index (PostgreSQL)
CREATE INDEX idx_products_search ON products
USING GIN(to_tsvector('english', name || ' ' || description));
```

**When to Create Indexes:**
- Columns in WHERE clauses (especially equality checks)
- Columns used in JOIN conditions
- Columns in ORDER BY (to avoid sorting)
- Foreign key columns (speeds up JOINs and cascading deletes)
- Columns with high selectivity (many unique values)
**When to Avoid Indexes:**
- Small tables (sequential scan may be faster)
- Low selectivity columns (boolean, status with few values)
- Frequently updated columns (index maintenance overhead)
- Write-heavy tables with few reads
Composite index column order matters! An index on `(user_id, created_at)` works for `WHERE user_id = 1` and for `WHERE user_id = 1 AND created_at > X`, but NOT for `WHERE created_at > X` alone. Put the most selective columns first.
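This column-order rule is easy to verify empirically. Below is a minimal sketch using Python's built-in sqlite3 module (the table and index names are invented for the demo); SQLite's EXPLAIN QUERY PLAN plays the role of EXPLAIN here, and its planner follows the same composite-index rule:

```python
import sqlite3

# In-memory database with a composite index on (user_id, created_at)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT, total REAL)"
)
conn.execute("CREATE INDEX idx_orders_user_date ON orders(user_id, created_at)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable strategy in the last column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Leading column in the WHERE clause: the composite index is used (SEARCH ... USING INDEX)
print(plan("SELECT * FROM orders WHERE user_id = 1"))

# Only the trailing column: the index cannot help, SQLite falls back to a full scan
print(plan("SELECT * FROM orders WHERE created_at > '2024-01-01'"))
```

The first plan mentions `idx_orders_user_date`; the second is a plain table scan, exactly as the rule predicts.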
## 3. Efficient Query Patterns
Small changes in query structure can have dramatic performance impacts. Learn these patterns to write faster queries.
```sql
-- AVOID: SELECT * (fetches unnecessary data)
SELECT * FROM users WHERE id = 1;

-- BETTER: select only the needed columns
SELECT id, name, email FROM users WHERE id = 1;

-- AVOID: N+1 queries in application code
-- (fetching users, then looping to fetch each user's orders)

-- BETTER: a single query with a JOIN
SELECT u.id, u.name, o.id AS order_id, o.total
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
WHERE u.created_at > '2024-01-01';

-- AVOID: multiple OR conditions on different columns
SELECT * FROM products WHERE name LIKE '%phone%' OR category = 'electronics';

-- BETTER: UNION so each branch can use its own index
SELECT * FROM products WHERE name LIKE '%phone%'
UNION
SELECT * FROM products WHERE category = 'electronics';
```

```sql
-- AVOID: OFFSET for deep pagination (scans and discards the skipped rows)
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;

-- BETTER: keyset/cursor pagination
SELECT * FROM orders
WHERE created_at < '2024-06-15 10:30:00'
ORDER BY created_at DESC
LIMIT 20;

-- For consistent ordering when timestamps can collide, add a tiebreaker column
SELECT * FROM orders
WHERE (created_at, id) < ('2024-06-15 10:30:00', 12345)
ORDER BY created_at DESC, id DESC
LIMIT 20;
```

```sql
-- AVOID: IN with a subquery on large datasets
SELECT * FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 1000);

-- BETTER: EXISTS (stops at the first match)
SELECT * FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 1000
);

-- AVOID: NOT IN with NULLs (a single NULL in the subquery makes NOT IN match nothing)
SELECT * FROM users WHERE id NOT IN (SELECT user_id FROM banned_users);

-- BETTER: NOT EXISTS, or LEFT JOIN ... IS NULL
SELECT * FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM banned_users b WHERE b.user_id = u.id
);
```

## 4. Common Anti-Patterns to Avoid
These common mistakes prevent index usage and cause performance problems.
```sql
-- ANTI-PATTERN: functions on indexed columns
-- An index on email is NOT used here
SELECT * FROM users WHERE LOWER(email) = 'user@example.com';

-- SOLUTION: a function-based index, or normalize the data
CREATE INDEX idx_users_lower_email ON users(LOWER(email));
-- Or store the lowercase value in the column

-- ANTI-PATTERN: implicit type conversion
-- An index on phone (varchar) may not be used
SELECT * FROM users WHERE phone = 1234567890;

-- SOLUTION: match the types
SELECT * FROM users WHERE phone = '1234567890';

-- ANTI-PATTERN: leading wildcard
-- The index CANNOT be used
SELECT * FROM products WHERE name LIKE '%phone%';

-- SOLUTION: a trailing wildcard, or full-text search for substring matching
SELECT * FROM products WHERE name LIKE 'phone%';
```

```sql
-- ANTI-PATTERN: negation operators (often force a full scan)
SELECT * FROM orders WHERE status != 'completed';
SELECT * FROM users WHERE deleted_at IS NOT NULL;

-- SOLUTION: positive conditions when possible
SELECT * FROM orders WHERE status IN ('pending', 'processing', 'shipped');

-- Or a partial index for common queries
CREATE INDEX idx_orders_active ON orders(id) WHERE status != 'completed';

-- ANTI-PATTERN: OR on different columns (may skip indexes)
SELECT * FROM users WHERE email = 'x' OR phone = 'y';

-- SOLUTION: UNION of indexed queries
SELECT * FROM users WHERE email = 'x'
UNION
SELECT * FROM users WHERE phone = 'y';
```

`SELECT COUNT(*) FROM large_table` is slow in PostgreSQL (exact counts must check row visibility under MVCC). For approximate counts, use `pg_class.reltuples` or maintain a counter table.
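Related to the pitfalls above: the NOT IN with NULLs behavior mentioned earlier is worth seeing concretely, because it silently returns wrong results rather than just running slowly. A minimal sketch with Python's sqlite3 (the `users`/`banned_users` schema is invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE banned_users (user_id INT);  -- user_id is nullable
    INSERT INTO users (id) VALUES (1), (2), (3);
    INSERT INTO banned_users (user_id) VALUES (2), (NULL);
""")

# NOT IN: comparing against a set that contains NULL is never true,
# so this returns ZERO rows even though users 1 and 3 are not banned
not_in = conn.execute(
    "SELECT id FROM users WHERE id NOT IN (SELECT user_id FROM banned_users)"
).fetchall()

# NOT EXISTS: NULLs in the subquery are harmless
not_exists = conn.execute("""
    SELECT id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM banned_users b WHERE b.user_id = u.id)
""").fetchall()

print(not_in)      # [] -- the NULL poisons the whole NOT IN
print(not_exists)  # the two non-banned users, 1 and 3
```

The same three-valued-logic rules apply in PostgreSQL and MySQL, which is why NOT EXISTS is the safer default.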
## 5. Optimizing JOINs
JOIN performance depends on indexes, join order, and the amount of data being combined.
```sql
-- Ensure foreign keys are indexed
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_order_items_order_id ON order_items(order_id);

-- GOOD: join on indexed columns
SELECT u.name, COUNT(o.id) AS order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id  -- user_id is indexed
GROUP BY u.id, u.name;

-- Filter before joining (reduces the join size)
SELECT u.name, o.total
FROM users u
JOIN (
  SELECT user_id, total FROM orders
  WHERE created_at > '2024-01-01' AND total > 100
) o ON o.user_id = u.id;

-- Use the appropriate JOIN type:
-- INNER JOIN: only matching rows (fastest when you need matches only)
-- LEFT JOIN: all rows from the left side plus matches from the right
-- Avoid CROSS JOIN on large tables (cartesian product)

-- For many-to-many relations, give the junction table a composite index
CREATE INDEX idx_user_roles_composite ON user_roles(user_id, role_id);
```

| JOIN Type | Use Case | Performance Note |
|---|---|---|
| INNER JOIN | Only matching rows needed | Generally fastest |
| LEFT JOIN | All from left, matches from right | May return more rows |
| RIGHT JOIN | All from right, matches from left | Same as LEFT with reversed tables |
| FULL OUTER JOIN | All rows from both tables | Most expensive |
| CROSS JOIN | Cartesian product | Avoid on large tables |
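The row-count difference between INNER and LEFT JOIN in the table above shows up even on a toy dataset. A sketch with Python's sqlite3 (schema invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 75.0);  -- only Ann has orders
""")

# INNER JOIN: only users with at least one order appear
inner = conn.execute(
    "SELECT u.name, o.total FROM users u JOIN orders o ON o.user_id = u.id"
).fetchall()

# LEFT JOIN: every user appears; users without orders get NULL (None) totals
left = conn.execute(
    "SELECT u.name, o.total FROM users u LEFT JOIN orders o ON o.user_id = u.id"
).fetchall()

print(len(inner))  # 2 rows (Ann's two orders)
print(len(left))   # 4 rows (Ann twice, plus Bob and Cid with None)
```

If downstream code assumes every row has an order, the extra NULL rows from a LEFT JOIN are a common source of bugs as well as extra work.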
## 6. Efficient Aggregations
Aggregations (COUNT, SUM, AVG) over large datasets need careful optimization.
```sql
-- Use a covering index for index-only aggregation
CREATE INDEX idx_orders_status_total ON orders(status, total);

-- This query can then use an index-only scan
SELECT status, SUM(total) FROM orders GROUP BY status;

-- For counting with conditions, use SUM + CASE
SELECT
  COUNT(*) AS total_orders,
  SUM(CASE WHEN status = 'completed' THEN 1 ELSE 0 END) AS completed,
  SUM(CASE WHEN status = 'pending' THEN 1 ELSE 0 END) AS pending
FROM orders
WHERE created_at > '2024-01-01';

-- Approximate COUNT for large tables (PostgreSQL)
SELECT reltuples::bigint AS estimate
FROM pg_class WHERE relname = 'orders';

-- Materialized view for expensive aggregations
CREATE MATERIALIZED VIEW order_stats AS
SELECT
  DATE_TRUNC('day', created_at) AS day,
  status,
  COUNT(*) AS count,
  SUM(total) AS total
FROM orders
GROUP BY 1, 2;

-- Refresh periodically
REFRESH MATERIALIZED VIEW order_stats;
REFRESH MATERIALIZED VIEW CONCURRENTLY order_stats;  -- Doesn't block reads
```

For dashboards showing aggregated data, consider materialized views refreshed on a schedule. Real-time counts over tables with millions of rows will always be slow.
## 7. Bulk Operations

When making large data changes, batch your operations instead of running them row by row.
```sql
-- AVOID: single inserts in a loop
INSERT INTO logs (message) VALUES ('log 1');
INSERT INTO logs (message) VALUES ('log 2');
-- ... 10000 more

-- BETTER: multi-row insert (batch in groups of ~1000)
INSERT INTO logs (message) VALUES
  ('log 1'),
  ('log 2'),
  ('log 3');

-- AVOID: updating millions of rows at once (locks, replication lag)
UPDATE orders SET processed = true WHERE processed = false;

-- BETTER: batched updates, repeated until no rows remain
UPDATE orders SET processed = true
WHERE id IN (
  SELECT id FROM orders
  WHERE processed = false
  LIMIT 1000
);

-- PostgreSQL: COPY for bulk loads (fastest)
COPY users(name, email) FROM '/path/to/data.csv' CSV HEADER;

-- For upserts, use ON CONFLICT
INSERT INTO users (email, name, updated_at)
VALUES ('user@example.com', 'John', NOW())
ON CONFLICT (email)
DO UPDATE SET name = EXCLUDED.name, updated_at = EXCLUDED.updated_at;
```

Large updates and deletes can lock tables and cause replication lag. Always batch into smaller chunks (1,000-10,000 rows) with short pauses between batches.
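The batched-update pattern above translates directly into an application-side loop. A minimal sketch using Python's sqlite3 (table name, batch size, and pause length are illustrative): each iteration updates one chunk and commits, so locks stay short and other work can interleave.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, processed INT DEFAULT 0)")
# executemany is itself a form of batching for inserts
conn.executemany("INSERT INTO orders (id) VALUES (?)", [(i,) for i in range(1, 10001)])
conn.commit()

BATCH_SIZE = 1000

while True:
    cur = conn.execute("""
        UPDATE orders SET processed = 1
        WHERE id IN (
            SELECT id FROM orders WHERE processed = 0 LIMIT ?
        )
    """, (BATCH_SIZE,))
    conn.commit()          # release locks between batches
    if cur.rowcount == 0:  # nothing left to update
        break
    time.sleep(0.01)       # short pause so other writers get a turn

remaining = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE processed = 0"
).fetchone()[0]
print(remaining)  # 0
```

With a client/server database the same loop works over a normal connection; the pause and batch size should be tuned to the workload and replication setup.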
## 8. Query Monitoring and Analysis
Find slow queries before they become problems with these monitoring techniques.
```sql
-- PostgreSQL: enable slow query logging in postgresql.conf:
-- log_min_duration_statement = 1000  -- Log queries over 1 second

-- PostgreSQL: pg_stat_statements for query analysis
CREATE EXTENSION pg_stat_statements;

-- Top queries by total time
SELECT
  calls,
  total_exec_time::int AS total_ms,
  mean_exec_time::int AS avg_ms,
  query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 20;

-- MySQL: enable the slow query log
-- SET GLOBAL slow_query_log = 'ON';
-- SET GLOBAL long_query_time = 1;

-- MySQL: show running queries
SHOW FULL PROCESSLIST;
SELECT * FROM information_schema.processlist
WHERE time > 5;  -- Queries running over 5 seconds

-- PostgreSQL: current activity
SELECT pid, now() - pg_stat_activity.query_start AS duration, query
FROM pg_stat_activity
WHERE (now() - pg_stat_activity.query_start) > interval '5 seconds';

-- Kill a long-running query
SELECT pg_cancel_backend(pid);     -- Graceful
SELECT pg_terminate_backend(pid);  -- Forceful
```

Set up alerts for queries exceeding your thresholds. Common monitoring tools: pgBadger (PostgreSQL), Percona Toolkit (MySQL), or cloud-native options like RDS Performance Insights.
## Frequently Asked Questions
**How do I know which queries need optimization?**

Use slow query logs and pg_stat_statements (PostgreSQL) or the Performance Schema (MySQL) to identify queries by total time, average time, and call frequency. Focus on the queries that consume the most total time (calls × average time).

**Should I index every column in WHERE clauses?**

No. Index columns that are queried frequently and have high selectivity. Avoid indexing low-cardinality columns (booleans, a status with few values) or columns that are updated frequently. Each index adds write overhead.

**Why is my query slow even with an index?**

Common reasons: the index isn't being used (check EXPLAIN), outdated statistics (run ANALYZE), too many rows returned, an index that doesn't cover the query, or data that doesn't fit in memory. Also check for functions on indexed columns that prevent index usage.

**When should I use a covering index?**

When you frequently SELECT the same columns and want to avoid table lookups. A covering index includes all columns the query needs, allowing index-only scans. The trade-off is a larger index.

**How do I optimize COUNT queries on large tables?**

Full-table counts in PostgreSQL are inherently slow (MVCC requires per-row visibility checks). Options: use the approximate count from pg_class.reltuples, maintain a counter table with triggers, use materialized views for aggregations, or cache counts at the application level.
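The counter-table-with-triggers option from the last answer can be sketched in a few lines. Here Python's sqlite3 stands in for a real database (trigger syntax varies by engine, and the table names are invented): insert and delete triggers keep a one-row `order_counts` table in sync, so reading the count is O(1) instead of a full-table COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL);
    CREATE TABLE order_counts (n INTEGER NOT NULL);
    INSERT INTO order_counts (n) VALUES (0);

    -- Keep the counter in sync on every insert and delete
    CREATE TRIGGER orders_ins AFTER INSERT ON orders
    BEGIN UPDATE order_counts SET n = n + 1; END;

    CREATE TRIGGER orders_del AFTER DELETE ON orders
    BEGIN UPDATE order_counts SET n = n - 1; END;
""")

conn.executemany("INSERT INTO orders (total) VALUES (?)", [(i * 1.0,) for i in range(500)])
conn.execute("DELETE FROM orders WHERE id <= 100")

# O(1) read of the maintained counter instead of scanning the table
count = conn.execute("SELECT n FROM order_counts").fetchone()[0]
print(count)  # 400
```

The trade-off is write contention on the counter row; for very hot tables, sharding the counter across several rows and summing them on read is a common refinement.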