Three days after launching a new search feature, I got a Slack message from a teammate: "the search page is timing out for some users." I pulled up our monitoring dashboard. One query was taking 47 seconds — over the 30-second timeout — for users with large datasets. In development, where I had tested with a few hundred records, it had taken under 100 milliseconds.
I ran EXPLAIN ANALYZE on the query. The output showed a sequential scan on a table with 2.3 million rows. I had been filtering by a column with no index. Two minutes later, after adding the index and running VACUUM ANALYZE, the same query took 4 milliseconds. A 12,000× improvement from two lines of SQL.
I had been writing SQL for two years at that point. I understood what indexes were conceptually. But I did not fully understand how the database uses them until that incident forced me to learn properly. Here is what I should have known.
What an Index Actually Is
A database index is a separate data structure — typically a B-tree — that the database maintains alongside your table. Think of it as the index at the back of a book: instead of reading every page to find mentions of "query optimisation", you look up the term in the alphabetically sorted index and jump directly to the relevant pages.
A B-tree index on a column stores the column values in sorted order, with pointers to the corresponding rows in the main table. When you filter by that column, the database can traverse the B-tree in O(log n) time to find the matching rows, instead of scanning every row in the table in O(n) time. For a table with 2.3 million rows, that is the difference between 2.3 million row comparisons and roughly 21 comparisons.
When the Database Uses (and Does Not Use) Your Index
Adding an index does not guarantee the database will use it. The query planner makes a cost-based decision. It estimates whether a sequential scan or an index scan is faster given the current table statistics, the query, and the expected number of matching rows.
-- See exactly what PostgreSQL plans to do and how long it takes
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC
LIMIT 10;
-- Look for: "Seq Scan" (no index used) vs "Index Scan" or "Index Only Scan"
-- Also check: "Rows Removed by Filter" — if this is large, you need an index
Cases where the database may choose a sequential scan over an existing index:
- The query matches a large fraction of the table (say, >10-20% of rows). Scanning many index nodes then fetching each row individually becomes slower than a straight sequential scan. The planner knows this.
- The table statistics are stale. Run
ANALYZE tablenameafter bulk inserts to update the planner's estimates. - The query uses a function on the indexed column.
WHERE LOWER(email) = 'user@example.com'does not use an index onemailbecause the function transforms the value. Use a functional index (CREATE INDEX ON users (LOWER(email))) or store the lowercased value directly.
Index Types and When to Use Each
B-tree (the default)
Used for equality, range, and ordering operations (=, <, >, BETWEEN, ORDER BY). This covers the vast majority of use cases. When in doubt, start here.
Composite (multi-column) Indexes
An index on (user_id, status) can serve queries that filter by user_id alone or by both user_id AND status. It cannot efficiently serve queries that filter by status alone (the left-most prefix rule). Order your columns with the most selective filter first, or the column most frequently used alone.
-- Supports: WHERE user_id = ?
-- Supports: WHERE user_id = ? AND status = ?
-- Does NOT efficiently support: WHERE status = ?
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
-- If you also need to filter by status alone, add a separate index
CREATE INDEX idx_orders_status ON orders (status);
Partial Indexes
Index only the rows that match a condition. Smaller index, faster updates, faster queries for the specific case. I use these often for "active" subsets of large tables:
-- Only index pending orders — completed orders are queried rarely
-- and this index stays small as orders complete
CREATE INDEX idx_orders_pending ON orders (created_at)
WHERE status = 'pending';
GIN Indexes for Full-Text Search and Arrays
B-tree does not work for text search or array containment queries. GIN (Generalized Inverted Index) is designed for these:
-- For full-text search
CREATE INDEX idx_posts_fts ON posts USING GIN (to_tsvector('english', content));
-- For array containment queries
CREATE INDEX idx_products_tags ON products USING GIN (tags);
The Costs of Indexes — What I Did Not Think About
Every index has costs that are easy to ignore when you are optimising for read performance:
- Write overhead. Every INSERT, UPDATE, and DELETE must also update each index on the table. A table with ten indexes on it will have significantly slower write throughput than one with two.
- Storage. Indexes occupy disk space — sometimes as much as the table itself for a wide, frequently-updated column.
- Index bloat. As rows are updated and deleted, indexes accumulate dead entries.
VACUUMreclaims this space in PostgreSQL, but high-churn tables need regular maintenance.
The practical implication: do not add indexes speculatively. Add them in response to measured slow queries or proven access patterns. Before adding an index, run EXPLAIN ANALYZE on the slow query, confirm it is doing a sequential scan where an index would help, add the index, and confirm the improvement. Premature indexing is a real maintenance cost.
My Process for Index Decisions
When I add a new feature that queries data in a new way, I ask myself:
- What columns am I filtering by?
- What columns am I ordering by?
- What is the expected table size at a year of production load?
- What percentage of rows does a typical query match?
If the expected table size is under 10,000 rows, I often do not bother — sequential scans are fast enough. Over 100,000 rows, any unindexed filter on a column with high cardinality (many distinct values) is worth indexing. Over a million rows, I treat every unindexed filter as a production incident waiting to happen.
Further Reading
- PostgreSQL Documentation: Indexes — the definitive reference for every index type PostgreSQL supports
- Use The Index, Luke — the best free resource on SQL indexing across databases, written for developers not DBAs
- EXPLAIN ANALYZE documentation — learn to read query plans, the most valuable diagnostic skill in SQL work