
Why should I get rid of unused indexes?
Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.
The disadvantages of indexes are:
- Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap.
The space used up by indexes also increases the size and duration of physical backups. - Indexes slow down data modification. Whenever you
INSERT
into orDELETE
from a table, all indexes have to be modified, in addition to the table itself (the “heap”).
And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table. - Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every
UPDATE
causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.
This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed column is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the workVACUUM
has to do.
The many uses of indexes
Now we know that we don’t want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.
Here is a list of all benefits of indexes in PostgreSQL:
- Indexes can speed up queries that use indexed columns (or expressions) in the
WHERE
clause.
Everybody knows that one!
The traditional B-tree index supports the<
,<=
,=
,>=
and>
operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches. - B-tree indexes can speed up the
max()
andmin()
aggregates. - B-tree indexes can speed up
ORDER BY
clauses. - Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
- A B-tree index on the origin of a
FOREIGN KEY
constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification. - Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce
PRIMARY KEY
andUNIQUE
constraints, while exclusion constraints use GiST indexes. - Indexes can provide the optimizer with better value distribution statistics.
If you create an index on an expression,ANALYZE
and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!
Find the unused indexes!
The following query that we at CYBERTEC use will show you all indexes that serve none of the above mentioned purposes.
It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.
For completeness’ sake, I have to add that the parameter track_counts
has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes
. But you must not change that parameter anyway, otherwise autovacuum will stop working.
To find the indexes that have never been used since the last statistics reset with pg_stat_reset()
, use
SELECT s.schemaname, s.relname AS tablename, s.indexrelname AS indexname, pg_relation_size(s.indexrelid) AS index_size FROM pg_catalog.pg_stat_user_indexes s JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid WHERE s.idx_scan = 0 -- has never been scanned AND 0 <>ALL (i.indkey) -- no index column is an expression AND NOT i.indisunique -- is not a UNIQUE index AND NOT EXISTS -- does not enforce a constraint (SELECT 1 FROM pg_catalog.pg_constraint c WHERE c.conindid = s.indexrelid) AND NOT EXISTS -- is not an index partition (SELECT 1 FROM pg_catalog.pg_inherits AS inh WHERE inh.inhrelid = s.indexrelid) ORDER BY pg_relation_size(s.indexrelid) DESC;
Some remarks:
- Don’t do that on your test database, but on the production database!
- If your software is running at several customer sites, run the query on all of them.
Different users have different ways to use a software, which can cause different indexes to be used. - You can replace
s.idx_scan = 0
in the query with a different condition, e.g.s.idx_scan < 10
. Indexes that are very rarely used are also good candidates for removal.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.