Another way to get rid of unused indexes...
© Laurenz Albe 2018

Why should I get rid of unused indexes?

Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.

The disadvantages of indexes are:

  • Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap.
    The space used up by indexes also increases the size and duration of physical backups.
  • Indexes slow down data modification. Whenever you INSERT into or DELETE from a table, all indexes have to be modified, in addition to the table itself (the “heap”).
    And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table.
  • Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every UPDATE causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.
    This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed row is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the work VACUUM has to do.

The many uses of indexes

Now we know that we don’t want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.

Here is a list of all benefits of indexes in PostgreSQL:

  1. Indexes can speed up queries that use indexed columns (or expressions) in the WHERE clause.
    Everybody knows that one!
    The traditional B-tree index supports the <, <=, =, >= and > operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches.
  2. B-tree indexes can speed up the max() and min() aggregates.
  3. B-tree indexes can speed up ORDER BY clauses.
  4. Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
  5. A B-tree index on the origin of a FOREIGN KEY constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be not violated by the modification.
  6. Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce PRIMARY KEY and UNIQUE constraints, while exclusion constraints use GiST indexes.
  7. Indexes can provide the optimizer with better value distribution statistics.
    If you create an index on an expression, ANALYZE and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen.This is a widely ignored benefit of indexes!

Find the unused indexes!

The following query that we at Cybertec use will show you all indexes that serve none of the above mentioned purposes.

It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.

For completeness’ sake, I have to add that the parameter track_counts has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes. But you must not change that parameter anyway, otherwise autovacuum will stop working.

To find the indexes that have never been used since the last statistics reset with pg_stat_reset(), use

SELECT s.schemaname,
       s.relname AS tablename,
       s.indexrelname AS indexname,
       pg_relation_size(s.indexrelid) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
   JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0      -- has never been scanned
  AND 0 <>ALL (i.indkey)  -- no index column is an expression
  AND NOT EXISTS          -- does not enforce a constraint
         (SELECT 1 FROM pg_catalog.pg_constraint c
          WHERE c.conindid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;

Some remarks:

  • Don’t do that on your test database, but on the production database!
  • If your software it running at several customer sites, run the query on all of them.
    Different users have different ways to use a software, which can cause different indexes to be used.
  • You can replace s.idx_scan = 0 in the query with a different condition, e.g. s.idx_scan < 10. Indexes that are very rarely used are also good candidates for removal.