Get rid of your unused indexes!

Another way to get rid of unused indexes... — © Laurenz Albe 2018

Why should I get rid of unused indexes?

Everybody knows that a database index is a good thing because it can speed up SQL queries. But this does not come for free.

Table of Contents

The disadvantages of indexes are:

Indexes use up space. It is not unusual for database indexes to use as much storage space as the data themselves. And the kind of reliable, fast storage you want for a database is not necessarily cheap.
The space used up by indexes also increases the size and duration of physical backups.
Indexes slow down data modification. Whenever you INSERT into or DELETE from a table, all indexes have to be modified, in addition to the table itself (the “heap”).
And it is much more expensive to modify the complicated data structure of an index than the heap itself, which has its name precisely because it is basically an unordered “pile” of data (and as everybody knows, maintaining order is more work than having a mess). Modifying an indexed table can easily be an order of magnitude more expensive than modifying an unindexed table.
Indexes prevent HOT updates. Because of the architecture of PostgreSQL, every UPDATE causes a new row version (“tuple”) to be written, and that causes a new entry in every index on the table.
This behavior has been dubbed “write amplification” and has drawn a lot of fire. This undesirable effect can be avoided if a) the new tuple fits into the same table block as the old one and b) no indexed column is modified. Then PostgreSQL creates the new tuple as a “Heap Only Tuple” (hence HOT), which is much more efficient and also reduces the work VACUUM has to do.

The many uses of indexes

Now we know that we don't want unnecessary indexes. The problem is that indexes serve so many purposes that it is difficult to determine if a certain index is needed or not.

Here is a list of all benefits of indexes in PostgreSQL:

Indexes can speed up queries that use indexed columns (or expressions) in the WHERE clause.
Everybody knows that one!
The traditional B-tree index supports the <, <=, =, >= and > operators, while the many other index types in PostgreSQL can support more exotic operators like “overlaps” (for ranges or geometries), “distance” (for words) or regular expression matches.
B-tree indexes can speed up the max() and min() aggregates.
B-tree indexes can speed up ORDER BY clauses.
Indexes can speed up joins. This depends on the “join strategy” chosen by the optimizer: hash joins, for example, will never make use of an index.
A B-tree index on the origin of a FOREIGN KEY constraint avoids a sequential scan when rows are deleted (or keys modified) in the target table. A scan on the origin of the constraint is necessary to make sure that the constraint will not be violated by the modification.
Indexes are used to enforce constraints. Unique B-tree indexes are used to enforce PRIMARY KEY and UNIQUE constraints, while exclusion constraints use GiST indexes.
Indexes can provide the optimizer with better value distribution statistics.
If you create an index on an expression, ANALYZE and the autoanalyze daemon will not only collect statistics for the data distribution in table columns, but also for each expression that occurs in an index. This helps the optimizer to get a good estimate for the “selectivity” of complicated conditions that contain the indexed expression, which causes better plans to be chosen. This is a widely ignored benefit of indexes!

Find the unused indexes!

The following query that we at CYBERTEC use will show you all indexes that serve none of the above mentioned purposes.

It makes use of the fact that all uses of indexes in the above list with the exception of the last two result in an index scan.

For completeness' sake, I have to add that the parameter track_counts has to remain “on” for the query to work, otherwise index usage is not tracked in pg_stat_user_indexes. But you must not change that parameter anyway, otherwise autovacuum will stop working.

To find the indexes that have never been used since the last statistics reset with pg_stat_reset(), use

SELECT s.schemaname,
       s.relname AS tablename,
       s.indexrelname AS indexname,
       pg_relation_size(s.indexrelid) AS index_size
FROM pg_catalog.pg_stat_user_indexes s
   JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid
WHERE s.idx_scan = 0      -- has never been scanned
  AND 0 <>ALL (i.indkey)  -- no index column is an expression
  AND NOT i.indisunique   -- is not a UNIQUE index
  AND NOT EXISTS          -- does not enforce a constraint
         (SELECT 1 FROM pg_catalog.pg_constraint c
          WHERE c.conindid = s.indexrelid)
  AND NOT EXISTS          -- is not an index partition
         (SELECT 1 FROM pg_catalog.pg_inherits AS inh
          WHERE inh.inhrelid = s.indexrelid)
ORDER BY pg_relation_size(s.indexrelid) DESC;

SELECT s.schemaname,

s.relname AS tablename,

s.indexrelname AS indexname,

pg_relation_size(s.indexrelid) AS index_size

FROM pg_catalog.pg_stat_user_indexes s

JOIN pg_catalog.pg_index i ON s.indexrelid = i.indexrelid

WHERE s.idx_scan = 0 -- has never been scanned

AND 0 <>ALL (i.indkey) -- no index column is an expression

AND NOT i.indisunique -- is not a UNIQUE index

AND NOT EXISTS -- does not enforce a constraint

(SELECT 1 FROM pg_catalog.pg_constraint c

WHERE c.conindid = s.indexrelid)

AND NOT EXISTS -- is not an index partition

(SELECT 1 FROM pg_catalog.pg_inherits AS inh

WHERE inh.inhrelid = s.indexrelid)

ORDER BY pg_relation_size(s.indexrelid) DESC;

Some remarks:

Don't do that on your test database, but on the production database!
If your software is running at several customer sites, run the query on all of them.
Different users have different ways to use a software, which can cause different indexes to be used.
You can replace s.idx_scan = 0 in the query with a different condition, e.g. s.idx_scan < 10. Indexes that are very rarely used are also good candidates for removal.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

5 3 votes

Article Rating

32 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Sebastien Caunes

1 year ago

This does not work for index on partitioned tables.
SQL Error [2BP01]: ERROR: cannot drop index [partition_index] because index [main_table_index] requires it Indice : You can drop index [main_table_index] instead.

How can I get the list of all index that I can delete on main tables ?

laurenz

1 year ago

Reply to Sebastien Caunes

Ah, that's something I forgot.
It will be difficult to find all partitioned indexes that can be dropped, because you'd have to check if all index partitions are unused. I have modified my query to exclude index partitions.

Sebastien Caunes

1 year ago

Reply to laurenz

I wrote this query to get all index of tables and summary of index of partitionned tables :

select * from (
-- Index of partitionned tables
select
'partitioned index' as indextype,
nsp.nspname as schemaname,
table_class.relname as tablename,
parent_class.relname as indexname,
index_columns.idx_columns as idx_columns,
seek_childs.nb_child_index,
seek_childs.nb_scans
from pg_class parent_class
join pg_index parent_index on parent_index.indexrelid = parent_class.oid
join pg_namespace nsp on nsp.oid = parent_class.relnamespace -- to get schemaname
join pg_class table_class on table_class.oid = parent_index.indrelid
, lateral (
select count(stats_child.idx_scan) as nb_child_index, sum(stats_child.idx_scan) as nb_scans
from pg_catalog.pg_stat_user_indexes stats_child
join pg_inherits pi on pi.inhrelid = stats_child.indexrelid
where pi.inhparent = parent_class.oid
) seek_childs
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = parent_class.oid
) index_columns
where parent_class.relkind = 'I'
AND 0 ALL (parent_index.indkey) -- no index column is an expression
AND NOT parent_index.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc WHERE cc.conindid = parent_index.indexrelid)
and table_class.relname not like '%template'
union
-- Index of regular tables
select
'regular index' as indextype,
stats_child.schemaname,
stats_child.relname AS tablename,
c.relname as indexname,
index_columns.idx_columns as idx_columns,
null as nb_child_index,
stats_child.idx_scan as id_scan_count
from pg_class c
join pg_index idx_parent on idx_parent.indexrelid = c.oid
join pg_catalog.pg_stat_user_indexes stats_child on c.oid = stats_child.indexrelid
, LATERAL (
SELECT string_agg(attname, ', ' order by attnum) AS idx_columns
FROM pg_attribute
WHERE attrelid = c.oid
) index_columns
where c.relkind = 'i'
AND 0 ALL (idx_parent.indkey) -- no index column is an expression
AND NOT idx_parent.indisunique -- is not a UNIQUE index
AND NOT EXISTS -- does not enforce a constraint
(SELECT 1 FROM pg_catalog.pg_constraint cc
WHERE cc.conindid = idx_parent.indexrelid)
AND NOT EXISTS -- is not a child index
(SELECT 1 FROM pg_inherits pi
where pi.inhrelid = c.oid)
and stats_child.relname not like '%template'
) all_index;

Sebastien Caunes

1 year ago

Reply to laurenz

I tried to write the query : look for all partitioned index having(max(child.idx_scan<=0)) ... My brain hurts ! 🙂
I'll try again later

Siraj

2 years ago

Hi Laurenz, this showed me a couple of indexes as unused, but it doesn't make sense because:
1- Without those indexes some of our regularly used queries would take hours to complete
2- EXPLAIN on queries with parameters matching customer usage, show the index being unused, funny thing after running EXPLAIN the idx_scan starts to go > 0

What could be the cause of this ?

laurenz

2 years ago

Reply to Siraj

What you report doesn't make sense to me, sorry.
Running EXPLAIN doesn't change anything, so there is no explanation for that behavior.
I can only assume some trivial mistake, like running the query against a test database.

Siraj

2 years ago

Reply to laurenz

Okay so i ran EXPLAIN ANALYZE sorry i was misleading here which explains the stats incrementing each run, but not sure why they were 0 to begin with, do they get reset regularly these stats ?

laurenz

2 years ago

Reply to Siraj

EXPLAIN (ANALYZE) also doesn't change the table statistics, so that won't change execution plans.
VACUUM (ANALYZE) would do that.
The statistics that my query relies on get reset with the function pg_stat_reset(). You shouldn't call that too often.

Siraj

2 years ago

Reply to laurenz

no one has ever called pg_stat_reset()
Running EXPLAIN ANALYZE on a query that uses an index 100% causes the index usage stats to be incremented, i've confirmed this multiple times now.

laurenz

2 years ago

Reply to Siraj

Ah, sorry, I was thinking in the wrong direction. Since EXPLAIN (ANALYZE) executes the query, it will of course increase the index scan count. To account for cases like this, you may want to use WHERE s.idx_scan < 10 rather than WHERE s.idx_scan = 0 in my query.

Twinkle Shah

3 years ago

Hi Laurenz, If we get list of tables from this query, would you suggest removing indexes from these tables ? How will it impact the performance ?

laurenz

3 years ago

Reply to Twinkle Shah

Yes, I recommend removing the indexes my query lists.
Performance (of data modifications) will be better afterwards.
Review the caveats in my article (run it on the production database, ...).

Twinkle Shah

3 years ago

Reply to laurenz

sure. will do. Also with 200 concurrent users, postgresSQL show 99% CPU usage. Any idea what could be the reason for that ?

laurenz

3 years ago

Reply to Twinkle Shah

This is quite unrelated and would require investigation by a consultant.

Srinivasarao Oguri

4 years ago

This is not showing indexes created on part of json object.

laurenz

4 years ago

Reply to Srinivasarao Oguri

Such indexes are expression indexes. I explained in point 7 of "The many uses of indexes" that such indexes serve another purpose too: they make PostgreSQL collect statistics on the indexed expression. So there is a chance that dropping the index might cause worse query estimates. My query only returns indexes that are guaranteed to be useless.

Bhavneet. Kamal

4 years ago

Is there a way to drop all indexes on a particular table? without hardcoding all the index names

laurenz

4 years ago

Reply to Bhavneet. Kamal

With psql, you can use

SELECT format('DROP INDEX %I;', indexrelid) FROM pg_index WHERE indrelid = 'mytab'::regclass gexec

That will work for all indexes that don't back a constraint — for these it will fail.

Bhavneet. Kamal

4 years ago

Reply to laurenz

I have a table that has 5 indexes on it. I truncate that table and insert a massive amount of data in that table. before i do the insertion, i would like to drop all indexes on that table. The query you provided, i guess i am not really understanding it correctly. I don't see where i pass my table name in it. Indrelid seems to be a numeric field and not a text. def. missing some code here

laurenz

4 years ago

Reply to Bhavneet. Kamal

Substitute your table name instead of 'mytab' in my code sample.

Bhavneet. Kamal

4 years ago

Reply to laurenz

https://uploads.disquscdn.com/images/ffaf157b24574f074488b7e1be76f1b0bda32ae45225bee9467d549b0b5bcfd4.png

I am as new as anyone can be with postgreSQL. first time using Postgresql. Is there anything i am missing with this? Do i need some additional plugin installation?

David Piscitelli

5 years ago

Why "AND 0 ALL (i.indkey)", functionnal indexes are not listed ?

Thx

laurenz

5 years ago

Reply to David Piscitelli

pg_index.indkey contains the list of table columns on which the index is defined.
If an index column is an expressen, the corresponding entry is 0, and the expression is stored in pg_index.indexprs.
The condition in my query means “no entry in indkey is 0”.

David Piscitelli

5 years ago

Reply to laurenz

ok, but why exclude non "simple" indexes ?

laurenz

5 years ago

Reply to David Piscitelli

As I explained in the article, they may serve a purpose even if they aren't scanned.
PostgreSQL collects statistics for the indexed expression, which can result in better execution plans.

David Piscitelli

5 years ago

Reply to laurenz

ok, thx !

sdfsdg

5 years ago

How can I use the output from this query to also drop the indexes?

laurenz

5 years ago

Reply to sdfsdg

You can use psql's gexec feature and modify the SELECT list as follows:

SELECT format('DROP INDEX %I./I', s.schemaname, s.indexrelname)
FROM ... gexec

gexec executes the query result as SQL commands.

Georg Klimm

6 years ago

Sure that this works with partial indexes?

laurenz

6 years ago

Reply to Georg Klimm

Yes, this also works for partial indexes.
They show up in pg_stat_user_indexes just like all other indexes.

AdminDBA

11 months ago

One of my client have a table with 25 Indexes. I want to find out which is not being used last 1 month and need to recommend for dropping the same. Is there any script available to find the index last used date

laurenz

10 months ago

Reply to AdminDBA

No, there isn't. Get the use count and remember it, then look at the use count again one month from now and compare the numbers.

Stay tuned with our

Get rid of your unused indexes!

Why should I get rid of unused indexes?

The many uses of indexes

Find the unused indexes!

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts