© Laurenz Albe 2025
Table of Contents
People often ask “How can I automatically rebuild by indexes regularly?” or “When should I rebuild my indexes in PostgreSQL?”. That always gives me the feeling that they want to solve a problem that isn't there. But the REINDEX
statement is certainly there for a reason, and sometimes it is perfectly reasonable to rebuild an index. In this article, I'll explain when it makes sense to rebuild an index and how you can get the relevant data to make that decision.
Yes, it considerably depends on the index type. If you consider B-tree indexes, hash indexes and BRIN indexes (to name just three), you will notice that their implementation is very different. Consequently, efficiency considerations will be quite different for these three index access methods. In this article, I will primarily discuss the standard B-tree indexes. But most of what I will say will also apply to the other “tree-shaped” index access methods: GIN, GiST and SP-GiST. I'll try to highlight the differences wherever appropriate.
When it comes to rebuilding data structures for better efficiency, many people have a vague idea of “defragmentation”. We want to get a deeper understanding!
The unit of storage for indexes, like for everything else in PostgreSQL, is the 8kB page. With all tree-shaped indexes, the search starts at the root page, which divides the search space into several segments (which can overlap in the case of GiST). In a B-tree index, where the entries are sorted, you search for the biggest entry less than your value and the smallest entry bigger than your value. The gap between these values points to the page you want to search on the next level of the index. You repeat that procedure until you reach a leaf page, then follow the item pointers of the matching entries to the table.
A simple B+ tree example by Grundprinzip - Own work, CC BY 3.0, Link
The performance of this search procedure is inversely proportional to the number of levels (the depth) of the index: the fewer levels, the better. The depth of the index decreases as the number of entries in a single index page (the fan-out) grows. Since an index page is 8kB in size, the fan-out depends on the size of an individual index entry and on how densely the entries are packed in the index page. To be precise, the index depth is proportional to the logarithm of the number of table tuples, where the fan-out is the basis of the logarithm. Since there can be hundreds of index entries on a page, B-tree indexes hardly ever exceed a depth of five.
There is little PostgreSQL can do about the size of an index entry (although index deduplication certainly can help). Consequently, the performance an index scan depends on how densely the index entries are packed on the pages.
If you access a single table row via a B-tree index, the performance impact of index bloat is negligible. In the worst case, you may have to read one more index page, because the index happens to be one level deeper than it would have to be. Since index scans access the intermediate pages frequently, they are usually cached, and the impact is small.
Now sometimes you don't use a B-tree index to search for a single table row. If many rows match the search condition, or if you use the index to support ORDER BY
, the query executor will perform a range scan: it will drill down to the beginning of the range and then keep scanning subsequent leaf pages (with are conveniently linked) until the end of the range.
The performance impact of bloat on an index range scan is higher: the denser the entries are packed, the fewer pages have to be scanned. But there is another consideration for a range scan that reads many index pages: if the leaf pages are next to each other on disk in their logical order, the operating system kernel will detect that and start prefetching the disk blocks that the index scan will read next. That can improve performance considerably, particularly with index-only scans.
In a table, you can always reuse empty space, because order does not matter. So you don't get much table bloat, unless you delete or update a larger portion of the table. In a tree-like index, the situation is different: an index entry will have to be on a certain index page.
If an index page is full, and an INSERT
needs to fit another entry, PostgreSQL can split the page. But if an UPDATE
or DELETE
removes an index entry, a subsequent VACUUM
will leave the page somewhat bloated. PostgreSQL reclaims index pages that become completely empty, but it makes no effort to consolidate sparsely populated pages.
It is normal for a B-tree index to be more bloated than a table. In fact, the default fillfactor
of 90 gives a B-tree index 10% of bloat from the beginning. The free space should reduce the need for an index page split. It is not unusual for the bloat in a B-tree index to reach about 70%. That alone is no reason to rebuild the index. What you should watch out for is if the bloat keeps increasing all the time.
If an index has 60% bloat and you rebuild it, chances are that it will perform slightly better for a little while, but quickly revert to its customary bloat. In such a case, rebuilding the index offers little benefit. And don't forget that REINDEX
is not exactly free either: even if you run it with CONCURRENTLY
, it will consume resources on your database machine.
To measure how your B-tree index is doing, use the pgstattuple extension. The function pgstatindex()
gives you information about a B-tree index:
avg_leaf_density
is the average percentage to which index pages are filled with entries, or 100 minus the index bloat.leaf_fragmentation
is the percentage of index pages that are physically after the logically next page on the same index level (the right sibling). It is a measure for how often a range scan has to “jump backward”.If avg_leaf_density
is very low, and particularly if it keeps shrinking, rebuilding the index is indicated. If a high value of leaf_fragmentation
is a reason to rebuild the index largely depends on whether you perform larger index range scans or not. I'd say that avg_leaf_density
is the more important measure of the two.
Note that pgstatindex()
scans the entire index, so it will definitely consume system resources. If you build it into your monitoring system, don't run that probe every ten minutes. Once a week is probably enough, preferably during times of reduced load on the database.
Sadly, the pgstattuple extension doesn't offer comparative diagnostic functions for other tree-like index types. There, you'll have to resort to circumstantial evidence, such as inordinate growth of the index. For the more exotic index types I lack the experience to say with confidence if they can degrade over time or not.
pgstatindex()
Let's create a table with a million rows. I'll use an UNLOGGED
table, because WAL doesn't matter here, and I disable autovacuum so that I can demonstrate the effects of VACUUM
.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
CREATE UNLOGGED TABLE large (id) WITH (autovacuum_enabled = off) AS SELECT * FROM generate_series(1::bigint, 1000000); CREATE INDEX large_idx ON large (id); CREATE EXTENSION pgstattuple; SELECT tree_level, deleted_pages, avg_leaf_density, leaf_fragmentation FROM pgstatindex('large_idx'); tree_level | deleted_pages | avg_leaf_density | leaf_fragmentation ------------+---------------+------------------+-------------------- 2 | 0 | 90.01 | 0 |
You can see that the index is in perfect shape. Note that the index depth is only three (two plus the root node), even though the table is not exactly tiny. Let's delete a third of the rows:
1 2 3 4 5 6 7 8 |
DELETE FROM large WHERE id % 3 = 0; SELECT tree_level, deleted_pages, avg_leaf_density, leaf_fragmentation FROM pgstatindex('large_idx'); tree_level | deleted_pages | avg_leaf_density | leaf_fragmentation ------------+---------------+------------------+-------------------- 2 | 0 | 90.01 | 0 |
Nothing has changed, because the index entries are still in place. VACUUM
will change that:
1 2 3 4 5 6 7 8 |
VACUUM large; SELECT tree_level, deleted_pages, avg_leaf_density, leaf_fragmentation FROM pgstatindex('large_idx'); tree_level | deleted_pages | avg_leaf_density | leaf_fragmentation ------------+---------------+------------------+-------------------- 2 | 0 | 60.09 | 0 |
Now the bloat has increased as expected. No index page has become empty and could be deleted. leaf_fragmentation
didn't change, because the index pages are still in the original order. If PostgreSQL were to delete and reuse some empty pages, we would see a change there.
No matter what your previous experience with other relational database systems may, resist the temptation to automatically rebuild indexes regularly. Before you rebuild an index, use pgstatindex()
to test if that makes sense. You will find that usually, there is no need for a rebuild.
Leave a Reply