Index bloat in bad hands
© Laurenz Albe 2021

PostgreSQL v12 brought more efficient storage for indexes, and v13 improved that even more by adding deduplication of index entries. But Peter Geoghegan is not done yet! PostgreSQL v14 will bring “bottom-up” index entry deletion, which is targeted at reducing unnecessary page splits, index bloat and fragmentation of heavily updated indexes.

Why do we get index bloat?

In a B-tree index, there is an index entry for every row version (“tuple”) in the table that is not dead (invisible to everybody). When VACUUM removes dead tuples, it also has to delete the corresponding index entries. Just like with tables, that creates empty space in an index page. Such space can be reused, but if no new entries are added to the page, the space remains empty.

This “bloat” is unavoidable and normal to some extent, but if it gets to be too much, the index will become less efficient:

  • for an index range scan, more pages have to be scanned
  • index pages cached in RAM means that you cache the bloat, which is a waste of RAM
  • fewer index entries per page mean less “fan out”, so the index could have more levels than necessary

This is particularly likely to happen if you update the same row frequently. Until VACUUM can clean up old tuples, the table and the index will contain many versions of the same row. This is particularly unpleasant if an index page fills up: then PostgreSQL will “split” the index page in two. This is an expensive operation, and after VACUUM is done cleaning up, we end up with two bloated pages instead of a single one.

Current features to improve index bloat and performance

HOT tuples

The creation of HOT tuples is perhaps the strongest weapon PostgreSQL has to combat unnecessary churn in the index. With this feature, an UPDATE creates tuples that are not referenced from an index, but only from the previous version of the table row. That way, there is no need to write a new index entry at all, which is good for performance and completely avoids index bloat.

Read more in my article about HOT updates.

Killing index tuples

When an index scan encounters an entry that points to a dead tuple in the table, it will mark the index entry as “killed”. Subsequent index scans will skip such entries even before VACUUM can remove them. Moreover, PostgreSQL can delete such entries when the index page is full, to avoid a page split.

See my article on killed index tuples for details.

How does v14 reduce index bloat even further?

“Bottom-up index tuple deletion” goes farther than the previous approaches: it deletes index entries that point to dead tuples right before an index page split is about to occur. This can reduce the number of index entries and avoid the expensive page split, together with the bloat that will occur later, when VACUUM cleans up.

In a way, this performs part of the work of VACUUM earlier, at a point where it is useful to avoid index bloat.

A test case

To demonstrate the effects of the new feature, I performed a custom pgbench run on PostgreSQL v13 and v14.

This is the table for the test:

CREATE TABLE testtab (
   id        bigint
      CONSTRAINT testtab_pkey PRIMARY KEY,
   unchanged integer,
   changed   integer
   SELECT i, i, 0
   FROM generate_series(1, 10000) AS i;
CREATE INDEX testtab_unchanged_idx ON testtab (unchanged);
CREATE INDEX testtab_changed_idx ON testtab (changed);

This is the pgbench script called “bench.sql”:

\set id random_gaussian(1, 10000, 10)
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;
UPDATE testtab SET changed = changed + 1 WHERE id = :id;

I chose a normal distribution, because in real life there are usually some (recent?) table rows that receive more updates than others. The row is updated ten times, in order to make it more likely that the affected page will have to be split.

I run the script 60000 times (10000 iterations by 6 clients) as follows:

pgbench -n -c 6 -f bench.sql -t 10000 test

Comparing the test results

We use the pgstattuple extension to get index statistics with psql:

SELECT i.indexrelid::regclass AS index,
FROM pg_index AS i
   CROSS JOIN LATERAL pgstatindex(i.indexrelid) AS s
WHERE indrelid = 'testtab'::regclass;

This is what we get for v13:

         index         │ index_size │ avg_leaf_density 
 testtab_pkey          │     319488 │             66.6
 testtab_unchanged_idx │    4022272 │             5.33
 testtab_changed_idx   │    4505600 │            13.57
(3 rows)

For v14, the result is:

         index         │ index_size │ avg_leaf_density 
 testtab_pkey          │     245760 │            87.91
 testtab_unchanged_idx │     532480 │            39.23
 testtab_changed_idx   │    4038656 │            14.23
(3 rows)

We see the biggest improvement in testtab_unchanged_idx. In v13, the index is bloated out of shape, while in v14 it only has 60% bloat (which is not bad for an index). Here we see the biggest effect of the new feature. The UPDATE doesn’t scan that index, so there are no killed index tuples, and still “bottom-up deletion” could remove enough of them to avoid a page split in most cases.

There is also a measurable improvement in testtab_pkey. Since the UPDATE scans that index, dead index tuples will be killed, and the new feature removes those before splitting the page. The difference to v13 is less pronounced here, since v13 already avoids index bloat quite well.

The index testtab_changed_idx cannot benefit from the new feature, since that only addresses the case where the UPDATE doesn’t modify the indexed value. In case you wonder why the leaf density is so much lower compared to testtab_unchanged_idx in v13: that is index de-duplication, which can kick in because the index entry is not modified.

Will I be able to use this feature after a pg_upgrade?

The storage format of the index is unchanged, so this will automatically work after a pg_upgrade of an index created on PostgreSQL v12 or later. If the index was created with an earlier version of PostgreSQL, you will have to REINDEX the index to benefit from the new feature. Remember that pg_upgrade simply copies the index files and does not update the internal index version.


PostgreSQL v14 continues to bring improvements to B-tree indexes. While this particular one may not be revolutionary, it promises to provide a solid improvement for many workloads, especially those with lots of updates.