CYBERTEC Logo

Deduplication in PostgreSQL v13 B-tree indexes

06.2020 / Category: , / Tags: |
Deduplication in a tree
© Laurenz Albe 2020

 

A while ago, I wrote about B-tree improvements in v12. PostgreSQL v13, which will come out later this year, will feature index entry deduplication as an even more impressive improvement. So I thought it was time for a follow-up.

Deduplication for B-tree indexes

If the indexed keys for different table rows are identical, a GIN index will store that as a single index entry. The pointers to the several table rows (tuple IDs) are stored in the posting list of that entry.

In contrast, B-tree indexes used to store an index entry for each referenced table row. This makes maintenance faster but can lead to duplicate index entries. Now commit 0d861bbb70 has changed that by introducing deduplication of index entries for B-tree indexes.

The big difference is that duplicate index entries can still occur in B-tree indexes. GIN indexes have to integrate new entries into the posting list every time, which makes data modifications less efficient (this is mitigated by the pending list). In contrast, PostgreSQL deduplicates B-tree entries only when it would otherwise have to split the index page. So it has to do the extra work only occasionally, and only when it would have to do extra work anyway. This extra work is balanced by the reduced need for page splits and the resulting smaller index size.

This doesn't affect unique indexes, right?

Oh yes, it does. Every UPDATE creates a new version of a table row, and each row version has to be indexed. Consequently, even a unique index can contain the same index entry multiple times. The new feature is useful for unique indexes because it reduces the index bloat that normally occurs if the same table row is updated frequently.

The benefits of deduplication

Deduplication results in a smaller index size for indexes with repeating entries. This saves disk space and, even more importantly, RAM when the index is cached in shared buffers. Scanning the index becomes faster, and index bloat is reduced.

Upgrade considerations

If you want to make use of the new feature after upgrading with pg_upgrade, you'll have to REINDEX promising indexes that contain many duplicate entries. Upgrading with pg_dumpall and restore or using logical replication rebuilds the index, and deduplication is automatically enabled.

If you want to retain the old behavior, you can disable deduplication by setting deduplicate_items = off on the index.

A small test for index deduplication

I'll use the same test case as in my previous article:

The interesting index here is rel_bid_idx. I'll measure its size, then the size after a REINDEX. Finally, I'll execute the following query several times:

This executes an index scan without actually transferring the result to the client. I'll measure the time with psql's timing on the client side.

I'll run that test on PostgreSQL v12 and v13. As a comparison, I'll do the same thing with a GIN index on bid (this requires the btree_gin contrib module).

Test results

v13 B-tree index v12 B-tree index v12 GIN index
size 126MB 408MB 51MB
size after REINDEX 133MB 429MB 47MB
query duration 130ms 130ms 140ms

Discussion

In my example, the deduplicated B-tree index is more than three times smaller than the v12 index, although it is still considerably larger than the GIN index.

The performance is comparable. I observed a bigger variance in the execution time with the deduplicated index, which I cannot explain.

This feature is another nice step ahead for PostgreSQL's B-tree indexes!

0 0 votes
Article Rating
Subscribe
Notify of
guest
3 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
jjxtra
jjxtra
1 year ago

So postgresql 13+ deduplication is the default for btrees?

laurenz
laurenz
1 year ago
Reply to  jjxtra

Yes.

Alexandre de Arruda Paes
Alexandre de Arruda Paes
3 years ago

Awesome !
Thanks Laurenz

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    3
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram