Unexpected downsides of UUID keys in PostgreSQL

Ants Aasma

06.2023 | Category: How To,Performance | Tags: development, index, uuid

06.2023

Category: How To,Performance

Tags: development, index, uuid

There are various compelling reasons to use universally unique identifiers (UUID) as primary keys. Two examples are:

Table of Contents

To be able to generate keys independently of the database
To move sets of related records between different databases without having to deal with renumbering everything

However, like everything good in life, UUID's come with their own downsides. The most important of these downsides is a loss of temporal locality.

The most common way to generate a UUID is to pick a random 128 bit number. (As an aside, for those worried about collisions: you should take up the lottery, since winning the jackpot twice in a row is a much more likely outcome than your system ever generating two identical random 128 bit numbers.) There is a standard for doing this called UUID v4. That standard requires fixed values for 6 bits, leaving only 122 bits of randomness-- but that is still plenty for all practical purposes. There even is a convenient PostgreSQL function called gen_random_uuid() for generating these values.

Random values for UUID's = loss of correlation

The trouble with picking random values is that they lose any correlation between the order of inserts and the order of id’s. B-tree indexes on UUID values are built in lexicographic order, which means that when using this index to look up two rows that were inserted at approximately the same time, they will be in completely different places in the index. By contrast, id’s generated from a sequence are going to be close by each other in the index. Since workloads commonly are interested in recently inserted rows, this means that a random UUID-based index will need to keep essentially the whole index in cache, whereas a sequential id index can make do with only a few recent pages.

But this relationship goes the other way, too. With random ID’s, values that are near each other in the index are going to be inserted at totally different times, and be in totally different places in the table. This is a story of how that can become important.

Getting started: table with a random UUID

To start with, we'll create a table of records with a traditional integer id, a random UUID. Let's also add a timestamp-based UUID method (using the soon-to-be-standardized UUID v7 method) to see if the reason for this is the UUID type itself, or the data ordering. We will also add 100 bytes of filler for every row, both to be realistic and in order for the planner to pick the plans that I want. This table will have 10 million rows in it.

locality=# create table records (id int8 not null, uuid_v4 uuid not null, uuid_v7 uuid not null, filler text);
CREATE TABLE
Time: 98.515 ms
locality=# insert into records select id, gen_random_uuid(), uuid_generate_v7(), repeat(' ', 100) from generate_series(1, 10000000) id;
INSERT 0 10000000
Time: 28050.965 ms (00:28.051)

locality=# create table records (id int8 not null, uuid_v4 uuid not null, uuid_v7 uuid not null, filler text);

CREATE TABLE

Time: 98.515 ms

locality=# insert into records select id, gen_random_uuid(), uuid_generate_v7(), repeat(' ', 100) from generate_series(1, 10000000) id;

INSERT 0 10000000

Time: 28050.965 ms (00:28.051)

We will add a unique index on each of the id columns, like a normal table would have, and run VACUUM ANALYZE on it, just so we don’t have to wait for AUTOVACUUM to get around to this table.

locality=# create index on records (id);
CREATE INDEX
Time: 1689.437 ms (00:01.689)
locality=# create index on records (uuid_v4);
CREATE INDEX
Time: 2378.215 ms (00:02.378)
locality=# create index on records (uuid_v7);
CREATE INDEX
Time: 1956.945 ms (00:01.957)
locality=# vacuum analyze records;
VACUUM
Time: 281.858 ms

locality=# create index on records (id);

CREATE INDEX

Time: 1689.437 ms (00:01.689)

locality=# create index on records (uuid_v4);

CREATE INDEX

Time: 2378.215 ms (00:02.378)

locality=# create index on records (uuid_v7);

CREATE INDEX

Time: 1956.945 ms (00:01.957)

locality=# vacuum analyze records;

VACUUM

Time: 281.858 ms

Checking the sizes of the table and indexes, we ended up with and making sure they are cached:

locality=# select relname, pg_size_pretty(pg_relation_size(oid)), pg_prewarm(oid) from pg_class where relname like 'records%';
       relname       | pg_size_pretty | pg_prewarm 
---------------------+----------------+------------
 records             | 1662 MB        |     212766
 records_id_idx      | 214 MB         |      27422
 records_uuid_v4_idx | 301 MB         |      38506
 records_uuid_v7_idx | 301 MB         |      38506
(4 rows)

Time: 283.849 ms

locality=# select relname, pg_size_pretty(pg_relation_size(oid)), pg_prewarm(oid) from pg_class where relname like 'records%';

relname | pg_size_pretty | pg_prewarm

---------------------+----------------+------------

records | 1662 MB | 212766

records_id_idx | 214 MB | 27422

records_uuid_v4_idx | 301 MB | 38506

records_uuid_v7_idx | 301 MB | 38506

(4 rows)

Time: 283.849 ms

Let’s do a couple of index-only scans to see how well these indexes work:

locality=# SELECT COUNT(id) FROM records;
  count   
----------
 10000000
(1 row)

Time: 526.486 ms
locality=# SELECT COUNT(uuid_v4) FROM records;
  count   
----------
 10000000
(1 row)

Time: 1213.813 ms (00:01.214)

locality=# SELECT COUNT(id) FROM records;

count

----------

10000000

(1 row)

Time: 526.486 ms

locality=# SELECT COUNT(uuid_v4) FROM records;

count

----------

10000000

(1 row)

Time: 1213.813 ms (00:01.214)

The UUID is about twice as slow, but the values in it are also twice as big, so that makes sense, right?

Now let's look at the sequential UUID index

locality=# SELECT COUNT(uuid_v7) FROM records;
  count   
----------
 10000000
(1 row)

Time: 541.322 ms

locality=# SELECT COUNT(uuid_v7) FROM records;

count

----------

10000000

(1 row)

Time: 541.322 ms

Huh? So it’s the random ordering that is causing us issues, not the UUID type itself. We will have to dig into the explain plans to see what is going on.

locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(id) FROM records;
                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=202422.47..202422.48 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=27332
   ->  Index Only Scan using records_id_idx on records  (cost=0.43..177422.46 rows=10000002 width=8) (actual rows=10000000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=27332
 Planning Time: 0.056 ms
 Execution Time: 777.764 ms
(7 rows)
locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v4) FROM records;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=213506.47..213506.48 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=8562960
   ->  Index Only Scan using records_uuid_v4_idx on records  (cost=0.43..188506.46 rows=10000002 width=16) (actual rows=10000000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=8562960
 Planning Time: 0.058 ms
 Execution Time: 1430.684 ms
(7 rows)

locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(id) FROM records;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=202422.47..202422.48 rows=1 width=8) (actual rows=1 loops=1)

Buffers: shared hit=27332

-> Index Only Scan using records_id_idx on records (cost=0.43..177422.46 rows=10000002 width=8) (actual rows=10000000 loops=1)

Heap Fetches: 0

Buffers: shared hit=27332

Planning Time: 0.056 ms

Execution Time: 777.764 ms

(7 rows)

locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v4) FROM records;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=213506.47..213506.48 rows=1 width=8) (actual rows=1 loops=1)

Buffers: shared hit=8562960

-> Index Only Scan using records_uuid_v4_idx on records (cost=0.43..188506.46 rows=10000002 width=16) (actual rows=10000000 loops=1)

Heap Fetches: 0

Buffers: shared hit=8562960

Planning Time: 0.058 ms

Execution Time: 1430.684 ms

(7 rows)

Holy behemoth buffer count batman, what is going on with the number of buffers referenced by the random UUID query? To understand this, we have to talk a little bit about how index-only scans work.

How Index-only Scans Work

PostgreSQL indexes do not include transaction metadata used to determine who gets to see what. So normally, when we want to check if an index row is visible to us, we have to go, fetch the table row, and see if the transaction that added that row is committed and visible to us-- and also make sure the row hasn’t been deleted or updated. As you can imagine, that takes quite a bit of work. For most bigger tables, the answer will almost always be yes.

The visibility map

There is a way to get around this. Vacuuming keeps track of which table pages do not need any further cleanup work as everything on it is visible to everyone, a concept known as page all visible. This is kept track of in a visibility map, that is just a bitmap with 2 bits per each 8KB page. This makes it 32’768 times smaller than the table, or in other words, tiny and really easy to cache. Index-only scan makes use of it, and if we only need values from the index and the visibility map tells us that the row is visible, we can skip looking up the row, saving a bunch of time.

But both of our scans need to do this visibility map lookup, so what is going on up there? The answer is that because looking up pages in shared buffers is not free - it involves grabbing a lock, running a hash table lookup on a relatively large hash table, and doing a memory write to mark the pages as being looked at - index-only scan will hang onto a reference to the visibility map page it had to look at last time, and if the next index tuple happens to need the same page, we can skip all that work, and just read the bit we need off of it. Some of you might see where this is going…

The performance catch

Each visibility map page covers 8192*8/2 = 32’768 table pages, or 256MiB. In the sequential id case, nearby id-values are practically always on the same visibility map page and the optimization works perfectly. With random UUIDs, however, there is only about a 1 in 7 chance that they are on the same page. This results in having to do about 6/7*10M ~= 8.5M extra buffer lookups for visibility checks, each of which is counted as one buffer hit in the explain plan. A cache hit is cheap, but not free, and doing it 8.5M times over adds up.

Just to verify this, we can see that the sequential UUID case does not have that issue, and only has buffer hits increase in proportion to its size.

locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v7) FROM records;
                                                                QUERY PLAN                                                                
------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=213506.47..213506.48 rows=1 width=8) (actual rows=1 loops=1)
   Buffers: shared hit=39126
   ->  Index Only Scan using records_uuid_v7_idx on records  (cost=0.43..188506.46 rows=10000002 width=16) (actual rows=10000000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=39126
 Planning Time: 0.058 ms
 Execution Time: 790.872 ms
(7 rows)

locality=# EXPLAIN (BUFFERS, ANALYZE, TIMING OFF) SELECT COUNT(uuid_v7) FROM records;

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------

Aggregate (cost=213506.47..213506.48 rows=1 width=8) (actual rows=1 loops=1)

Buffers: shared hit=39126

-> Index Only Scan using records_uuid_v7_idx on records (cost=0.43..188506.46 rows=10000002 width=16) (actual rows=10000000 loops=1)

Heap Fetches: 0

Buffers: shared hit=39126

Planning Time: 0.058 ms

Execution Time: 790.872 ms

(7 rows)

Data locality

The moral of the story is that data locality matters, and it can pop up in the most surprising of places. Using random is typically the worst thing you can do for locality, so if you want to use UUID’s, try to use a sequential variant. UUID v7 is a good option. Hopefully it’s coming to PostgreSQL 17. In the meantime, there are already a couple of implementations out there, even a plain SQL one. Your cache hit rates will thank you.

See Laurenz' blog about auto-generated primary keys for more info on this topic.

See Hans' blog about Hint Bits if you would like to read more about PostgreSQL performance under the hood.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

2 responses to “Unexpected downsides of UUID keys in PostgreSQL”

David Lee Lambert says:

June 21, 2023 at 4:59 pm

I would argue that the "traditional" UUID is variant 0x8, version 1, RFC 4122 section 4.2.1. By the time Postgres introduced a native UUID type, that was widely deprecated in favor of version 4, same RFC, section 4.4; but there was a "timestamp" version already. The first field in sorting is "time_low", which wraps around about every 429½ seconds (about 7 minutes).

Reply
Ants Aasma says:

June 22, 2023 at 10:46 am

While v1 UUID is indeed timestamp based, because of that wrap around behavior it does not preserve temporal locality and as such for performance considerations is as good as the random variant.

Reply

Unexpected downsides of UUID keys in PostgreSQL

Random values for UUID's = loss of correlation

Getting started: table with a random UUID

Let’s do a couple of index-only scans to see how well these indexes work:

Now let's look at the sequential UUID index

How Index-only Scans Work

The visibility map

The performance catch

Data locality

2 responses to “Unexpected downsides of UUID keys in PostgreSQL”

Leave a Reply Cancel reply

Ants Aasma

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts