Unexpected downsides of UUID keys in PostgreSQL

06.2023 / Category: , / Tags: | |

There are various compelling reasons to use universally unique identifiers (UUID) as primary keys. Two examples are:

  • To be able to generate keys independently of the database
  • To move sets of related records between different databases without having to deal with renumbering everything

However, like everything good in life, UUID's come with their own downsides. The most important of these downsides is a loss of temporal locality.

The most common way to generate a UUID is to pick a random 128 bit number. (As an aside, for those worried about collisions: you should take up the lottery, since winning the jackpot twice in a row is a much more likely outcome than your system ever generating two identical random 128 bit numbers.) There is a standard for doing this called UUID v4. That standard requires fixed values for 6 bits, leaving only 122 bits of randomness-- but that is still plenty for all practical purposes. There even is a convenient PostgreSQL function called gen_random_uuid() for generating these values.

Random values for UUID's = loss of correlation

The trouble with picking random values is that they lose any correlation between the order of inserts and the order of id’s. B-tree indexes on UUID values are built in lexicographic order, which means that when using this index to look up two rows that were inserted at approximately the same time, they will be in completely different places in the index. By contrast, id’s generated from a sequence are going to be close by each other in the index. Since workloads commonly are interested in recently inserted rows, this means that a random UUID-based index will need to keep essentially the whole index in cache, whereas a sequential id index can make do with only a few recent pages.

But this relationship goes the other way, too. With random ID’s, values that are near each other in the index are going to be inserted at totally different times, and be in totally different places in the table. This is a story of how that can become important.

Getting started: table with a random UUID

To start with, we'll create a table of records with a traditional integer id, a random UUID. Let's also add a timestamp-based UUID method (using the soon-to-be-standardized UUID v7 method) to see if the reason for this is the UUID type itself, or the data ordering. We will also add 100 bytes of filler for every row, both to be realistic and in order for the planner to pick the plans that I want. This table will have 10 million rows in it.

We will add a unique index on each of the id columns, like a normal table would have, and run VACUUM ANALYZE on it, just so we don’t have to wait for AUTOVACUUM to get around to this table.

Checking the sizes of the table and indexes, we ended up with and making sure they are cached:

Let’s do a couple of index-only scans to see how well these indexes work:

The UUID is about twice as slow, but the values in it are also twice as big, so that makes sense, right?

Now let's look at the sequential UUID index

Huh? So it’s the random ordering that is causing us issues, not the UUID type itself. We will have to dig into the explain plans to see what is going on.

Holy behemoth buffer count batman, what is going on with the number of buffers referenced by the random UUID query? To understand this, we have to talk a little bit about how index-only scans work.

How Index-only Scans Work

PostgreSQL indexes do not include transaction metadata used to determine who gets to see what. So normally, when we want to check if an index row is visible to us, we have to go, fetch the table row, and see if the transaction that added that row is committed and visible to us-- and also make sure the row hasn’t been deleted or updated. As you can imagine, that takes quite a bit of work. For most bigger tables, the answer will almost always be yes.

The visibility map

There is a way to get around this. Vacuuming keeps track of which table pages do not need any further cleanup work as everything on it is visible to everyone, a concept known as page all visible. This is kept track of in a visibility map, that is just a bitmap with 2 bits per each 8KB page. This makes it 32’768 times smaller than the table, or in other words, tiny and really easy to cache. Index-only scan makes use of it, and if we only need values from the index and the visibility map tells us that the row is visible, we can skip looking up the row, saving a bunch of time.

But both of our scans need to do this visibility map lookup, so what is going on up there? The answer is that because looking up pages in shared buffers is not free - it involves grabbing a lock, running a hash table lookup on a relatively large hash table, and doing a memory write to mark the pages as being looked at - index-only scan will hang onto a reference to the visibility map page it had to look at last time, and if the next index tuple happens to need the same page, we can skip all that work, and just read the bit we need off of it. Some of you might see where this is going…

The performance catch

Each visibility map page covers 8192*8/2 = 32’768 table pages, or 256MiB. In the sequential id case, nearby id-values are practically always on the same visibility map page and the optimization works perfectly. With random UUIDs, however, there is only about a 1 in 7 chance that they are on the same page. This results in having to do about 6/7*10M ~= 8.5M extra buffer lookups for visibility checks, each of which is counted as one buffer hit in the explain plan. A cache hit is cheap, but not free, and doing it 8.5M times over adds up.

Just to verify this, we can see that the sequential UUID case does not have that issue, and only has buffer hits increase in proportion to its size.

Data locality

The moral of the story is that data locality matters, and it can pop up in the most surprising of places. Using random is typically the worst thing you can do for locality, so if you want to use UUID’s, try to use a sequential variant. UUID v7 is a good option. Hopefully it’s coming to PostgreSQL 17. In the meantime, there are already a couple of implementations out there, even a plain SQL one. Your cache hit rates will thank you.

See Laurenz' blog about auto-generated primary keys for more info on this topic.

See Hans' blog about Hint Bits if you would like to read more about PostgreSQL performance under the hood.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Notify of
Newest Most Voted
Inline Feedbacks
View all comments
Ants Aasma
Ants Aasma
11 months ago

While v1 UUID is indeed timestamp based, because of that wrap around behavior it does not preserve temporal locality and as such for performance considerations is as good as the random variant.

David Lee Lambert
David Lee Lambert
11 months ago

I would argue that the "traditional" UUID is variant 0x8, version 1, RFC 4122 section 4.2.1. By the time Postgres introduced a native UUID type, that was widely deprecated in favor of version 4, same RFC, section 4.4; but there was a "timestamp" version already. The first field in sorting is "time_low", which wraps around about every 429½ seconds (about 7 minutes).

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf

+43 (0) 2622 93022-0

Get the newest PostgreSQL Info & Tools

    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    CYBERTEC PostgreSQL International GmbH
    Would love your thoughts, please comment.x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram