© Laurenz Albe 2021
Most database tables have an artificial numeric primary key, and that number is usually generated automatically using a sequence. I wrote about auto-generated primary keys in some detail in a previous article. Occasionally, gaps in these primary key sequences can occur – which might come as a surprise to you.
This article shows the causes of sequence gaps, demonstrates the unexpected fact that sequences can even jump backwards, and gives an example of how to build a gapless sequence.
Gaps in sequences caused by rollback
We are used to the atomic behavior of database transactions: when PostgreSQL rolls a transaction back, all its effects are undone. As the documentation tells us, that is not the case for sequence values:
To avoid blocking concurrent transactions that obtain numbers from the same sequence, a
nextvaloperation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an
ON CONFLICTclause will compute the to-be-inserted tuple, including doing any required
nextvalcalls, before detecting any conflict that would cause it to follow the
ON CONFLICTrule instead. Such cases will leave unused “holes” in the sequence of assigned values.
This little example shows how a gap forms in a sequence:
CREATE TABLE be_positive ( id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, value integer CHECK (value > 0) ); -- the identity column is backed by a sequence: SELECT pg_get_serial_sequence('be_positive', 'id'); pg_get_serial_sequence ════════════════════════════ laurenz.be_positive_id_seq (1 row) INSERT INTO be_positive (value) VALUES (42); INSERT 0 1 INSERT INTO be_positive (value) VALUES (-99); ERROR: new row for relation "be_positive" violates check constraint "be_positive_value_check" DETAIL: Failing row contains (2, -99). INSERT INTO be_positive (value) VALUES (314); INSERT 0 1 TABLE be_positive; id │ value ════╪═══════ 1 │ 42 3 │ 314 (2 rows)
The second statement was rolled back, but the sequence value 2 is not, forming a gap.
This intentional behavior is necessary for good performance. After all, a sequence should not be the bottleneck for a workload consisting of many
INSERTs, so it has to perform well. Rolling back sequence values would reduce concurrency and complicate processing.
Gaps in sequences caused by caching
nextval is cheap, a sequence could still be a bottleneck in a highly concurrent workload. To work around that, you can define a sequence with a
CACHE clause greater than 1. Then the first call to
nextval in a database session will actually fetch that many sequence values in a single operation. Subsequent calls to
nextval use those cached values, and there is no need to access the sequence.
As a consequence, these cached sequence values get lost when the database session ends, leading to gaps:
CREATE SEQUENCE seq CACHE 20; SELECT nextval('seq'); nextval ═════════ 1 (1 row) SELECT nextval('seq'); nextval ═════════ 2 (1 row)
Now end the database session and start a new one:
SELECT nextval('seq'); nextval ═════════ 21 (1 row)
Gaps in sequences caused by a crash
As with all other objects, changes to sequences are logged to WAL, so that recovery can restore the state from a backup or after a crash. Since writing WAL impacts performance, not each call to
nextval will log to WAL. Rather, the first call logs a value 32 numbers ahead of the current value, and the next 32 calls to
nextval don’t log anything. That means that after recovering from a crash, the sequence may have skipped some values.
To demonstrate, I’ll use a little PL/Python function that crashes the server by sending a
KILL signal to the current process:
CREATE FUNCTION seppuku() RETURNS void LANGUAGE plpython3u AS 'import os, signal os.kill(os.getpid(), signal.SIGKILL)';
Now let’s see this in action:
CREATE SEQUENCE seq; SELECT nextval('seq'); nextval ═════════ 1 (1 row) SELECT seppuku(); server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
Upon reconnect, we find that some values are missing:
SELECT nextval('seq'); nextval ═════════ 34 (1 row)
Sequences that jump backwards after a crash
It is a little-known fact that sequences can also jump backwards. A backwards jump can happen if the WAL record that logs the advancement of the sequence value has not yet been persisted to disk. Why? Because the transaction that contained the call to
nextval has not yet committed:
CREATE SEQUENCE seq; BEGIN; SELECT nextval('seq'); nextval ═════════ 1 (1 row) SELECT nextval('seq'); nextval ═════════ 2 (1 row) SELECT nextval('seq'); nextval ═════════ 3 (1 row) SELECT seppuku(); psql:seq.sql:9: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request.
Now reconnect and fetch the next sequence value:
SELECT nextval('seq'); nextval ═════════ 1 (1 row)
This looks scary, but no damage can happen to the database: since the transaction didn’t commit, it was rolled back, along with all possible data modifications that used the “lost” sequence values.
However, that leads to an interesting conclusion: don’t use sequence values from an uncommitted transaction outside that transaction.
How to build a gapless sequence
First off: think twice before you decide to build a gapless sequence. It will serialize all transactions that use that “sequence”. That will deteriorate your data modification performance considerably.
You almost never need a gapless sequence. Usually, it is good enough if you know the order of the rows, for example from the current timestamp at the time the row was inserted. Then you can use the
row_number window function to calculate the gapless ordering while you query the data:
SELECT created_ts, value, row_number() OVER (ORDER BY created_ts) AS gapless_seq FROM mytable;
You can implement a truly gapless sequence using a “singleton” table:
CREATE TABLE seq (id bigint NOT NULL); INSERT INTO seq (id) VALUES (0); CREATE FUNCTION next_val() RETURNS bigint LANGUAGE sql AS 'UPDATE seq SET id = id + 1 RETURNING id';
It is important not to create an index on the table, so that you can get HOT updates– and so that the table does not get bloated.
next_val function will lock the table row until the end of the transaction, so keep all transactions that use it short.
I’ve shown you several different ways to make a sequence skip values — sometimes even backwards. But that is never a problem, if all you need are unique primary key values.
Resist the temptation to try for a “gapless sequence”. You can get it, but the performance impact is high.
If you are interested in learning about advanced techniques to enforce integrity, check out our blogpost on constraints over multiple rows.