CYBERTEC PostgreSQL Logo

Gaps in sequences in PostgreSQL

09.2021 / Category: / Tags: | |
when gaps in sequences are a real problem
© Laurenz Albe 2021

 

Most database tables have an artificial numeric primary key, and that number is usually generated automatically using a sequence. I wrote about auto-generated primary keys in some detail in a previous article. Occasionally, gaps in these primary key sequences can occur - which might come as a surprise to you.

This article shows the causes of sequence gaps, demonstrates the unexpected fact that sequences can even jump backwards, and gives an example of how to build a gapless sequence.

Gaps in sequences caused by rollback

We are used to the atomic behavior of database transactions: when PostgreSQL rolls a transaction back, all its effects are undone. As the documentation tells us, that is not the case for sequence values:

To avoid blocking concurrent transactions that obtain numbers from the same sequence, a nextval operation is never rolled back; that is, once a value has been fetched it is considered used and will not be returned again. This is true even if the surrounding transaction later aborts, or if the calling query ends up not using the value. For example an INSERT with an ON CONFLICT clause will compute the to-be-inserted tuple, including doing any required nextval calls, before detecting any conflict that would cause it to follow the ON CONFLICT rule instead. Such cases will leave unused “holes” in the sequence of assigned values.

This little example shows how a gap forms in a sequence:

The second statement was rolled back, but the sequence value 2 is not, forming a gap.

This intentional behavior is necessary for good performance. After all, a sequence should not be the bottleneck for a workload consisting of many INSERTs, so it has to perform well. Rolling back sequence values would reduce concurrency and complicate processing.

Gaps in sequences caused by caching

Even though nextval is cheap, a sequence could still be a bottleneck in a highly concurrent workload. To work around that, you can define a sequence with a CACHE clause greater than 1. Then the first call to nextval in a database session will actually fetch that many sequence values in a single operation. Subsequent calls to nextval use those cached values, and there is no need to access the sequence.

As a consequence, these cached sequence values get lost when the database session ends, leading to gaps:

Now end the database session and start a new one:

Gaps in sequences caused by a crash

As with all other objects, changes to sequences are logged to WAL, so that recovery can restore the state from a backup or after a crash. Since writing WAL impacts performance, not each call to nextval will log to WAL. Rather, the first call logs a value 32 numbers ahead of the current value, and the next 32 calls to nextval don't log anything. That means that after recovering from a crash, the sequence may have skipped some values.

To demonstrate, I'll use a little PL/Python function that crashes the server by sending a KILL signal to the current process:

Now let's see this in action:

Upon reconnect, we find that some values are missing:

Sequences that jump backwards after a crash

It is a little-known fact that sequences can also jump backwards. A backwards jump can happen if the WAL record that logs the advancement of the sequence value has not yet been persisted to disk. Why? Because the transaction that contained the call to nextval has not yet committed:

Now reconnect and fetch the next sequence value:

This looks scary, but no damage can happen to the database: since the transaction didn't commit, it was rolled back, along with all possible data modifications that used the “lost” sequence values.

However, that leads to an interesting conclusion: don't use sequence values from an uncommitted transaction outside that transaction.

How to build a gapless sequence

First off: think twice before you decide to build a gapless sequence. It will serialize all transactions that use that “sequence”. That will deteriorate your data modification performance considerably.

You almost never need a gapless sequence. Usually, it is good enough if you know the order of the rows, for example from the current timestamp at the time the row was inserted. Then you can use the row_number window function to calculate the gapless ordering while you query the data:

You can implement a truly gapless sequence using a “singleton” table:

It is important not to create an index on the table, so that you can get HOT updates and so that the table does not get bloated.

Calling the next_val function will lock the table row until the end of the transaction, so keep all transactions that use it short.

Conclusion

I've shown you several different ways to make a sequence skip values — sometimes even backwards. But that is never a problem, if all you need are unique primary key values.

Resist the temptation to try for a “gapless sequence”. You can get it, but the performance impact is high.

If you are interested in learning about advanced techniques to enforce integrity, check out our blogpost on constraints over multiple rows.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

8 responses to “Gaps in sequences in PostgreSQL”

  1. I faced one day with another kind of gaps that can be reproduced very simply:


    create table if not exists check_seq (id int4, some_txt_attr text, some_int_attr int4);

    create sequence if not exists check_seq_seq
    increment by 1
    minvalue 1
    maxvalue 2147483647
    start 1
    cache 1
    no cycle;

    insert into check_seq
    select (jsonb_populate_record(null::check_seq, jsonb_build_object('id', nextval('check_seq_seq'), 'some_txt_attr', 'random text seq', 'some_int_attr', 0))).*;

    insert into check_seq
    select (jsonb_populate_record(null::check_seq, jsonb_build_object('id', nextval('check_seq_seq'), 'some_txt_attr', 'random text seq', 'some_int_attr', 0))).*;
    select * from check_seq;


    and as a result:

    https://uploads.disquscdn.com/images/db963f49dd55842a27d054d4ecc1113cdf61c4b1caf59655e530a87c4875dd6c.png

    • Nice example! In this case, the problem is that
      jsonb_populate_record(...nextval('check_seq_seq')...).*


      gets expanded to

      jsonb_populate_record(...nextval('check_seq_seq')...).col1, jsonb_populate_record(...nextval('check_seq_seq')...).col2, jsonb_populate_record(...nextval('check_seq_seq')...).col3


      so that

      nextval is called three times.

      • it works correctly though:

        insert into check_seq
        select * from jsonb_populate_record(null::check_seq, jsonb_build_object('id', nextval('check_seq_seq'), 'some_txt_attr', 'random text seq', 'some_int_attr', 0));

  2. Hi.
    I want to store coin information in postgresql. And this operation is supposed to be done every 24 hours and each time about 24 or 25 thousand coins are checked and if a new coin was added it will be added in the database otherwise nothing will happen (on conflict do nothing).
    As you know, each time the seq id increases and there will be a big gap between the ids, and after one or two months this number will reach 1 million. Is it correct to use the method you said? Or do you suggest that I create the coin_id myself or do you have another suggestion?

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram