PostgreSQL: Sequences vs. Invoice numbers

Sequences are a core feature of SQL. However, some users are tempted to implement sequences to generate invoices. That's dangerous and should be avoided. The core question is: Why? What's the problem with using database-side sequences to put unique invoice numbers to send to clients? Let's dive in and find out.

Table of Contents

Getting started with CREATE SEQUENCE

Before we try to figure out the best way to handle invoice numbers it makes sense to take a look at the inner workings for CREATE SEQUENCE. The syntax specification is quite straightforward and allows us to achieve many different things:

test=# h CREATE SEQUENCE
Command: CREATE SEQUENCE
Description: define a new sequence generator
Syntax:
CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE
[ IF NOT EXISTS ] name
[ AS data_type ]
[ INCREMENT [ BY ] increment ]
[ MINVALUE minvalue | NO MINVALUE ]
[ MAXVALUE maxvalue | NO MAXVALUE ]
[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
[ OWNED BY { table_name.column_name | NONE } ]

URL: https://www.postgresql.org/docs/15/sql-createsequence.html

test=# h CREATE SEQUENCE

Command: CREATE SEQUENCE

Description: define a new sequence generator

Syntax:

CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE

[ IF NOT EXISTS ] name

[ AS data_type ]

[ INCREMENT [ BY ] increment ]

[ MINVALUE minvalue | NO MINVALUE ]

[ MAXVALUE maxvalue | NO MAXVALUE ]

[ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]

[ OWNED BY { table_name.column_name | NONE } ]

URL: https://www.postgresql.org/docs/15/sql-createsequence.html

For the sake of simplicity, I have created a basic sequence without any fancy parameters. Once this is done you can call the nextval function to increment the sequence and return the value:

test=# CREATE SEQUENCE seq_a;
CREATE SEQUENCE
test=# SELECT nextval('seq_a');
nextval
---------
1
(1 row)

test=# SELECT nextval('seq_a');
nextval
---------
2
(1 row)

test=# CREATE SEQUENCE seq_a;

CREATE SEQUENCE

test=# SELECT nextval('seq_a');

nextval

---------

(1 row)

test=# SELECT nextval('seq_a');

nextval

---------

(1 row)

What's important to mention here is that the sequence generates an ever increasing number. If you want to figure out which value has already been used, you can use the currval function:

test=# SELECT currval('seq_a');
currval
---------
2
(1 row)

test=# SELECT currval('seq_a');

currval

---------

(1 row)

currval will ensure that the latest value produced by YOUR session is returned. It is NOT the last value issued by the sequence in your database - PostgreSQL guarantees that it is indeed the last value consumed by your session. This is important because we can rely on the fact that we will never get somebody else’s value.

The behavior just described becomes obvious if we reconnect and call currval again:

test=# SELECT currval('seq_a');
ERROR: currval of sequence 'seq_a' is not yet defined in this session

1 2	test=# SELECT currval('seq_a'); ERROR: currval of sequence 'seq_a' is not yet defined in this session

As you can see, an error is issued because no value has been generated by this session yet. This makes sense because the overall maximum value issued by the sequence is pointless anyway.

Sequences and transactions

So far, a sequence seems to be a good solution to handle invoice IDs. However, let's inspect what a sequence does in case of failed transactions:

test=# SELECT nextval('seq_a');
nextval
---------
3
(1 row)

test=# BEGIN;
BEGIN
test=*# SELECT nextval('seq_a');
nextval
---------
4
(1 row)

test=*# ROLLBACK;
ROLLBACK
test=# SELECT nextval('seq_a');
nextval
---------
5
(1 row)

test=# SELECT nextval('seq_a');

nextval

---------

(1 row)

test=# BEGIN;

BEGIN

test=*# SELECT nextval('seq_a');

nextval

---------

(1 row)

test=*# ROLLBACK;

ROLLBACK

test=# SELECT nextval('seq_a');

nextval

---------

(1 row)

Note that the value after the ROLLBACK statement is 5 and not 4. Most people expect the sequence to rollback as well - which is not the case. A sequence ensures that it yields strictly ascending numbers which in turn means that it must not plug gaps in the list of numbers.

For a deeper insight on how sequences interact with transaction, you can read this article.

Invoice IDs done the wrong way

If sequences are not an option, we can try various different approaches:

test=# CREATE TABLE t_invoice (
inv_id int PRIMARY KEY,
whatever text
);
CREATE TABLE
test=# INSERT INTO t_invoice
VALUES (1, 'whatever invoice data');
INSERT 0 1

test=# CREATE TABLE t_invoice (

inv_id int PRIMARY KEY,

whatever text

);

CREATE TABLE

test=# INSERT INTO t_invoice

VALUES (1, 'whatever invoice data');

INSERT 0 1

Our invoice table carries an invoice ID and some additional fields which are not relevant to the core problem we are going to solve here.

Remember: Invoice IDs should be numbers that are strictly unique, ascending and there should not be any gaps (by law). Many people therefore try a simple approach:

test=# INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'more invoice data'
FROM t_invoice;
INSERT 0 1

test=# INSERT INTO t_invoice

SELECT max(inv_id) + 1, 'more invoice data'

FROM t_invoice;

INSERT 0 1

What's wrong with this approach? The problem can be summed up with one word: Concurrency. What if two people run the same operation? Well, both queries will return the same value. max(id) + 1 will be identical and therefore a primary key violation will be the logical consequence.

One way to get around this problem is to add a retry-loop to the application which simply runs the query again, in case a key violation pops up. If there are not too many invoices, this is just fine. However, if you want to handle thousands of sales a second, a retry loop might not be acceptable - we can expect too many key collisions.

Using brute force locking

How can we handle such conflicts? One way is to lock the table. The problem is: We need a lock that prevents writes as well as reads. We have to ensure that only one transaction at a time can calculate the previous max value. Let's see how this can work:

test=# BEGIN;
BEGIN
test=*# LOCK TABLE t_invoice IN ACCESS EXCLUSIVE MODE;
LOCK TABLE
test=*# INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'high value client'
FROM t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT

test=# BEGIN;

BEGIN

test=*# LOCK TABLE t_invoice IN ACCESS EXCLUSIVE MODE;

LOCK TABLE

test=*# INSERT INTO t_invoice

SELECT max(inv_id) + 1, 'high value client'

FROM t_invoice;

INSERT 0 1

test=*# COMMIT;

COMMIT

While this is technically correct, locking the entire table really hurts because it will impact other transactions. Just imagine some kind of reporting job which tries to sum up yesterday’s turnover - it has to read the table but it doesn't work, because invoice creation keeps locking up the table in the most brutal way known to PostgreSQL. Clearly, this is not an option.

Using SERIALIZABLE transactions

The alternative to a table lock is to use a SERIALIZABLE transaction:

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=*# INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'fancy support client'
FROM t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN

test=*# INSERT INTO t_invoice

SELECT max(inv_id) + 1, 'fancy support client'

FROM t_invoice;

INSERT 0 1

test=*# COMMIT;

COMMIT

The beauty of a serializable transaction is that one does not have to care about locking and concurrency at all. The PostgreSQL core will sort things out for you. Keep in mind that SERIALIZABLE does have some overhead, but it solves a couple of common problems. Usually it is better to have slower single thread performance but beat the brute force table lock by introducing more concurrency to the system. Also remember: There is no slower way to execute than to “wait”.

Using an ID table to minimize locking

If SERIALIZABLE is not what you want, there is one more option to solve the problem. One solution is to introduce a separate table that does nothing but holding the most current value:

test=# CREATE TABLE t_invoice_id AS SELECT 1 AS id;
SELECT 1

1 2	test=# CREATE TABLE t_invoice_id AS SELECT 1 AS id; SELECT 1

This is interesting. What we can do here is to run an UPDATE statement on this new table to fetch the new id. That avoids a lock on the invoice table - but ensures that only one number can be generated at a time, which guarantees that those numbers are ascending and unique. Basically we abuse this single line in t_invoice_id to centralize the lock in an efficient way.

Here's how the invoice table can be populated:

test=# WITH x AS (UPDATE t_invoice_id
SET id = id + 1
RETURNING *
)
INSERT INTO t_invoice
SELECT x.id, 'cool client' FROM x;
INSERT 0 1

test=# WITH x AS (UPDATE t_invoice_id

SET id = id + 1

RETURNING *

)

INSERT INTO t_invoice

SELECT x.id, 'cool client' FROM x;

INSERT 0 1

Using a CTE (= Common Table Expression) is a good way to handle this operation in a single statement.

Finally …

There are definitely many different ways to handle unique invoice numbers which are not allowed to have gaps. This post just shared a few of them to give you insights into how this problem can be solved efficiently. If you want to learn more about PostgreSQL and sequences, check out my blog on fixing out-of-sync sequences.

Another related topic of interest are primary keys, see this blog on UUID, serial and identity columns for PostgreSQL auto-generated primary keys.

0 0 votes

Article Rating

5 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Daniel

1 year ago

I've had long discussions with accounting people, who claim that invoice IDs must be strictly ascending without gaps, but this simply isn't true (at least in germany).

I have been using invoice and order IDs like SOMEPREFIX 2022-01/FB42-23AF for a few years now without any complaints from the tax authorities.

They are composed with:
- a prefix identifying the invoice group
- some part identifying the timeframe (with an appropriate granularity for the total number of invoices per year)
- some part from a random UUID

The only real requirement is uniqueness. I think the "no gaps" myth comes from some old best practices ... I mean, it is a good heuristic if you have to check the invoices by hand ... but if you don't trust your DB not to loose your invoices you probably have a much bigger problem

Oleksandr Kharchenko

1 year ago

Thank you for sharing interesting approaches!

Georg Klimm

1 year ago

I would always favour the strategy with a separate table for the invoice numbers ("Using an ID table to minimize locking").

Thomas Güttler

1 year ago

Gaps in invoices numbers are ok in Germany. Which law (in which country) says that there must not be gaps?

laurenz

1 year ago

Reply to Thomas Güttler

No idea, but people keep coming up with that requirement.

Stay tuned with our

PostgreSQL: Sequences vs. Invoice numbers

Getting started with CREATE SEQUENCE

Sequences and transactions

Invoice IDs done the wrong way

Using brute force locking

Using SERIALIZABLE transactions

Using an ID table to minimize locking

Finally …

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts