Sequences are a core feature of SQL. However, some users are tempted to use sequences to generate invoice numbers. That's dangerous and should be avoided. The core question is: Why? What's the problem with using database-side sequences to produce unique invoice numbers to send to clients? Let's dive in and find out.
Before we try to figure out the best way to handle invoice numbers, it makes sense to take a look at the inner workings of CREATE SEQUENCE. The syntax specification is quite straightforward and allows us to achieve many different things:
test=# \h CREATE SEQUENCE
Command:     CREATE SEQUENCE
Description: define a new sequence generator
Syntax:
CREATE [ { TEMPORARY | TEMP } | UNLOGGED ] SEQUENCE [ IF NOT EXISTS ] name
    [ AS data_type ]
    [ INCREMENT [ BY ] increment ]
    [ MINVALUE minvalue | NO MINVALUE ] [ MAXVALUE maxvalue | NO MAXVALUE ]
    [ START [ WITH ] start ] [ CACHE cache ] [ [ NO ] CYCLE ]
    [ OWNED BY { table_name.column_name | NONE } ]

URL: https://www.postgresql.org/docs/15/sql-createsequence.html
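To give you an idea of what those options look like in practice, here is a purely illustrative sequence (not used in the rest of this post - the name and numbers are made up) that starts at 100, counts in steps of 10 and caches 20 numbers per session:

CREATE SEQUENCE seq_invoice_demo
    AS bigint
    INCREMENT BY 10
    MINVALUE 100
    START WITH 100
    CACHE 20
    NO CYCLE;

Caching improves performance, but cached numbers that are never used are simply discarded when the session ends - one more way gaps can appear.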
For the sake of simplicity, I have created a basic sequence without any fancy parameters. Once this is done, you can call the nextval function to increment the sequence and return the value:
test=# CREATE SEQUENCE seq_a;
CREATE SEQUENCE
test=# SELECT nextval('seq_a');
 nextval
---------
       1
(1 row)

test=# SELECT nextval('seq_a');
 nextval
---------
       2
(1 row)
What's important to mention here is that the sequence generates an ever-increasing number. If you want to figure out which value has already been used, you can call the currval function:
test=# SELECT currval('seq_a');
 currval
---------
       2
(1 row)
currval will ensure that the latest value produced by YOUR session is returned. It is NOT the last value issued by the sequence in your database - PostgreSQL guarantees that it is indeed the last value consumed by your session. This is important because we can rely on the fact that we will never get somebody else's value.
The behavior just described becomes obvious if we reconnect and call currval again:
test=# SELECT currval('seq_a');
ERROR:  currval of sequence "seq_a" is not yet defined in this session
As you can see, an error is issued because no value has been generated by this session yet. This makes sense: the overall maximum value ever issued by the sequence would be pointless anyway - what matters is the value your own session received.
So far, a sequence seems to be a good solution to handle invoice IDs. However, let's inspect what a sequence does in case of failed transactions:
test=# SELECT nextval('seq_a');
 nextval
---------
       3
(1 row)

test=# BEGIN;
BEGIN
test=*# SELECT nextval('seq_a');
 nextval
---------
       4
(1 row)

test=*# ROLLBACK;
ROLLBACK
test=# SELECT nextval('seq_a');
 nextval
---------
       5
(1 row)
Note that the value after the ROLLBACK statement is 5 and not 4. Most people expect the sequence to roll back as well - which is not the case. A sequence guarantees that it yields ascending numbers, which in turn means that it never goes back to plug the gaps in the list of numbers.
For a deeper insight into how sequences interact with transactions, you can read this article.
If sequences are not an option, we can try various other approaches:
test=# CREATE TABLE t_invoice (
           inv_id    int   PRIMARY KEY,
           whatever  text
);
CREATE TABLE
test=# INSERT INTO t_invoice
       VALUES (1, 'whatever invoice data');
INSERT 0 1
Our invoice table carries an invoice ID and some additional fields which are not relevant to the core problem we are going to solve here.
Remember: invoice IDs are supposed to be unique, ascending and free of gaps (by law). Many people therefore try a simple approach:
test=# INSERT INTO t_invoice
       SELECT max(inv_id) + 1, 'more invoice data'
       FROM   t_invoice;
INSERT 0 1
What's wrong with this approach? The problem can be summed up in one word: concurrency. What if two people run the same operation at the same time? Both queries will compute the same max(inv_id) + 1, and a primary key violation is the logical consequence.
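To make the race more tangible, here is a sketch of what two concurrent sessions might experience (the session labels, client names and concrete numbers are purely illustrative):

-- Session 1
BEGIN;
INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'client A'      -- computes, say, 5
FROM   t_invoice;

-- Session 2 (before session 1 commits)
BEGIN;
INSERT INTO t_invoice
SELECT max(inv_id) + 1, 'client B'      -- also computes 5, then blocks on the index entry
FROM   t_invoice;

-- Session 1
COMMIT;

-- Session 2 now fails with something like:
-- ERROR:  duplicate key value violates unique constraint "t_invoice_pkey"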
One way to get around this problem is to add a retry loop to the application, which simply runs the query again in case a key violation pops up. If there are not too many invoices, this is just fine. However, if you want to handle thousands of sales a second, a retry loop might not be acceptable - we can expect too many key collisions.
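Just to illustrate the idea, such a retry loop can also be sketched inside the database as a PL/pgSQL block. Treat this as an illustration only: it assumes the default READ COMMITTED isolation level, where every retry sees a fresh snapshot, and the client name is made up:

DO $$
DECLARE
    done boolean := false;
BEGIN
    WHILE NOT done LOOP
        BEGIN
            -- try to grab the next free number
            INSERT INTO t_invoice
            SELECT max(inv_id) + 1, 'retried client'
            FROM   t_invoice;
            done := true;
        EXCEPTION WHEN unique_violation THEN
            NULL;   -- somebody else took the same number: loop and try again
        END;
    END LOOP;
END;
$$;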
How can we handle such conflicts? One way is to lock the table. The problem is: We need a lock that prevents writes as well as reads. We have to ensure that only one transaction at a time can calculate the previous max value. Let's see how this can work:
test=# BEGIN;
BEGIN
test=*# LOCK TABLE t_invoice IN ACCESS EXCLUSIVE MODE;
LOCK TABLE
test=*# INSERT INTO t_invoice
        SELECT max(inv_id) + 1, 'high value client'
        FROM   t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT
While this is technically correct, locking the entire table really hurts, because it impacts other transactions. Just imagine some kind of reporting job which tries to sum up yesterday's turnover - it has to read the table, but it can't, because invoice creation keeps locking the table in the most brutal way known to PostgreSQL. Clearly, this is not an option.
The alternative to a table lock is to use a SERIALIZABLE transaction:
test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=*# INSERT INTO t_invoice
        SELECT max(inv_id) + 1, 'fancy support client'
        FROM   t_invoice;
INSERT 0 1
test=*# COMMIT;
COMMIT
The beauty of a serializable transaction is that one does not have to care about locking and concurrency at all - the PostgreSQL core will sort things out for you. Keep in mind that SERIALIZABLE does have some overhead, and transactions that do conflict are aborted with a serialization error and have to be retried, but it solves a couple of common problems. Usually it is better to accept slower single-thread performance and beat the brute-force table lock by allowing more concurrency in the system. Also remember: there is no slower way to execute than to “wait”.
If SERIALIZABLE is not what you want, there is one more option to solve the problem: introduce a separate table that does nothing but hold the most current value:
test=# CREATE TABLE t_invoice_id AS SELECT 1 AS id;
SELECT 1
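If the invoice table already contains data, it makes sense to seed the counter with the current maximum instead - a small variation of the statement above (COALESCE covers the case of an empty table):

CREATE TABLE t_invoice_id AS
    SELECT COALESCE(max(inv_id), 0) AS id
    FROM   t_invoice;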
This is interesting. What we can do here is run an UPDATE statement on this new table to fetch the new id. That avoids a lock on the invoice table, but it ensures that only one number can be generated at a time, which guarantees that those numbers are ascending and unique. Basically, we abuse this single row in t_invoice_id to centralize the lock in an efficient way.
Here's how the invoice table can be populated:
test=# WITH x AS (UPDATE t_invoice_id
                  SET id = id + 1
                  RETURNING *
       )
       INSERT INTO t_invoice
       SELECT x.id, 'cool client' FROM x;
INSERT 0 1
Using a CTE (= Common Table Expression) is a good way to handle this operation in a single statement.
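If you prefer to hide the counter bump from the application, the same trick can be wrapped in a small SQL function - the function name next_invoice_id is just an example, not something shipped with PostgreSQL:

-- wrap the counter bump in a function (hypothetical helper)
CREATE FUNCTION next_invoice_id() RETURNS int AS
$$
    UPDATE t_invoice_id SET id = id + 1 RETURNING id;
$$ LANGUAGE sql;

-- usage
INSERT INTO t_invoice
VALUES (next_invoice_id(), 'yet another client');

Because the UPDATE inside the function locks the single row in t_invoice_id until the surrounding transaction ends, invoice numbering stays serialized, and the counter rolls back together with the invoice if anything goes wrong.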
There are definitely many different ways to handle unique invoice numbers which are not allowed to have gaps. This post just shared a few of them to give you insights into how this problem can be solved efficiently. If you want to learn more about PostgreSQL and sequences, check out my blog on fixing out-of-sync sequences.
Another related topic of interest is primary keys; see this blog on UUID, serial and identity columns for auto-generated primary keys in PostgreSQL.
Gaps in invoice numbers are OK in Germany. Which law (in which country) says that there must not be gaps?
No idea, but people keep coming up with that requirement.
Thank you for sharing interesting approaches!
I would always favour the strategy with a separate table for the invoice numbers ("Using an ID table to minimize locking").
I've had long discussions with accounting people, who claim that invoice IDs must be strictly ascending without gaps, but this simply isn't true (at least in Germany).
I have been using invoice and order IDs like
SOMEPREFIX 2022-01/FB42-23AF
for a few years now without any complaints from the tax authorities. They are composed of:
- a prefix identifying the invoice group
- some part identifying the timeframe (with an appropriate granularity for the total number of invoices per year)
- some part from a random UUID
The only real requirement is uniqueness. I think the "no gaps" myth comes from some old best practices ... I mean, it is a good heuristic if you have to check the invoices by hand ... but if you don't trust your DB not to lose your invoices, you probably have a much bigger problem.