CYBERTEC PostgreSQL Logo

PostgreSQL: Sequences vs. Invoice numbers

09.2022 / Category: / Tags: |

Sequences are a core feature of SQL. However, some users are tempted to implement sequences to generate invoices. That's dangerous and should be avoided. The core question is: Why? What's the problem with using database-side sequences to put unique invoice numbers to send to clients? Let's dive in and find out.

Getting started with CREATE SEQUENCE

Before we try to figure out the best way to handle invoice numbers it makes sense to take a look at the inner workings for CREATE SEQUENCE. The syntax specification is quite straightforward and allows us to achieve many different things:

For the sake of simplicity, I have created a basic sequence without any fancy parameters. Once this is done you can call the nextval function to increment the sequence and return the value:

What's important to mention here is that the sequence generates an ever increasing number. If you want to figure out which value has already been used, you can use the currval function:

currval will ensure that the latest value produced by YOUR session is returned. It is NOT the last value issued by the sequence in your database - PostgreSQL guarantees that it is indeed the last value consumed by your session. This is important because we can rely on the fact that we will never get somebody else’s value.

The behavior just described becomes obvious if we reconnect and call currval again:

As you can see, an error is issued because no value has been generated by this session yet. This makes sense because the overall maximum value issued by the sequence is pointless anyway.

Sequences and transactions

So far, a sequence seems to be a good solution to handle invoice IDs. However, let's inspect what a sequence does in case of failed transactions:

Note that the value after the ROLLBACK statement is 5 and not 4. Most people expect the sequence to rollback as well - which is not the case. A sequence ensures that it yields strictly ascending numbers which in turn means that it must not plug gaps in the list of numbers.

For a deeper insight on how sequences interact with transaction, you can read this article.

Invoice IDs done the wrong way

If sequences are not an option, we can try various different approaches:

Our invoice table carries an invoice ID and some additional fields which are not relevant to the core problem we are going to solve here.

Remember: Invoice IDs should be numbers that are strictly unique, ascending and there should not be any gaps (by law). Many people therefore try a simple approach:

What's wrong with this approach? The problem can be summed up with one word: Concurrency. What if two people run the same operation? Well, both queries will return the same value. max(id) + 1 will be identical and therefore a primary key violation will be the logical consequence.

One way to get around this problem is to add a retry-loop to the application which simply runs the query again, in case a key violation pops up. If there are not too many invoices, this is just fine. However, if you want to handle thousands of sales a second, a retry loop might not be acceptable - we can expect too many key collisions.

Using brute force locking

How can we handle such conflicts? One way is to lock the table. The problem is: We need a lock that prevents writes as well as reads. We have to ensure that only one transaction at a time can calculate the previous max value. Let's see how this can work:

While this is technically correct, locking the entire table really hurts because it will impact other transactions. Just imagine some kind of reporting job which tries to sum up yesterday’s turnover - it has to read the table but it doesn't work, because invoice creation keeps locking up the table in the most brutal way known to PostgreSQL. Clearly, this is not an option.

Using SERIALIZABLE transactions

The alternative to a table lock is to use a SERIALIZABLE transaction:

The beauty of a serializable transaction is that one does not have to care about locking and concurrency at all. The PostgreSQL core will sort things out for you. Keep in mind that SERIALIZABLE does have some overhead, but it solves a couple of common problems. Usually it is better to have slower single thread performance but beat the brute force table lock by introducing more concurrency to the system. Also remember: There is no slower way to execute than to “wait”.

Using an ID table to minimize locking

If SERIALIZABLE is not what you want, there is one more option to solve the problem. One solution is to introduce a separate table that does nothing but holding the most current value:

This is interesting. What we can do here is to run an UPDATE statement on this new table to fetch the new id. That avoids a lock on the invoice table - but ensures that only one number can be generated at a time, which guarantees that those numbers are ascending and unique. Basically we abuse this single line in t_invoice_id to centralize the lock in an efficient way.

Here's how the invoice table can be populated:

Using a CTE (= Common Table Expression) is a good way to handle this operation in a single statement.

Finally …

There are definitely many different ways to handle unique invoice numbers which are not allowed to have gaps. This post just shared a few of them to give you insights into how this problem can be solved efficiently. If you want to learn more about PostgreSQL and sequences, check out my blog on fixing out-of-sync sequences.

Another related topic of interest are primary keys, see this blog on UUID, serial and identity columns for PostgreSQL auto-generated primary keys.

5 responses to “PostgreSQL: Sequences vs. Invoice numbers”

  1. I've had long discussions with accounting people, who claim that invoice IDs must be strictly ascending without gaps, but this simply isn't true (at least in germany).

    I have been using invoice and order IDs like SOMEPREFIX 2022-01/FB42-23AF for a few years now without any complaints from the tax authorities.

    They are composed with:
    - a prefix identifying the invoice group
    - some part identifying the timeframe (with an appropriate granularity for the total number of invoices per year)
    - some part from a random UUID

    The only real requirement is uniqueness. I think the "no gaps" myth comes from some old best practices ... I mean, it is a good heuristic if you have to check the invoices by hand ... but if you don't trust your DB not to loose your invoices you probably have a much bigger problem

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram