© Laurenz Albe 2022
I was recently confronted with a nice example of how adding FOR UPDATE to a query can introduce transaction anomalies. This article will explain how that happens and how you can cope with the problem. Be ready to learn some PostgreSQL implementation details!
Setting the stage
We have a table of bills and a table of items on a bill:
CREATE TABLE bill (
   bill_id bigint PRIMARY KEY,
   total   numeric(15,2) DEFAULT 0 NOT NULL
);

CREATE TABLE item (
   item_id bigint PRIMARY KEY,
   bill_id bigint REFERENCES bill NOT NULL,
   amount  numeric(15,2) NOT NULL
);
Let’s insert some data:
INSERT INTO bill (bill_id, total)
VALUES (1, 60.0);

INSERT INTO item (item_id, bill_id, amount)
VALUES (101, 1, 10.0),
       (102, 1, 20.0),
       (103, 1, 30.0);
Adding a new item to the bill
Adding a new item would be a simple INSERT, but there is a catch: bill.total should always be the sum of item.amount for all items on that bill. This constraint introduces a redundancy into the data model. Normally, the best practice is not to persist that sum in the database, but to calculate it on the fly when you query the database. But there are certain reasons that can justify storing redundant data:
- calculating the sum on the fly is too expensive, for example if the calculation is performed very often or the bills are large
- you want to place a constraint on the sum, for example to guarantee that it is never negative
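Whenever you store such redundant data, it is good to have a way to verify the invariant. A query like the following (my own sketch, not required by the example) lists all bills whose stored total deviates from the sum of their items:

```sql
-- find bills where the redundant total is out of sync with the items
SELECT bill.bill_id, bill.total,
       coalesce(sum(item.amount), 0) AS item_sum
FROM bill
   LEFT JOIN item USING (bill_id)
GROUP BY bill.bill_id, bill.total
HAVING bill.total <> coalesce(sum(item.amount), 0);
```

If the data are consistent, this query returns no rows.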
Now the only proper way to do this is a trigger, but for simplicity’s sake, we will do it explicitly in the transaction that adds the new item:
BEGIN;

INSERT INTO item (item_id, bill_id, amount)
VALUES (104, 1, 40.0);

UPDATE bill
SET total = total + 40.0
WHERE bill_id = 1;

COMMIT;
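As an aside, the trigger mentioned above could look like the following minimal sketch (the names are my invention, and it ignores corner cases such as an UPDATE that moves an item to a different bill):

```sql
CREATE FUNCTION maintain_total() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   IF TG_OP = 'INSERT' THEN
      UPDATE bill SET total = total + NEW.amount
      WHERE bill_id = NEW.bill_id;
   ELSIF TG_OP = 'DELETE' THEN
      UPDATE bill SET total = total - OLD.amount
      WHERE bill_id = OLD.bill_id;
   ELSIF TG_OP = 'UPDATE' THEN
      UPDATE bill SET total = total - OLD.amount + NEW.amount
      WHERE bill_id = NEW.bill_id;
   END IF;
   RETURN NULL;   -- return value of an AFTER trigger is ignored
END;$$;

CREATE TRIGGER maintain_total
   AFTER INSERT OR UPDATE OR DELETE ON item
   FOR EACH ROW EXECUTE FUNCTION maintain_total();
```

With such a trigger in place, a plain INSERT into item would suffice.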
A concurrent report
To query the bill data, we could proceed like this:
SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id);
If we run the report before the inserting transaction commits, the result will look as follows:
 bill_id │ amount │ total 
═════════╪════════╪═══════
       1 │  10.00 │ 60.00
       1 │  20.00 │ 60.00
       1 │  30.00 │ 60.00
(3 rows)
If we run the report after the inserting transaction commits, we get the following:
 bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
       1 │  40.00 │ 100.00
(4 rows)
Can there be any transaction anomalies in this example?
A transaction anomaly is a race condition in which the concurrent execution of multiple transactions produces a result that can never happen with any serialized execution of the transactions. Note that not all serialized execution orders need to produce the same result: in our example above, it makes a difference if we run the report before or after the inserting transaction.
In our example, concurrent execution of the transactions will never produce an anomaly. No matter what we do, the report will always be consistent, and the total will match the sum of the items. The explanation for this is that PostgreSQL takes a snapshot at the beginning of the reporting query. Such a snapshot determines which transactions are visible to the query and which ones are not. Now either the inserting transaction is part of the snapshot or it is not, and the query will see either all or none of the data modifications from the inserting transaction.
A surprising transaction anomaly with SELECT ... FOR UPDATE
We can add FOR UPDATE to the reporting query, for example if we plan to delete bills that have no items. FOR UPDATE takes a row lock on the rows returned by the query, which is useful to prevent concurrent transactions from modifying these data:
SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;
This only locks the rows in bill. Note that it would be an error to try to lock the rows returned from item (the reason for that would be material for another article). Also note that FOR UPDATE is the correct lock if we intend to DELETE, while the weaker FOR NO KEY UPDATE lock would be appropriate if we intend to UPDATE the rows.
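For illustration, the cleanup that motivated the lock could then be a statement like this (my own sketch of the intended follow-up, run in the same transaction as the locking query):

```sql
-- delete all bills that have no items; the FOR UPDATE query
-- already holds row locks on the bill rows
DELETE FROM bill
WHERE NOT EXISTS (SELECT 1 FROM item
                  WHERE item.bill_id = bill.bill_id);
```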
Now timing is of the essence. We run the modified reporting query after the inserting transaction has performed the UPDATE, but before the COMMIT. That will cause the reporting query to hang (because the row in bill is locked), but after the COMMIT, the lock is released, and we get the following output:
 bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
(3 rows)
The sum of the items is no longer equal to the total, and we get an inconsistent result. For some reason, we see the new total, but we don’t see the new item, even though the item was added before the total was updated!
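To reproduce the anomaly, the statements run in this order across two database sessions:

```sql
-- session 1: add the item, but don't commit yet
BEGIN;
INSERT INTO item (item_id, bill_id, amount) VALUES (104, 1, 40.0);
UPDATE bill SET total = total + 40.0 WHERE bill_id = 1;

-- session 2: the reporting query blocks on the row lock in bill
SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

-- session 1: releasing the lock lets session 2 return
-- the inconsistent result
COMMIT;
```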
Explanation for the transaction anomaly
To understand what’s going on, we first have to understand how PostgreSQL processes the query. For that, we use EXPLAIN to see the execution plan:
EXPLAIN (COSTS OFF)
SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

                    QUERY PLAN                    
══════════════════════════════════════════════════
 LockRows
   ->  Hash Right Join
         Hash Cond: (item.bill_id = bill.bill_id)
         ->  Seq Scan on item
         ->  Hash
               ->  Seq Scan on bill
(6 rows)
You see that locking the rows is the last thing that happens. So PostgreSQL first scans bill, which results in the old version of the data (total = 60.00), because that is the snapshot that the query sees. The query also sees only the three original entries from item. Then, when PostgreSQL tries to lock the row from bill, it gets stuck behind the row lock from the concurrent UPDATE statement. The documentation explains what happens now (the “second updater” is our SELECT ... FOR UPDATE; the emphasis is mine):
In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). […] If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.
In short, what is locked (and returned) can be a version of the row that is normally not visible in the query’s snapshot.
The internal PostgreSQL name for repeating the operation with the latest row version is “EvalPlanQual” (short EPQ) after the function that implements it. It can be instructive to read the README in the source code.
Is that a PostgreSQL bug?
No, it isn’t. First, it is working as documented. The behavior is also condoned by the SQL standard: our query was executed with the default transaction isolation level READ COMMITTED, and all that the standard requires is that a READ COMMITTED transaction can never see data from uncommitted transactions. Only with the SERIALIZABLE transaction isolation level does the SQL standard require that you can never experience a transaction anomaly.
Think of the alternatives that PostgreSQL has in this situation and consider if they would be better:
- Can PostgreSQL lock the old version of the row? No, that would make no sense, because a subsequent update can only modify the most recent version of a row, which would then be different from the version the query told you it locked.
- When the lock is released and the row has changed, PostgreSQL could abandon everything, take a new snapshot and start over from the beginning. That would be complicated and not great for performance. If there are many concurrent transactions, that could mean that a query keeps repeating and never gets done.
- PostgreSQL could throw a serialization failure. The SQL standard allows this explicitly “when [the database] detects the inability to guarantee the serializability of two or more concurrent SQL-transactions”. But that would mean that you have to be ready to repeat transactions even in READ COMMITTED, which would be cumbersome. It is a desirable feature that PostgreSQL does not throw serialization failures in READ COMMITTED transactions.
How can I avoid these transaction anomalies?
The correct way to avoid transaction anomalies is to use a higher transaction isolation level. The SQL standard requires that SERIALIZABLE transactions are free from anomalies (even though Oracle doesn’t seem to have got the message). However, in PostgreSQL it is enough to use the REPEATABLE READ isolation level. The reason is that at the REPEATABLE READ isolation level, PostgreSQL uses the same snapshot for the whole transaction. If a concurrent transaction has modified a row after the snapshot was taken, an attempt to update that row will cause a serialization failure. Consequently, SELECT ... FOR UPDATE, which should behave consistently with UPDATE, does the same thing:
BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

ERROR:  could not serialize access due to concurrent update
Then you have to do what you always have to do when you get a serialization failure: roll back and repeat the transaction.
ROLLBACK;

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

 bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
       1 │  40.00 │ 100.00
(4 rows)
REPEATABLE READ transactions are cheap in PostgreSQL, so this is a solution that is always viable.
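If you want to use REPEATABLE READ throughout, you don’t have to spell it out in every BEGIN: you can change the default isolation level with the default_transaction_isolation parameter, for example per database (the database name here is a placeholder):

```sql
-- make REPEATABLE READ the default for all new connections
-- to this database (session-level SET would also work)
ALTER DATABASE mydb
   SET default_transaction_isolation = 'repeatable read';
```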
Conclusion
You can get surprising transaction anomalies with SELECT ... FOR UPDATE on the default READ COMMITTED transaction isolation level. The simple way to avoid them is to use a REPEATABLE READ transaction.
If you liked this article, you might also be interested in my article about race conditions with constraint triggers.