Table of Contents
I was recently confronted with a nice example of how adding FOR UPDATE
to a query can introduce transaction anomalies. This article will explain how that happens and how you can cope with the problem. Be ready to learn some PostgreSQL implementation details!
We have a table of bills and a table of items on a bill:
1 2 3 4 5 6 7 8 9 10 |
CREATE TABLE bill ( bill_id bigint PRIMARY KEY, total numeric(15,2) DEFAULT 0 NOT NULL ); CREATE TABLE item ( item_id bigint PRIMARY KEY, bill_id bigint REFERENCES bill NOT NULL, amount numeric(15,2) NOT NULL ); |
Let's insert some data:
1 2 3 4 5 6 7 8 9 10 11 |
INSERT INTO bill (bill_id, total) VALUES (1, 60.0); INSERT INTO item (item_id, bill_id, amount) VALUES (101, 1, 10.0), (102, 1, 20.0), (103, 1, 30.0); |
Adding a new item would be a simple INSERT
, but there is a catch: bill.total
should always be the sum of the item.amount
for all items on that bill. This constraint introduces a redundancy to the data model. Normally, the best practice is not to persist that sum in the database, but to calculate it on the fly when you query the database. But there are certain reasons that can justify storing redundant data:
Now the only proper way to do this is a trigger, but for simplicity's sake, we will do it explicitly in the transaction that adds the new item:
1 2 3 4 5 6 7 8 9 10 11 |
BEGIN; INSERT INTO item (item_id, bill_id, amount) VALUES (104, 1, 40.0); UPDATE bill SET total = total + 40.0 WHERE bill_id = 1; COMMIT; |
To query the bill data, we could proceed like this:
1 2 3 |
SELECT bill_id, item.amount, bill.total FROM bill LEFT JOIN item USING (bill_id); |
If we run the report before the inserting transaction commits, the result will look as follows:
1 2 3 4 5 6 |
bill_id │ amount │ total ═════════╪════════╪═══════ 1 │ 10.00 │ 60.00 1 │ 20.00 │ 60.00 1 │ 30.00 │ 60.00 (3 rows) |
If we run the report after the inserting transaction commits, we get the following:
1 2 3 4 5 6 7 |
bill_id │ amount │ total ═════════╪════════╪════════ 1 │ 10.00 │ 100.00 1 │ 20.00 │ 100.00 1 │ 30.00 │ 100.00 1 │ 40.00 │ 100.00 (4 rows) |
A transaction anomaly is a race condition in which the concurrent execution of multiple transactions produces a result that can never happen with any serialized execution of the transactions. Note that not all serialized execution orders need to produce the same result: in our example above, it makes a difference if we run the report before or after the inserting transaction.
In our example, concurrent execution of the transactions will never produce an anomaly. No matter what we do, the report will always be consistent, and the total will match the sum of the items. The explanation for this is that PostgreSQL takes a snapshot at the beginning of the reporting query. Such a snapshot determines which transactions are visible to the query and which ones are not. Now either the reporting transaction is part of the snapshot or not, and the query will either see all or none of the data modifications from the inserting transaction.
SELECT ... FOR UPDATE
We can add FOR UPDATE
to the reporting query, for example if we plan to delete bills that have no items. FOR UPDATE
takes a row lock on the rows returned by the query, which is useful to prevent concurrent transactions from modifying these data:
1 2 3 4 |
SELECT bill_id, item.amount, bill.total FROM bill LEFT JOIN item USING (bill_id) FOR UPDATE OF bill; |
This only locks the rows in bill
. Note that it would be an error to try and lock the rows returned from item
(the reason for that would be material for another article). Also, note that FOR UPDATE
is the correct lock if we intend to DELETE
, and the weaker FOR NO KEY UPDATE
lock would be appropriate if we intend to UPDATE
.
Now timing is of the essence. We run the modified reporting query after the inserting transaction has performed the UPDATE
, but before the COMMIT
. That will cause the reporting query to hang (because the row in bill
is locked), but after the COMMIT
, the lock is released, and we get the following output:
1 2 3 4 5 6 |
bill_id │ amount │ total ═════════╪════════╪════════ 1 │ 10.00 │ 100.00 1 │ 20.00 │ 100.00 1 │ 30.00 │ 100.00 (3 rows) |
The sum of the items is no longer equal to the total, and we get an inconsistent result. For some reason, we see the new total, but we don't see the new item, even though the item was added before the total was updated!
To understand what's going on, we first have to understand how PostgreSQL processes the query. For that, we use EXPLAIN
to see the execution plan:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 |
EXPLAIN (COSTS OFF) SELECT bill_id, item.amount, bill.total FROM bill LEFT JOIN item USING (bill_id) FOR UPDATE OF bill; QUERY PLAN ══════════════════════════════════════════════════ LockRows -> Hash Right Join Hash Cond: (item.bill_id = bill.bill_id) -> Seq Scan on item -> Hash -> Seq Scan on bill (6 rows) |
You see that locking the rows is the last thing that happens. So PostgreSQL first scans bill
, which will result in the old version of the data (amount = 60
), because that is the snapshot that the query sees. The query also only sees the three original entries from item
. Then, when PostgreSQL tries to lock the row from bill
, it gets stuck behind the row lock from the concurrent UPDATE
statement. The documentation explains what happens now (the “second updater” is our SELECT ... FOR UPDATE
, the emphasis is mine):
In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). [...] If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the
WHERE
clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.
In short, what is locked (and returned) can be a version of the row that is normally not visible in the query's snapshot.
The internal PostgreSQL name for repeating the operation with the latest row version is “EvalPlanQual” (short EPQ) after the function that implements it. It can be instructive to read the README in the source code.
No, it isn't. First, it is working as documented. The behavior is also condoned by the SQL standard: our query was executed with the default transaction isolation level READ COMMITTED
, and all that the standard requires is that a READ COMMITTED
transaction can never see data from uncommitted transactions. Only with the SERIALIZABLE
transaction isolation level does the SQL standard require that you can never experience a transaction anomaly.
Think of the alternatives that PostgreSQL has in this situation and consider if they would be better:
READ COMMITTED
transaction, which would be cumbersome. It is a desirable feature that PostgreSQL does not throw serialization failures in READ COMMITTED
transactions!The correct way to avoid transaction anomalies is to use a higher transaction isolation level. The SQL standard requires that SERIALIZABLE
transactions are free from anomalies (even though Oracle doesn't seem to have got the message). However, in PostgreSQL it is enough to use the REPEATABLE READ
isolation level. The reason is that on the REPEATABLE READ
isolation level, PostgreSQL uses the same snapshot for the whole transaction.. If a concurrent transaction has modified a row after the snapshot was taken, an attempt to update that row will cause a serialization failure. Consequently, SELECT ... FOR UPDATE
, which should behave consistent with UPDATE
, does the same thing:
1 2 3 4 5 6 7 8 |
BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT bill_id, item.amount, bill.total FROM bill LEFT JOIN item USING (bill_id) FOR UPDATE OF bill; ERROR: could not serialize access due to concurrent update |
Then you have to do what you always have to do when you get a serialization failure: rollback and repeat the transaction.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
ROLLBACK; BEGIN ISOLATION LEVEL REPEATABLE READ; SELECT bill_id, item.amount, bill.total FROM bill LEFT JOIN item USING (bill_id) FOR UPDATE OF bill; bill_id │ amount │ total ═════════╪════════╪════════ 1 │ 10.00 │ 100.00 1 │ 20.00 │ 100.00 1 │ 30.00 │ 100.00 1 │ 40.00 │ 100.00 (4 rows) |
Different from SERIALIZABLE
, using REPEATABLE READ
transactions is cheap in PostgreSQL. So this is a solution that is always viable.
You can get surprising transaction anomalies with SELECT ... FOR UPDATE
on the default READ COMMITTED
transaction isolation level. The simple way to avoid them is to use a REPEATABLE READ
transaction.
If you liked this article, you might also be interested in my article about race conditions with constraint triggers.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
In read committed the serialization failure could be catched and the query repeated by the backend, transparently, rather than raised to the client. But would need an implicit savepoint before each query.
It would not require a savepoint, since there is no error. As I detailed in the second point of "Is that a PostgreSQL bug?", a new snapshot would be enough.
Ok, I thought a savepoint was needed to release the row locks. thanks
Hm, I guess you are right; I didn't think of the locks.
Anyway, our discussion is theoretical, because that's not what PostgreSQL does.
Performing the update before the insert could also solve the problem. The issue at hand is that while the bill and the items are treated as separate tables, they work as a whole unit. The lock should be applied to the whole unit, so to the beginning of the operation,otherwise the results are inconsistent.
Another note, the anomaly is probably present without locks too, it's just extremely hard to reproduce due to how fast the execution is. On top of that, the sum is commutative, so even with wrong order of operations, there is no visible difference. If the update used a nested query to compute the total, the issue should be visible. Alternatively, you could use toggling a boolean, that should cause more visible issues too.
Thank you for the article, I keep being concerned: most devs (myself included) think as if the isolation level was "repeatable read" all the times
The order of
UPDATE
andINSERT
does not matter in this example. I only chose this order to make the result seem even weirder.Without locks (
FOR UPDATE
) the anomaly is not there. PostgreSQL just does something different whenFOR UPDATE
is present. Read again :^)I like that you said that most developers think in
REPEATABLE READ
. I would even go as far as claiming that most developers think inSERIALIZABLE
, that is, they don't expect anomalies at all. I think thatREPEATABLE READ
and higher isolation levels are among the most ignored database features. But they can make your life so much easier, particularly in read-only transactions.Unfortunately I won't be able to test this for a few days, however from my understanding (I could totally be wrong) if the update happens before the insert, the update locks the row first, as such the select for update will halt on the first update.
The lock for the update will be released at the commit of the update insert combination, which means the select for update will proceed to read only after both the update and the insert have happened, reading the snapshot of the database after the transaction is completed.
The transaction that sees the anomaly will still see the old version for the inserted rows. Only the row in
bill
that blocked it will be re-evaluated with a more recent snapshot, all other rows will be seen with the query's snapshot. That's the cause of the anomaly: not all rows are seen in the same snapshot.Question, why the following:
-- session 1
BEGIN;
-- notice the reversed order
UPDATE bill SET total = total 40.0
WHERE bill_id = 1;
INSERT INTO item
(item_id, bill_id, amount)
VALUES
(104, 1, 40.0);
-- start new session (session 2)
COMMIT;
-- session 2
begin;
select * from bill where bill_id = 1 for update;
-- will halt here, commit session 1
SELECT bill_id, item.amount, bill.total
FROM bill
LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;
commit;
Results in the correct output?
I finally got to a computer and you are definitely right.
What it took a while of reading and experimenting, but I think I interpreted the problem in the following: I always thought of
update
anddelete
to be "atomic", but that's not right, postgres interprets those commands as select update and select delete. This in combination with your hint of "lock is the last thing that happens in the query" changes how I interpret updates and deletes:- update = select, lock, update based on the data of the first select
- delete = select, lock, delete based on the data of the first select
The consequence of this is that write operations are affected by READ COMMITTED too, not just "select statements". The example in the postgres documentation is very helpful:
create table website (
website_id bigint PRIMARY KEY,
hits bigint default 0 not null
);
insert into website(website_id, hits) values (1, 9);
insert into website(website_id, hits) values (2, 10);
BEGIN;
UPDATE website SET hits = hits 1;
-- run from another session: DELETE FROM website WHERE hits = 10;
-- Output: DELETE 0
COMMIT;
The DELETE statement deletes 0 because first, it selects all rows that are already committed, that means it selects row with website_id 2 (but not the one with ID 1). Then, it proceeds to lock the row and since the first update statement is still running, it hangs on this row, waiting for the lock to be released.
Now, the update statement completes and the lock is released only when the transaction is completed, that means the MODIFIED data is available for reading.
Delete at this point re-runs the WHERE clause to test if concurrent updates modified the row (which they did), so the row with ID 2 now has hits 11. Given that, the delete is skipped.
DELETE operation is now completed, zero rows deleted.
Even more crazy is that all of the above said is true even for subqueries of a select for update statement, there are not even two queries needed to reproduce this:
create table t (id int4, status TEXT);
insert into t values (1,'FREE');
insert into t values (2,'FREE');
-- Session 1
select * from t
where id = (select min(id) from t where status = 'FREE')
for update;
/* result
id|status|
-- ------
1|FREE |
*/
update t set status = 'NOT_FREE' where id = 1;
-- updated rows: 1
-- Session 2
select * from t
where id = (select min(id) from t where status = 'FREE')
for update;
-- query now waiting
-- Session 1
commit;
-- Session 2
-- now gets this result, so the subquery was not evaluated again after the end of the lock:
/*
id|status |
-- --------
1|NOT_FREE|
*/