Transaction anomalies with SELECT FOR UPDATE

novel ways to deal with transaction anomalies — © Laurenz Albe 2022

Table of Contents

I was recently confronted with a nice example of how adding FOR UPDATE to a query can introduce transaction anomalies. This article will explain how that happens and how you can cope with the problem. Be ready to learn some PostgreSQL implementation details!

The example

Setting the stage

We have a table of bills and a table of items on a bill:

CREATE TABLE bill (
   bill_id bigint PRIMARY KEY,
   total numeric(15,2) DEFAULT 0 NOT NULL
);

CREATE TABLE item (
   item_id bigint PRIMARY KEY,
   bill_id bigint REFERENCES bill NOT NULL,
   amount numeric(15,2) NOT NULL
);

CREATE TABLE bill (

bill_id bigint PRIMARY KEY,

total numeric(15,2) DEFAULT 0 NOT NULL

);

CREATE TABLE item (

item_id bigint PRIMARY KEY,

bill_id bigint REFERENCES bill NOT NULL,

amount numeric(15,2) NOT NULL

);

Let's insert some data:

INSERT INTO bill
   (bill_id, total)
VALUES
   (1, 60.0);

INSERT INTO item
   (item_id, bill_id, amount)
VALUES
   (101, 1, 10.0),
   (102, 1, 20.0),
   (103, 1, 30.0);

INSERT INTO bill

(bill_id, total)

VALUES

(1, 60.0);

INSERT INTO item

(item_id, bill_id, amount)

VALUES

(101, 1, 10.0),

(102, 1, 20.0),

(103, 1, 30.0);

Adding a new item to the bill

Adding a new item would be a simple INSERT, but there is a catch: bill.total should always be the sum of the item.amount for all items on that bill. This constraint introduces a redundancy to the data model. Normally, the best practice is not to persist that sum in the database, but to calculate it on the fly when you query the database. But there are certain reasons that can justify storing redundant data:

calculating the sum on the fly is too expensive, for example if the calculation is performed very often of the bills are large
you want to place a constraint on the sum, for example to guarantee that it is never negative

Now the only proper way to do this is a trigger, but for simplicity's sake, we will do it explicitly in the transaction that adds the new item:

BEGIN;

INSERT INTO item
   (item_id, bill_id, amount)
VALUES
   (104, 1, 40.0);

UPDATE bill SET total = total + 40.0
WHERE bill_id = 1;

COMMIT;

BEGIN;

INSERT INTO item

(item_id, bill_id, amount)

VALUES

(104, 1, 40.0);

UPDATE bill SET total = total + 40.0

WHERE bill_id = 1;

COMMIT;

A concurrent report

To query the bill data, we could proceed like this:

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id);

SELECT bill_id, item.amount, bill.total

FROM bill

LEFT JOIN item USING (bill_id);

If we run the report before the inserting transaction commits, the result will look as follows:

&nbsp;bill_id │ amount │ total 
═════════╪════════╪═══════
       1 │  10.00 │ 60.00
       1 │  20.00 │ 60.00
       1 │  30.00 │ 60.00
(3 rows)

 bill_id │ amount │ total

═════════╪════════╪═══════

1 │ 10.00 │ 60.00

1 │ 20.00 │ 60.00

1 │ 30.00 │ 60.00

(3 rows)

If we run the report after the inserting transaction commits, we get the following:

&nbsp;bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
       1 │  40.00 │ 100.00
(4 rows)

 bill_id │ amount │ total

═════════╪════════╪════════

1 │ 10.00 │ 100.00

1 │ 20.00 │ 100.00

1 │ 30.00 │ 100.00

1 │ 40.00 │ 100.00

(4 rows)

Can there be any transaction anomalies in this example?

A transaction anomaly is a race condition in which the concurrent execution of multiple transactions produces a result that can never happen with any serialized execution of the transactions. Note that not all serialized execution orders need to produce the same result: in our example above, it makes a difference if we run the report before or after the inserting transaction.

In our example, concurrent execution of the transactions will never produce an anomaly. No matter what we do, the report will always be consistent, and the total will match the sum of the items. The explanation for this is that PostgreSQL takes a snapshot at the beginning of the reporting query. Such a snapshot determines which transactions are visible to the query and which ones are not. Now either the reporting transaction is part of the snapshot or not, and the query will either see all or none of the data modifications from the inserting transaction.

A surprising transaction anomaly with `SELECT ... FOR UPDATE`

We can add FOR UPDATE to the reporting query, for example if we plan to delete bills that have no items. FOR UPDATE takes a row lock on the rows returned by the query, which is useful to prevent concurrent transactions from modifying these data:

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

SELECT bill_id, item.amount, bill.total

FROM bill

LEFT JOIN item USING (bill_id)

FOR UPDATE OF bill;

This only locks the rows in bill. Note that it would be an error to try and lock the rows returned from item (the reason for that would be material for another article). Also, note that FOR UPDATE is the correct lock if we intend to DELETE, and the weaker FOR NO KEY UPDATE lock would be appropriate if we intend to UPDATE.

Now timing is of the essence. We run the modified reporting query after the inserting transaction has performed the UPDATE, but before the COMMIT. That will cause the reporting query to hang (because the row in bill is locked), but after the COMMIT, the lock is released, and we get the following output:

&nbsp;bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
(3 rows)

 bill_id │ amount │ total

═════════╪════════╪════════

1 │ 10.00 │ 100.00

1 │ 20.00 │ 100.00

1 │ 30.00 │ 100.00

(3 rows)

The sum of the items is no longer equal to the total, and we get an inconsistent result. For some reason, we see the new total, but we don't see the new item, even though the item was added before the total was updated!

Explanation for the transaction anomaly

To understand what's going on, we first have to understand how PostgreSQL processes the query. For that, we use EXPLAIN to see the execution plan:

EXPLAIN (COSTS OFF)
SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

                    QUERY PLAN                    
══════════════════════════════════════════════════
 LockRows
   ->  Hash Right Join
         Hash Cond: (item.bill_id = bill.bill_id)
         ->  Seq Scan on item
         ->  Hash
               ->  Seq Scan on bill
(6 rows)

EXPLAIN (COSTS OFF)

SELECT bill_id, item.amount, bill.total

FROM bill

LEFT JOIN item USING (bill_id)

FOR UPDATE OF bill;

QUERY PLAN

══════════════════════════════════════════════════

LockRows

-> Hash Right Join

Hash Cond: (item.bill_id = bill.bill_id)

-> Seq Scan on item

-> Hash

-> Seq Scan on bill

(6 rows)

You see that locking the rows is the last thing that happens. So PostgreSQL first scans bill, which will result in the old version of the data (amount = 60), because that is the snapshot that the query sees. The query also only sees the three original entries from item. Then, when PostgreSQL tries to lock the row from bill, it gets stuck behind the row lock from the concurrent UPDATE statement. The documentation explains what happens now (the “second updater” is our SELECT ... FOR UPDATE, the emphasis is mine):

In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). [...] If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row.

In short, what is locked (and returned) can be a version of the row that is normally not visible in the query's snapshot.

The internal PostgreSQL name for repeating the operation with the latest row version is “EvalPlanQual” (short EPQ) after the function that implements it. It can be instructive to read the README in the source code.

Is that a PostgreSQL bug?

No, it isn't. First, it is working as documented. The behavior is also condoned by the SQL standard: our query was executed with the default transaction isolation level READ COMMITTED, and all that the standard requires is that a READ COMMITTED transaction can never see data from uncommitted transactions. Only with the SERIALIZABLE transaction isolation level does the SQL standard require that you can never experience a transaction anomaly.

Think of the alternatives that PostgreSQL has in this situation and consider if they would be better:

Can PostgreSQL lock the old version of the row? No, that would not make sense, because a subsequent update can only update the most recent version of a row, which would then be different from what it told you it was.
When the lock is released and the row has changed, PostgreSQL could abandon everything, take a new snapshot and start over from the beginning. That would be complicated and not great for performance. If there are many concurrent transactions, that could mean that a query keeps repeating and never gets done.
PostgreSQL could throw a serialization failure. The SQL standard allows this explicitly “when [the database] detects the inability to guarantee the serializability of two or more concurrent SQL-transactions”. But that would mean that you have to be ready to repeat transactions in a READ COMMITTED transaction, which would be cumbersome. It is a desirable feature that PostgreSQL does not throw serialization failures in READ COMMITTED transactions!

How can I avoid these transaction anomalies?

The correct way to avoid transaction anomalies is to use a higher transaction isolation level. The SQL standard requires that SERIALIZABLE transactions are free from anomalies (even though Oracle doesn't seem to have got the message). However, in PostgreSQL it is enough to use the REPEATABLE READ isolation level. The reason is that on the REPEATABLE READ isolation level, PostgreSQL uses the same snapshot for the whole transaction.. If a concurrent transaction has modified a row after the snapshot was taken, an attempt to update that row will cause a serialization failure. Consequently, SELECT ... FOR UPDATE, which should behave consistent with UPDATE, does the same thing:

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

ERROR:  could not serialize access due to concurrent update

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total

FROM bill

LEFT JOIN item USING (bill_id)

FOR UPDATE OF bill;

ERROR: could not serialize access due to concurrent update

Then you have to do what you always have to do when you get a serialization failure: rollback and repeat the transaction.

ROLLBACK;

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total
FROM bill
   LEFT JOIN item USING (bill_id)
FOR UPDATE OF bill;

 bill_id │ amount │ total  
═════════╪════════╪════════
       1 │  10.00 │ 100.00
       1 │  20.00 │ 100.00
       1 │  30.00 │ 100.00
       1 │  40.00 │ 100.00
(4 rows)

ROLLBACK;

BEGIN ISOLATION LEVEL REPEATABLE READ;

SELECT bill_id, item.amount, bill.total

FROM bill

LEFT JOIN item USING (bill_id)

FOR UPDATE OF bill;

bill_id │ amount │ total

═════════╪════════╪════════

1 │ 10.00 │ 100.00

1 │ 20.00 │ 100.00

1 │ 30.00 │ 100.00

1 │ 40.00 │ 100.00

(4 rows)

Different from SERIALIZABLE, using REPEATABLE READ transactions is cheap in PostgreSQL. So this is a solution that is always viable.

Conclusion

You can get surprising transaction anomalies with SELECT ... FOR UPDATE on the default READ COMMITTED transaction isolation level. The simple way to avoid them is to use a REPEATABLE READ transaction.

If you liked this article, you might also be interested in my article about race conditions with constraint triggers.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes

Article Rating

12 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Franck Pachot

1 year ago

In read committed the serialization failure could be catched and the query repeated by the backend, transparently, rather than raised to the client. But would need an implicit savepoint before each query.

laurenz

1 year ago

Reply to Franck Pachot

It would not require a savepoint, since there is no error. As I detailed in the second point of "Is that a PostgreSQL bug?", a new snapshot would be enough.

Franck Pachot

1 year ago

Reply to laurenz

Ok, I thought a savepoint was needed to release the row locks. thanks

laurenz

1 year ago

Reply to Franck Pachot

Hm, I guess you are right; I didn't think of the locks.
Anyway, our discussion is theoretical, because that's not what PostgreSQL does.

Francesco Belladonna

1 year ago

Performing the update before the insert could also solve the problem. The issue at hand is that while the bill and the items are treated as separate tables, they work as a whole unit. The lock should be applied to the whole unit, so to the beginning of the operation,otherwise the results are inconsistent.

Another note, the anomaly is probably present without locks too, it's just extremely hard to reproduce due to how fast the execution is. On top of that, the sum is commutative, so even with wrong order of operations, there is no visible difference. If the update used a nested query to compute the total, the issue should be visible. Alternatively, you could use toggling a boolean, that should cause more visible issues too.

Thank you for the article, I keep being concerned: most devs (myself included) think as if the isolation level was "repeatable read" all the times

laurenz

1 year ago

Reply to Francesco Belladonna

The order of UPDATE and INSERT does not matter in this example. I only chose this order to make the result seem even weirder.

Without locks (FOR UPDATE) the anomaly is not there. PostgreSQL just does something different when FOR UPDATE is present. Read again :^)

I like that you said that most developers think in REPEATABLE READ. I would even go as far as claiming that most developers think in SERIALIZABLE, that is, they don't expect anomalies at all. I think that REPEATABLE READ and higher isolation levels are among the most ignored database features. But they can make your life so much easier, particularly in read-only transactions.

Francesco Belladonna

1 year ago

Reply to laurenz

Unfortunately I won't be able to test this for a few days, however from my understanding (I could totally be wrong) if the update happens before the insert, the update locks the row first, as such the select for update will halt on the first update.
The lock for the update will be released at the commit of the update+insert combination, which means the select for update will proceed to read only after both the update and the insert have happened, reading the snapshot of the database after the transaction is completed.

laurenz

1 year ago

Reply to Francesco Belladonna

The transaction that sees the anomaly will still see the old version for the inserted rows. Only the row in bill that blocked it will be re-evaluated with a more recent snapshot, all other rows will be seen with the query's snapshot. That's the cause of the anomaly: not all rows are seen in the same snapshot.

Francesco Belladonna

1 year ago

Reply to laurenz

Question, why the following:

-- session 1 BEGIN;


-- notice the reversed order

UPDATE bill SET total = total + 40.0

WHERE bill_id = 1;
INSERT INTO item

   (item_id, bill_id, amount)

VALUES

   (104, 1, 40.0);
-- start new session (session 2)
COMMIT;
-- session 2

begin;

select * from bill where bill_id = 1 for update;

-- will halt here, commit session 1
SELECT bill_id, item.amount, bill.total

FROM bill

   LEFT JOIN item USING (bill_id)

FOR UPDATE OF bill;

commit;

Results in the correct output?

Francesco Belladonna

1 year ago

Reply to laurenz

I finally got to a computer and you are definitely right.

What it took a while of reading and experimenting, but I think I interpreted the problem in the following: I always thought of update and delete to be "atomic", but that's not right, postgres interprets those commands as select + update and select + delete. This in combination with your hint of "lock is the last thing that happens in the query" changes how I interpret updates and deletes:

- update = select, lock, update based on the data of the first select
- delete = select, lock, delete based on the data of the first select

The consequence of this is that write operations are affected by READ COMMITTED too, not just "select statements". The example in the postgres documentation is very helpful:

create table website ( website_id bigint PRIMARY KEY, hits bigint default 0 not null );


insert into website(website_id, hits) values (1, 9);

insert into website(website_id, hits) values (2, 10);

BEGIN; UPDATE website SET hits = hits + 1; -- run from another session: DELETE FROM website WHERE hits = 10; -- Output: DELETE 0 COMMIT;

The DELETE statement deletes 0 because first, it selects all rows that are already committed, that means it selects row with website_id 2 (but not the one with ID 1). Then, it proceeds to lock the row and since the first update statement is still running, it hangs on this row, waiting for the lock to be released.
Now, the update statement completes and the lock is released only when the transaction is completed, that means the MODIFIED data is available for reading.

Delete at this point re-runs the WHERE clause to test if concurrent updates modified the row (which they did), so the row with ID 2 now has hits 11. Given that, the delete is skipped.

DELETE operation is now completed, zero rows deleted.

PgM

6 months ago

Even more crazy is that all of the above said is true even for subqueries of a select for update statement, there are not even two queries needed to reproduce this:

create table t (id int4, status TEXT);


insert into t values (1,'FREE');

insert into t values (2,'FREE');
-- Session 1

select * from t

where id = (select min(id) from t where status = 'FREE')

for update;

/* result

id|status|

-- ------

 1|FREE  |

 */

update t set status = 'NOT_FREE' where id = 1;

-- updated rows: 1
-- Session 2

select * from t

where id = (select min(id) from t where status = 'FREE')

for update;

-- query now waiting
-- Session 1

commit;

-- Session 2 -- now gets this result, so the subquery was not evaluated again after the end of the lock: /* id|status | -- -------- 1|NOT_FREE| */

Stay tuned with our

Transaction anomalies with SELECT FOR UPDATE

The example

Setting the stage

Adding a new item to the bill

A concurrent report

Can there be any transaction anomalies in this example?

A surprising transaction anomaly with `SELECT ... FOR UPDATE`

Explanation for the transaction anomaly

Is that a PostgreSQL bug?

How can I avoid these transaction anomalies?

Conclusion

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Transaction anomalies with SELECT FOR UPDATE

The example

Setting the stage

Adding a new item to the bill

A concurrent report

Can there be any transaction anomalies in this example?

A surprising transaction anomaly with SELECT ... FOR UPDATE

Explanation for the transaction anomaly

Is that a PostgreSQL bug?

How can I avoid these transaction anomalies?

Conclusion

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

A surprising transaction anomaly with `SELECT ... FOR UPDATE`