© Laurenz Albe 2020
(Last updated on 2023-01-24) Recently, while troubleshooting PostgreSQL performance problems, I ran into problems with subtransactions twice. So I thought this was a nice topic for another blog post.
What are subtransactions?
Everybody knows database transactions. Since PostgreSQL operates in autocommit mode, you have to start a transaction that spans multiple statements explicitly with
START TRANSACTION (or
BEGIN) and close it with
COMMIT. If you abort the transaction with
ROLLBACK (or end the database session without committing), all work done inside the transaction is undone.
Now subtransactions allow you to roll back part of the work done in a transaction. You start a subtransaction inside a transaction with the standard SQL statement:

```sql
SAVEPOINT name;
```

Here “name” is an identifier (no single quotes!) for the subtransaction. You cannot commit a subtransaction in SQL (it is automatically committed with the transaction that contains it), but you can roll it back with:

```sql
ROLLBACK TO SAVEPOINT name;
```
The use of subtransactions
Subtransactions are useful in longer transactions. In PostgreSQL, any error inside a transaction will abort the transaction:
```
test=> BEGIN;
BEGIN
test=*> SELECT 'Some work is done';
     ?column?      
-------------------
 Some work is done
(1 row)

test=*> SELECT 12 / (factorial(0) - 1);
ERROR:  division by zero
test=!> SELECT 'try to do more work';
ERROR:  current transaction is aborted, commands ignored until end of transaction block
test=!> COMMIT;
ROLLBACK
```
With a transaction that does a lot of work this is quite annoying, because it means that all work done so far is lost. Subtransactions can help you recover from such a situation:
```
test=> BEGIN;
BEGIN
test=*> SELECT 'Some work is done';
     ?column?      
-------------------
 Some work is done
(1 row)

test=*> SAVEPOINT a;
SAVEPOINT
test=*> SELECT 12 / (factorial(0) - 1);
ERROR:  division by zero
test=!> ROLLBACK TO SAVEPOINT a;
ROLLBACK
test=*> SELECT 'try to do more work';
      ?column?       
---------------------
 try to do more work
(1 row)

test=*> COMMIT;
COMMIT
```
ROLLBACK TO SAVEPOINT starts another subtransaction called
a when it rolls back the old one.
Subtransactions in PL/pgSQL
Even if you have never used the
SAVEPOINT statement, you may already have encountered subtransactions. Written in PL/pgSQL, the code in the previous section looks like this:
```sql
BEGIN
   PERFORM 'Some work is done';
   BEGIN  -- a block inside a block
      PERFORM 12 / (factorial(0) - 1);
   EXCEPTION
      WHEN division_by_zero THEN
         NULL;  -- ignore the error
   END;
   PERFORM 'try to do more work';
END;
```
Every time you enter a block with an
EXCEPTION clause, a new subtransaction is started. The subtransaction is committed when you leave the block, and rolled back when you enter the exception handler.
Compatibility with other databases
Many other databases handle errors inside a transaction differently. Rather than aborting the complete transaction, they roll back only the statement that caused the error, leaving the transaction itself active.
When migrating or porting from such a database to PostgreSQL, you might be tempted to wrap every single statement in a subtransaction to emulate the above behavior.
The PostgreSQL JDBC driver even has a connection parameter “autosave” that you can set to “always” to automatically set a savepoint before each statement and roll back to it in case of failure.
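In plain SQL, this emulation amounts to bracketing every statement with a savepoint. The following is only a sketch of what such a driver does behind the scenes (the table “accounts” and the savepoint name “s” are made up for illustration):

```sql
BEGIN;

SAVEPOINT s;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
-- if the statement fails, the driver issues:
--    ROLLBACK TO SAVEPOINT s;
-- if it succeeds, the driver discards the savepoint:
RELEASE SAVEPOINT s;

SAVEPOINT s;  -- ... and repeats this dance for every single statement
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
RELEASE SAVEPOINT s;

COMMIT;
```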
As the following will show, this alluring technique will lead to serious performance problems.
A performance test case
To demonstrate the problems caused by overuse of subtransactions, I created a test table:
```sql
CREATE UNLOGGED TABLE contend (
   id  integer PRIMARY KEY,
   val integer NOT NULL
) WITH (fillfactor = 50);

INSERT INTO contend (id, val)
SELECT i, 0
FROM generate_series(1, 10000) AS i;

VACUUM (ANALYZE) contend;
```
The table is small, unlogged and has a low fillfactor to reduce the required I/O as much as possible. This way, I can observe the effects of subtransactions better.
I used pgbench, the benchmarking tool shipped with PostgreSQL, to run the following custom SQL script:
```sql
BEGIN;

PREPARE sel(integer) AS
   SELECT count(*) FROM contend
   WHERE id BETWEEN $1 AND $1 + 100;

PREPARE upd(integer) AS
   UPDATE contend SET val = val + 1
   WHERE id IN ($1, $1 + 10, $1 + 20, $1 + 30);

SAVEPOINT a;
\set rnd random(1, 990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

SAVEPOINT a;
\set rnd random(1, 990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

...

SAVEPOINT a;
\set rnd random(1, 990)
EXECUTE sel(10 * :rnd + :client_id + 1);
EXECUTE upd(10 * :rnd + :client_id);

DEALLOCATE ALL;
COMMIT;
```
The script will set 60 savepoints for test number 1, and 90 for test number 2. It uses prepared statements to minimize the overhead of query parsing.
pgbench will replace
:client_id with a number unique to the database session. So as long as there are no more than 10 clients, each client’s
UPDATEs won’t collide with those of other clients, but they will
SELECT each other’s rows.
Running the performance test
Since my machine has 8 cores, I’ll run the tests with 6 concurrent clients for ten minutes.
To see meaningful information with “perf top”, you need to have the PostgreSQL debugging symbols installed. Installing debugging symbols is recommended even on production systems.
Test 1 (60 subtransactions)
```
$ pgbench -f subtrans.sql -n -c 6 -T 600

transaction type: subtrans.sql
scaling factor: 1
query mode: simple
number of clients: 6
number of threads: 1
duration: 600 s
number of transactions actually processed: 100434
latency average = 35.846 ms
tps = 167.382164 (including connections establishing)
tps = 167.383187 (excluding connections establishing)
```
This is what “
perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” shows while the test is running:
```
+    1.86%  [.] tbm_iterate
+    1.77%  [.] hash_search_with_hash_value
     1.75%  [.] AllocSetAlloc
+    1.36%  [.] pg_qsort
+    1.12%  [.] base_yyparse
+    1.10%  [.] TransactionIdIsCurrentTransactionId
+    0.96%  [.] heap_hot_search_buffer
+    0.96%  [.] LWLockAttemptLock
+    0.85%  [.] HeapTupleSatisfiesVisibility
+    0.82%  [.] heap_page_prune
+    0.81%  [.] ExecInterpExpr
+    0.80%  [.] SearchCatCache1
+    0.79%  [.] BitmapHeapNext
+    0.64%  [.] LWLockRelease
+    0.62%  [.] MemoryContextAllocZeroAligned
+    0.55%  [.] _bt_checkkeys
     0.54%  [.] hash_any
+    0.52%  [.] _bt_compare
     0.51%  [.] ExecScan
```
Test 2 (90 subtransactions)
```
$ pgbench -f subtrans.sql -n -c 6 -T 600

transaction type: subtrans.sql
scaling factor: 1
query mode: simple
number of clients: 6
number of threads: 1
duration: 600 s
number of transactions actually processed: 41400
latency average = 86.965 ms
tps = 68.993634 (including connections establishing)
tps = 68.993993 (excluding connections establishing)
```
This is what “
perf top --no-children --call-graph=fp --dsos=/usr/pgsql-12/bin/postgres” has to say:
```
+   10.59%  [.] LWLockAttemptLock
+    7.12%  [.] LWLockRelease
+    2.70%  [.] LWLockAcquire
+    2.40%  [.] SimpleLruReadPage_ReadOnly
+    1.30%  [.] TransactionIdIsCurrentTransactionId
+    1.26%  [.] tbm_iterate
+    1.22%  [.] hash_search_with_hash_value
+    1.08%  [.] AllocSetAlloc
+    0.77%  [.] heap_hot_search_buffer
+    0.72%  [.] pg_qsort
+    0.72%  [.] base_yyparse
+    0.66%  [.] SubTransGetParent
+    0.62%  [.] HeapTupleSatisfiesVisibility
+    0.54%  [.] ExecInterpExpr
+    0.51%  [.] SearchCatCache1
```
Even if we take into account that the transactions in test 2 are 1.5 times as long (90 savepoints instead of 60), this is still a performance regression of about 60% compared with test 1.
Implementation of subtransactions
To understand what is going on, we have to understand how transactions and subtransactions are implemented.
Whenever a transaction or subtransaction modifies data, it is assigned a transaction ID. PostgreSQL keeps track of these transaction IDs in the commit log, which is persisted in the
pg_xact subdirectory of the data directory.
However, there are some differences between transactions and subtransactions:
- Each subtransaction has a containing transaction or subtransaction (the “parent”)
- Committing a subtransaction does not require a WAL flush
- There can be only one transaction per database session, but many subtransactions
The information about which (sub)transaction is the parent of a given subtransaction is persisted in the
pg_subtrans subdirectory of the data directory. Since this information becomes obsolete as soon as the containing transaction has ended, the data does not have to be preserved across a shutdown or crash.
Subtransactions and visibility
The visibility of a row version (“tuple”) in PostgreSQL is determined by the
xmin and
xmax system columns, which contain the transaction IDs of the creating and destroying transactions. If the transaction ID stored is that of a subtransaction, PostgreSQL also has to consult the state of the containing (sub)transaction to determine if the transaction ID is valid or not.
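You can inspect these system columns directly; for example, using the contend table created above:

```sql
-- xmin: transaction ID that created the row version
-- xmax: transaction ID that deleted or locked it (0 if none)
SELECT xmin, xmax, id, val
FROM contend
LIMIT 1;
```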
To determine which tuples a statement can see, PostgreSQL takes a snapshot of the database at the beginning of the statement (or the transaction). Such a snapshot consists of:
- a maximum transaction ID: everything from that transaction on is invisible
- the list of transactions and subtransactions that were active when the snapshot was taken
- the command number of the earliest visible command in the current (sub)transaction
A snapshot is initialized by looking at the process array, which is stored in shared memory and contains information about all currently running backends. This, of course, contains the current transaction ID of the backend and has room for at most 64 non-aborted subtransactions per session. If there are more than 64 such subtransactions, the snapshot is marked as suboverflowed.
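You can provoke such an overflow with more than 64 exception blocks in one transaction, for example with a loop in PL/pgSQL. This is only a sketch (it reuses the contend table from the test; the loop count of 70 is arbitrary, anything above 64 will do):

```sql
DO $$
BEGIN
   FOR i IN 1..70 LOOP
      BEGIN  -- each block with an EXCEPTION clause starts a subtransaction
         -- the data modification assigns the subtransaction a transaction ID
         INSERT INTO contend (id, val) VALUES (-i, 0);
      EXCEPTION
         WHEN unique_violation THEN
            NULL;  -- ignore duplicate keys
      END;
   END LOOP;
END;
$$;
```

While this runs, any snapshot that a concurrent session takes will be marked suboverflowed, since the session holds more than 64 non-aborted subtransactions.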
Explaining the test results
A suboverflowed snapshot does not contain all data required to determine visibility, so PostgreSQL will occasionally have to resort to
pg_subtrans. These pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of
SimpleLruReadPage_ReadOnly in the
perf output. Other transactions have to update
pg_subtrans to register subtransactions, and you can see in the
perf output how they vie for lightweight locks with the readers.
Diagnosing the problem of too many subtransactions
Apart from looking at “
perf top”, there are other symptoms that point in the direction of this problem:
- Your workload performs well when run single-threaded, but badly when run in several concurrent database sessions
- You often see the wait event “
SubtransSLRU” (“
SubtransControlLock” in older releases) in
pg_stat_activity
- From PostgreSQL v13 on, you can look at the monitoring view
pg_stat_slru and check if
blks_read in the row with
name = 'Subtrans' keeps growing. That indicates that PostgreSQL has to read disk pages because it needs to access subtransactions that are no longer cached.
- If you export a snapshot using the “
pg_export_snapshot()” function, the resulting file in the
pg_snapshots subdirectory of the data directory will contain the line “
sof:1” to indicate that the subtransaction array overflowed
- From PostgreSQL v16 on, you can call the function
pg_stat_get_backend_subxact(integer) to get the number of subtransactions for a backend process and whether its subtransaction cache overflowed or not
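For example, sampling the subtransaction SLRU statistics is a one-liner on PostgreSQL v13 or later; if blks_read keeps growing between samples, you have found your culprit:

```sql
SELECT name, blks_hit, blks_read
FROM pg_stat_slru
WHERE name = 'Subtrans';
```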
Conclusion
Subtransactions are a great tool, but you should use them wisely. If you need concurrency, don’t start more than 64 subtransactions per transaction.
The diagnostics shown in this article should help you determine whether you are suffering from the problem, or not.
Finding the cause of that problem can be tricky. For example, it may not be obvious that a function with an exception handler that you call for each result row of an SQL statement (perhaps in a trigger) starts a new subtransaction.
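For example, a trigger function like the following hypothetical sketch (the audit_log table and the function name are made up) silently starts a subtransaction for every row it processes:

```sql
CREATE FUNCTION audit_row() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   BEGIN  -- this block has an EXCEPTION clause, so it starts a subtransaction
      INSERT INTO audit_log (tab, new_row)
      VALUES (TG_TABLE_NAME, to_jsonb(NEW));
   EXCEPTION
      WHEN others THEN
         NULL;  -- swallow logging errors
   END;
   RETURN NEW;
END;$$;
```

If such a trigger fires FOR EACH ROW on a statement that modifies thousands of rows, the limit of 64 cached subtransactions is exceeded without a single SAVEPOINT appearing in your code.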