© Laurenz Albe 2020
HOT updates are not a new feature. They were introduced by commit 282d2a03dd in 2007 and first appeared in PostgreSQL 8.3.
But since HOT is not covered by the PostgreSQL documentation (although there is a README.HOT in the source tree), it is not as widely known as it should be: Hence this article that explains the concept, shows HOT in action and gives tuning advice.
What is HOT?
HOT is an acronym for “Heap Only Tuple” (and that made a better acronym than Overflow Update CHaining). It is a feature that overcomes some of the inefficiencies of how PostgreSQL handles
Problems with PostgreSQL’s implementation of
PostgreSQL implements multiversioning by keeping the old version of the table row in the table – an
UPDATE adds a new row version (“tuple”) of the row and marks the old version as invalid.
In many respects, an
UPDATE in PostgreSQL is not much different from a
DELETE followed by an
This has a lot of advantages:
- no need for an extra storage area where old row versions are kept
ROLLBACKdoes not have to undo anything and is very fast
- no overflow problem with transactions that modify many rows
But it also has some disadvantages:
- old, obsolete (“dead”) tuples have to be removed from the table eventually (
- heavily updated tables can become “bloated” with dead tuples
- every update requires new index entries to be added, even if no indexed attribute is modified, and modifying an index is much more expensive than modifying the table (order has to be maintained)
UPDATE-heavy workloads are challenging for PostgreSQL. This is the area where HOT updates help.
Let’s create a simple table with 235 rows:
CREATE TABLE mytable ( id integer PRIMARY KEY, val integer NOT NULL ) WITH (autovacuum_enabled = off); INSERT INTO mytable SELECT *, 0 FROM generate_series(1, 235) AS n;
This table is slightly more than one 8KB block long. Let’s look at the physical address (“current tuple ID” or
ctid) of each table row:
SELECT ctid, id, val FROM mytable; ctid | id | val ---------+-----+----- (0,1) | 1 | 0 (0,2) | 2 | 0 (0,3) | 3 | 0 (0,4) | 4 | 0 (0,5) | 5 | 0 ... (0,224) | 224 | 0 (0,225) | 225 | 0 (0,226) | 226 | 0 (1,1) | 227 | 0 (1,2) | 228 | 0 (1,3) | 229 | 0 (1,4) | 230 | 0 (1,5) | 231 | 0 (1,6) | 232 | 0 (1,7) | 233 | 0 (1,8) | 234 | 0 (1,9) | 235 | 0 (235 rows)
ctid consists of two parts: the “block number”, starting at 0, and the “line pointer” (tuple number within the block), starting at 1.
So the first 226 rows fill block 0, and the last 9 are in block 1.
Let’s run an
UPDATE mytable SET val = -1 WHERE id = 42; SELECT ctid, id, val FROM mytable WHERE id = 42; ctid | id | val --------+----+----- (1,10) | 42 | -1 (1 row)
The new row version was added to block 1, which still has free space.
How HOT updates work
To understand HOT, let’s recapitulate the layout of a table page in PostgreSQL:
The array of “line pointers” is stored at the beginning of the page, and each points to an actual tuple. This indirect reference allows PostgreSQL to reorganize a page internally without changing the outside appearance.
A Heap Only Tuple is a tuple that is not referenced from outside the table block. Instead, a “forwarding address” (its line pointer number) is stored in the old row version:
That only works if the new and the old version of the row are in the same block. The external address of the row (the original line pointer) remains unchanged. To access the heap only tuple, PostgreSQL has to follow the “HOT chain” within the block.
Advantages of HOT updates
There are two main advantages of HOT updates:
- PostgreSQL doesn’t have to modify indexes.Since the external address of the tuple stays the same, the original index entry can still be used. Index scans follow the HOT chain to find the appropriate tuple.
- Dead tuples can be removed without the need for
VACUUM.If there are several HOT updates on a single row, the HOT chain grows longer. Now any backend that processes a block and detects a HOT chain with dead tuples (even a
SELECT!) will try to lock and reorganize the block, removing intermediate tuples. This is possible, because there are no outside references to these tuples.This greatly reduces the need for
Requirements for HOT updates
There are two conditions for HOT updates to be used:
- there must be enough space in the block containing the updated row
- there is no index defined on any column whose value it modified
The second condition is not obvious and is required by the current implementation of the feature.
fillfactor on tables to get HOT updates
You can make sure that the second condition from above is satisfied, but how can you make sure that there is enough free space in table blocks? For that, we have the storage parameter
fillfactor. It is a value between 10 and 100 and determines to which percentage
INSERTs will fill a table block. If you choose a value less than the default 100, you can make sure that there is enough room for HOT updates in each table block.
The best value for
fillfactor will depend on the size of the average row (larger rows need lower values) and the workload.
Note that setting
fillfactor on an existing table will not rearrange the data, it will only apply to future
INSERTs. But you can use
VACUUM (FULL) or
CLUSTER to rewrite the table, which will respect the new
There is a simple way to see if your setting is effective and if you get HOT updates:
SELECT n_tup_upd, n_tup_hot_upd FROM pg_stat_user_tables WHERE schemaname = 'myschema' AND relname = 'mytable';
This will show cumulative counts since the last time you called the function
n_tup_hot_upd (the HOT update count) grows about as fast as
n_tup_upd (the update count) to see if you get the HOT updates you want.
An example of HOT updates
Let’s change the
fillfactor for our table and repeat the experiment from above:
TRUNCATE mytable; ALTER TABLE mytable SET (fillfactor = 70); INSERT INTO mytable SELECT *, 0 FROM generate_series(1, 235); SELECT ctid, id, val FROM mytable; ctid | id | val ---------+-----+----- (0,1) | 1 | 0 (0,2) | 2 | 0 (0,3) | 3 | 0 (0,4) | 4 | 0 (0,5) | 5 | 0 ... (0,156) | 156 | 0 (0,157) | 157 | 0 (0,158) | 158 | 0 (1,1) | 159 | 0 (1,2) | 160 | 0 (1,3) | 161 | 0 ... (1,75) | 233 | 0 (1,76) | 234 | 0 (1,77) | 235 | 0 (235 rows)
This time there are fewer tuples in block 0, and some space is still free.
Let’s run the
UPDATE again and see how it looks this time:
UPDATE mytable SET val = -1 WHERE id = 42; SELECT ctid, id, val FROM mytable WHERE id = 42; ctid | id | val ---------+----+----- (0,159) | 42 | -1 (1 row)
The updated row was written into block 0 and was a HOT update.
HOT updates are the one feature that can enable PostgreSQL to handle a workload with many
UPDATE-heavy workloads, it can be a life saver to avoid indexing the updated columns and setting a
fillfactor of less than 100.
If you are interested in performance measurements for HOT updates, look at this article by my co-worker Kaarel Moppel.