VACUUM is one of the most important maintenance tasks in a PostgreSQL database. Over the years, many improvements were made to VACUUM and especially autovacuum to meet heavy workloads and make PostgreSQL fit for future tasks.
One of the optimizations is VACUUM skipping indexes and just focusing on heap tuples. This saves time and makes VACUUM/autovacuum more aggressive at the cost of leaving indexes aside. This behavior can be controlled by the INDEX_CLEANUP ON|OFF|AUTO option to VACUUM, introduced in PostgreSQL 12 and improved in PostgreSQL 14 to be used automatically in appropriate cases. There is also a relation storage parameter that allows controlling the behavior on table level.
Table of Contents
Recently, a customer asked for advice, since he has problems interpreting metrics from his database monitoring that he couldn't explain. For longer periods, VACUUM
and autovacuum
respectively don't seem to free dead tuples, causing them to show up in metrics. Especially when doing this manually, he keeps seeing dead tuples in e.g. pg_stat_user_tables system view, but only sometimes. And even more strangely, they seem to reoccur after manual VACUUM (ANALYZE) ...
PostgreSQL has an advanced technique to place tuples into pages, which are 8kB in size (if not compiled otherwise). Tuples are addressed not directly within their physical location within that page, but rather pointed-to by so-called line pointers
. Each tuple gets a special flag showing its purpose and lifetime, like the following:
1 2 3 4 5 6 7 8 |
/* * lp_flags has these possible states. An UNUSED line pointer is available * for immediate re-use, the other states are not. */ #define LP_UNUSED 0 /* unused (should always have lp_len=0) */ #define LP_NORMAL 1 /* used (should always have lp_len>0) */ #define LP_REDIRECT 2 /* HOT redirect (should have lp_len=0) */ #define LP_DEAD 3 /* dead, may or may not have storage */ |
An LP_NORMAL
flag indicates that the line pointer is currently in use and points to a valid tuple. LP_UNUSED
shows that this line pointer can immediately be reused by a new tuple. LP_DEAD
is an interesting flag though: the line pointer flagged this way doesn't necessarily point to a valid tuple anymore (for example, no space used anymore), but is still mandatory and can't be reused at the moment.
LP_REDIRECT
is something special: the tuple pointed to by such a flagged line pointer is the first of a Heap Only Tuple (HOT) tuple chain. Read this excellent article by Laurenz Albe on HOT updates in PostgreSQL for better performance.
The following graphic illustrates the described page layout in a simplified form:
As already mentioned, the PostgreSQL page layout has an array at the beginning of each page right after the page header. This line pointer array can grow from the beginning up to MaxHeapTuplesPerPage, which is platform dependent. Each entry in this array, aka line pointer, points to a specific tuple within the page. Tuples are stored from the end of the page (leaving room for some reserved space), while the line pointer array grows from the beginning. This makes it very flexible in placing and addressing tuple space within the page. A line pointer is a structure which, along with other properties, stores the specific flags in the lp_flag member.
The cause of this previously mentioned behavior is the "skip index" optimization in combination with a manual VACUUM (ANALYZE) command. Since PostgreSQL 14 (auto)VACUUM is able to perform the "skip indexes" optimization automatically, whether this optimization is used during vacuum is determined by the following formula (via source):
1 2 3 4 5 6 7 8 9 10 11 |
/* * Threshold that controls whether we bypass index vacuuming and heap * vacuuming as an optimization */ #define BYPASS_THRESHOLD_PAGES 0.02 /* i.e. 2% of rel_pages */ [...] threshold = (double) vacrel->rel_pages * BYPASS_THRESHOLD_PAGES; bypass = (vacrel->lpdead_item_pages < threshold && (TidStoreMemoryUsage(vacrel->dead_items) < (32L * 1024L * 1024L))); |
When VACUUM or autovacuum decide to skip vacuuming indexes, they cannot kill dead heap tuples entirely. Since the index tuples are still pointing to these tuples, they have to keep at least the line pointer entry in the page.
So to simplify things: If there aren't already more than 2 percent of LP_DEAD
tuples in the table and the TID storage is not yet exceeding 32 MB, then (auto)VACUUM will skip vacuuming the indexes.
By skipping the indexes, VACUUM can save some significant work, depending on the index size and number of indexes of a relation. However, it can't just free the line pointers within the table, since index entries are still pointing to the tuples. In this case, VACUUM marks these line pointers as LP_DEAD
, indicating that they are still in use, but considered dead and will be probably be freed soon. This also happens during normal VACUUM operation, but in the 2nd phase of each VACUUM run they will finally be set to LP_UNUSED
.
The described discrepancy in PostgreSQL dead tuple metrics now happens if VACUUM is combined with ANALYZE, or if an ANALYZE is run some time after the last VACUUM with skipped index cleanup. ANALYZE still recognizes the tuples marked LP_DEAD
as dead rows, which might cause them to suddenly reappear in the system metrics. This isn't a problem, but it is exactly this surprising condition that happens for people monitoring their PostgreSQL instance very closely.
The following example demonstrates the described behavior. First, create a small test table and insert a bunch of tuples:
1 2 3 4 |
CREATE TABLE test(id bigint primary key); INSERT INTO test SELECT t.id FROM generate_series(1, 100000) AS t(id); |
Since autovacuum
kicks in within autovacuum_naptime
and we just populated the table, we wait until it has done its initial job:
1 2 3 4 5 |
=# \x =# SELECT last_autovacuum FROM pg_stat_user_tables WHERE relname = 'test'; ─[ RECORD 1 ]───┬────────────────────────────── last_autovacuum │ 2025-04-24 15:08:06.541069+02 |
Initially, the pg_stat_user_tables
view shows the following statistics for this table now:
1 2 3 4 5 6 |
=# SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test'; ─[ RECORD 1 ]────── n_live_tup │ 100000 n_dead_tup │ 0 |
Now, we decide to delete a bunch of tuples. To showcase the VACUUM
skip-index optimization, we try to keep the number of those tuples below the 2 percent page threshold. Normally autovacuum_vacuum_scale_factor
is set to 0.2
per default, so this prevents autovacuum
from kicking in, again.
1 2 3 4 5 6 7 8 |
=# DELETE FROM test WHERE id BETWEEN 1000 AND 1999; DELETE 1000 =# SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test'; ─[ RECORD 1 ]───── n_live_tup │ 99000 n_dead_tup │ 1000 |
This now shows 1000 dead tuples. To showcase, we issue an explicit VACUUM
now, which should skip the index associated with the primary key constraint defined on the id
column. We use the VERBOSE
option to print some additional information, but especially one interesting line:
1 2 3 4 5 |
=# VACUUM (VERBOSE) test; [... many lines of output ... ] index scan bypassed: 5 pages from table (1.13% of total) have 1000 dead item identifiers [...] |
So VACUUM
tells us that it has skipped the indexes. Let's investigate the statistics of this table again:
1 2 3 4 5 |
SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test'; ─[ RECORD 1 ]───── n_live_tup │ 98885 n_dead_tup │ 0 |
This is what we expected. However, we now want to ANALYZE
this table, too, so that we have current statistics for the optimizer:
1 2 3 4 5 6 7 8 9 10 11 |
=# ANALYZE (VERBOSE) test; INFO: analyzing "bernd.test" INFO: "test": scanned 443 of 443 pages, containing 99000 live rows and 1000 dead rows; 30000 rows in sample, 99000 estimated total rows ANALYZE Time: 26,105 ms =# SELECT n_live_tup, n_dead_tup FROM pg_stat_user_tables WHERE relname = 'test'; ─[ RECORD 1 ]───── n_live_tup │ 99000 n_dead_tup │ 1000 |
Nothing else happened in between, but the statistics are showing the dead tuples again. Also note that ANALYZE (VERBOSE)
reported that it has found 1000 dead tuples. This is because VACUUM
cannot free the tuples immediately, as indexes are still addressing the line pointers
within the pages of the table it belongs to. ANALYZE
still recognizes them though, because they're relevant for the data distribution on a page and thus important in order to calculate the costs of a query properly. Note: tuples marked as LP_DEAD
are gone, but we can't delete their line pointers
until the index is cleaned up accordingly. So given the right circumstances it might seem to you that dead tuples believed to have been successfully deleted suddenly "reappear", which is just a detail of the skip-index optimization of VACUUM
. Once VACUUM
finds the number of tuples needed to exceed the index cleanup threshold, those line pointers will be freed for reuse again, too.
VACCUM
and autovacuum
can skip cleaning up indexes since PostgreSQL 14 if they find that the number of dead tuples is below a specific threshold. This is to save on unnecessary work, since cleaning up indexes can be an expensive operation and it’s better to aggregate some amount of work to do that. This optimization can lead to some confusing behavior of reappearing dead tuples in the table statistics (e.g. pg_stat_user_tables
view). If DBAs find this optimization to be suboptimal for their workload, they can force index cleanups via the INDEX_CLEANUP ON
parameter for manual VACUUM
, or even deactivate that optimization on table level:
1 2 3 |
=# ALTER TABLE test SET (VACUUM_INDEX_CLEANUP = ON); ALTER TABLE |
Though, I highly recommend reviewing this setting carefully to determine if it’s really needed. Otherwise VACUUM
does too much unnecessary work, which might have negative impacts on very large tables and keeping up with potential high write traffic there.
Leave a Reply