© Laurenz Albe 2020
In many PostgreSQL databases, you never have to think or worry about tuning autovacuum. It runs automatically in the background and cleans up without getting in your way.
But sometimes the default configuration is not good enough, and you have to tune autovacuum to make it work properly. This article presents some typical problem scenarios and describes what to do in these cases.
The many tasks of autovacuum
There are many autovacuum configuration parameters, which makes tuning complicated. The main reason is that autovacuum has many different tasks. In a way, autovacuum has to fix all the problems arising from PostgreSQL’s Multiversioning Concurrency Control (MVCC) implementation:
- clean up “dead tuples” left behind after
- update the free space map that keeps track of free space in table blocks
- update the visibility map that is required for index-only scans
- “freeze” table rows so that the transaction ID counter can safely wrap around
- schedule regular
ANALYZEruns to keep the table statistics updated
Depending on which of these functionalities cause a problem, you need different approaches to tuning autovacuum.
Tuning autovacuum for dead tuple cleanup
The best-known autovacuum task is cleaning up of dead tuples from
DELETE operations. If autovacuum cannot keep up with cleaning up dead tuples, you should follow these three tuning steps:
Make sure that nothing keeps autovacuum from reclaiming dead tuples
Check the known reasons that keep vacuum from removing dead tuples. Most often, the culprit are long running transactions. Unless you can remove these obstacles, tuning autovacuum will be useless.
If you cannot fight the problem at its root, you can use the configuration parameter
idle_in_transaction_session_timeout to have PostgreSQL terminate sessions that stay “idle in transaction” for too long. That causes errors on the client side, but may be justified if you have no other way to keep your database operational. Similarly, to fight long-running queries, you can use
Tuning autovacuum to run faster
If autovacuum cannot keep up with cleaning up dead tuples, the solution is to make it work faster. This may seem obvious, but many people fall into the trap of thinking that making autovacuum start earlier or run more often will solve the problem.
VACUUM is a resource-intensive operation, so autovacuum by default operates deliberately slowly. The goal is to have it work in the background without being in the way of normal database operation. But if your workload creates lots of dead tuples, you will have to make it more aggressive:
autovacuum_vacuum_cost_limitfrom its default value of 200 (this is the gentle method)
autovacuum_vacuum_cost_delayfrom its default value of 2 (in older versions: 20!) milliseconds (this is the effective method)
autovacuum_vacuum_cost_delay to 0 will make autovacuum as fast as a manual
VACUUM – that is, as fast as possible.
Since not all tables grow dead tuples at the same pace, it is usually best not to change the global setting in
postgresql.conf, but to change the setting individually for busy tables:
ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1);
Partitioning a table can also help with getting the job done faster; see below for more.
Change the workload so that fewer dead tuples are generated
If nothing else works, you have to see that fewer dead tuples are generated. Perhaps several
UPDATEs to a single row could be combined to a single
Often you can significantly reduce the number of dead tuples by using “HOT updates”:
- set the
fillfactorfor the table to a value less than 100, so that
INSERTs leave some free space in each block
- make sure that no column that you modify in the
SELECT or DML statement can clean up dead tuples, and there is less need for
Tuning autovacuum for index-only scans
The expensive part of an index scan is looking up the actual table rows. If all columns you want are in the index, it should not be necessary to visit the table at all. But in PostgreSQL you also have to check if a tuple is visible or not, and that information is only stored in the table.
To work around that, PostgreSQL has a “visibility map” for each table. If a table block is marked as “all visible” in the visibility map, you don’t have to visit the table for the visibility information.
So to get true index-only scans, autovacuum has to process the table and update the visibility map frequently. How you configure autovacuum for that depends on the kind of data modifications the query receives:
Tuning autovacuum for index-only scans on tables that receive
For that, you reduce
autovacuum_vacuum_scale_factor for the table, for example
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.01);
It may be a good idea to also speed up autovacuum as described above.
Tuning autovacuum for index-only scans on tables that receive only
For older PostgreSQL versions, the best you can do is to significantly lower
autovacuum_freeze_max_age. The best value depends on the rate at which you consume transaction IDs. If you consume 100000 transaction IDs per day, and you want the table to be autovacuumed daily, you can set
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 100000);
To measure the rate of transaction ID consumption, use the function
pg_current_xact_id() from v13 on) twice with a longer time interval in between and take the difference.
Tuning autovacuum to avoid transaction wraparound problems
Normally, autovacuum takes care of that and starts a special “anti-warparound” autovacuum worker whenever the oldest transaction ID in a table is older than
autovacuum_freeze_max_age transactions or the oldest multixact is older than
Make sure than anti-wraparound vacuum can freeze tuples in all tables
Again, you have to make sure that there is nothing that blocks autovacuum from freezing old tuples and advancing
pg_database.datminmxid. Such blockers can be:
- very long running database sessions that keep a transaction open or have temporary tables (autovacuum cannot process temporary tables)
- data corruption, which can make all autovacuum workers fail with an error
To prevent data corruption, use good hardware and always run the latest PostgreSQL minor release.
Tuning tables that receive
DELETEs for anti-wraparound vacuum
On tables that receive
DELETEs, all that you have to do is to see that autovacuum is running fast enough to get done in time (see above).
Tuning tables that receive only
INSERTs for anti-wraparound vacuum
From PostgreSQL v13 on, there are no special considerations in this case, because you get regular autovacuum runs on such tables as well.
Before that, insert-only tables were problematic: since there are no dead tuples, normal autovacuum runs are never triggered. Then, as soon as
autovacuum_multixact_freeze_max_age are exceeded, you may suddenly get a massive autovacuum run that freezes a whole large table, takes a long time and causes massive I/O.
To avoid that, reduce
autovacuum_freeze_max_age for such a table:
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 10000000);
With very big tables, it can be advisable to use partitioning. The advantage here is you can have several autovacuum workers working on several partitions in parallel, so that the partitioned table as a whole is done faster than a single autovacuum worker could.
If you have many partitions, you should increase
autovacuum_max_workers, the maximum number of autovacuum workers.
Partitioning can also help with vacuuming tables that receive lots of updates, as long as the updates affect all partitions.
Updating table statistics is a “side job” of autovacuum.
You know that automatic statistics collection does not happen often enough if your query plans get better after a manual
ANALYZE of the table.
In that case, you can lower
autovacuum_analyze_scale_factor so that autoanalyze processes the table more often:
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.02);
An alternative is not to use the scale factor, but set
autovacuum_analyze_threshold, so that table statistics are calculated whenever a fixed number of rows changes. For example, to configure a table to be analyzed whenever more than a million rows change:
ALTER TABLE mytable SET ( autovacuum_analyze_scale_factor = 0, autovacuum_analyze_threshold = 1000000 );
Depending on your specific problem and your PostgreSQL version, there are different tuning knobs to make autovacuum do its job correctly. The many tasks of autovacuum and the many configuration parameters don’t make that any easier.
If the tips in this article are not enough for you, consider getting professional consulting.