Unlock the tremendous energy of the vacuum!
© xkcd.xom (Randall Munroe) under the Creative Commons Attribution-NonCommercial 2.5 License

Whenever rows in a PostgreSQL table are updated or deleted, dead rows are left behind. VACUUM gets rid of them so that the space can be reused. If a table doesn’t get vacuumed, it will get bloated, which wastes disk space and slows down sequential table scans (and – to a smaller extent – index scans).

VACUUM also takes care of freezing table rows so to avoid problems when the transaction ID counter wraps around, but that’s a different story.

Normally you don’t have to take care of all that, because the autovacuum daemon built into PostgreSQL does that for you.

The problem

If your tables get bloated, the first thing you check is whether autovacuum has processed them or not:

SELECT schemaname, relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_dead_tup
    / (n_live_tup
       * current_setting('autovacuum_vacuum_scale_factor')::float8
          + current_setting('autovacuum_vacuum_threshold')::float8)

If your bloated table does not show up here, n_dead_tup is zero and last_autovacuum is NULL, you might have a problem with the statistics collector.

If the bloated table is right there on top, but last_autovacuum is NULL, you might need to configure autovacuum to be more aggressive so that it gets done with the table.

But sometimes the result will look like this:

 schemaname |    relname   | n_live_tup | n_dead_tup |   last_autovacuum
 laurenz    | vacme        |      50000 |      50000 | 2018-02-22 13:20:16
 pg_catalog | pg_attribute |         42 |        165 |
 pg_catalog | pg_amop      |        871 |        162 |
 pg_catalog | pg_class     |          9 |         31 |
 pg_catalog | pg_type      |         17 |         27 |
 pg_catalog | pg_index     |          5 |         15 |
 pg_catalog | pg_depend    |       9162 |        471 |
 pg_catalog | pg_trigger   |          0 |         12 |
 pg_catalog | pg_proc      |        183 |         16 |
 pg_catalog | pg_shdepend  |          7 |          6 |
(10 rows)

Here autovacuum has recently run, but it didn’t free the dead tuples!

We can verify the problem by running VACUUM (VERBOSE):

test=> VACUUM (VERBOSE) vacme;
INFO:  vacuuming "laurenz.vacme"
INFO:  "vacme": found 0 removable, 100000 nonremovable row versions in
       443 out of 443 pages
DETAIL:  50000 dead row versions cannot be removed yet,
         oldest xmin: 22300
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.

Why won’t VACUUM remove the dead rows?

VACUUM can only remove those row versions (also known as “tuples”) that are not needed any more. A tuple is not needed if the transaction ID of the deleting transaction (as stored in the xmax system column) is older than the oldest transaction still active in the PostgreSQL database (or the whole cluster for shared tables).

This value (22300 in the VACUUM output above) is called the “xmin horizon”.

There are three things that can hold back this xmin horizon in a PostgreSQL cluster:

  1. Long-running transactions:

    You can find those and their xmin value with the following query:

    SELECT pid, datname, usename, state, backend_xmin
    FROM pg_stat_activity
    WHERE backend_xmin IS NOT NULL
    ORDER BY age(backend_xmin) DESC;

    You can use the pg_terminate_backend() function to terminate the database session that is blocking your VACUUM.

  2. Abandoned replication slots:

    A replication slot is a data structure that keeps the PostgreSQL server from discarding information that is still needed by a standby server to catch up with the primary.

    If replication is delayed or the standby server is down, the replication slot will prevent VACUUM from deleting old rows.

    You can find all replication slots and their xmin value with this query:

    SELECT slot_name, slot_type, database, xmin
    FROM pg_replication_slots
    ORDER BY age(xmin) DESC;

    Use the pg_drop_replication_slot() function to drop replication slots that are no longer needed.

    Note: This can only happen with physical replication if hot_standby_feedback = on. For logical replication there is a similar hazard, but only system catalogs are affected. Examine the column catalog_xmin in that case.

  3. Orphaned prepared transactions:

    During two-phase commit, a distributed transaction is first prepared with the PREPARE statement and then committed with the COMMIT PREPARED statement.

    Once a transaction has been prepared, it is kept “hanging around” until it is committed or aborted. It even has to survive a server restart! Normally, transactions don’t remain in the prepared state for long, but sometimes things go wrong and a prepared transaction has to be removed manually by an administrator.

    You can find all prepared transactions and their xmin value with the following query:

    SELECT gid, prepared, owner, database, transaction AS xmin
    FROM pg_prepared_xacts
    ORDER BY age(transaction) DESC;

    Use the ROLLBACK PREPARED SQL statement to remove prepared transactions.

I recently had an interesting support case that shows how the cause of a problem can sometimes be where you would least suspect it.

About table bloat

After an UPDATE or DELETE, PostgreSQL keeps old versions of a table row around. This way, concurrent sessions that want to read the row don’t have to wait. But eventually this “garbage” will have to be cleaned up. That is the task of the autovacuum daemon.

Usually you don’t have to worry about that, but sometimes something goes wrong. Then old row versions don’t get deleted, and the table keeps growing. Apart from the wasted storage space, this will also slow down sequential scans and – to some extent – index scans.

To get rid of the bloat, you can use VACUUM (FULL) and other tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.

The problem

I got called by a customer who experienced table bloat in the pg_attribute system catalog table that contains the table column metadata.

This can happen if table columns get modified or dropped frequently. Most of the time, these are temporary tables which are automatically dropped when the session or transaction ends.

The customer made heavy use of temporary tables. But they have several Linux machines with databases that experience the same workload, and only some of those had the problem.

Searching the cause

I went through the list of common causes for table bloat:

  • Database transactions that remain open (state “idle in transaction”).
    These will keep autovacuum from cleaning up row versions that have become obsolete after the start of the transaction.
  • A rate of data modification that is so high that autovacuum cannot keep up.
    In this case, the correct answer is to make autovacuum more aggressive.

Both were not the case; the second option could be ruled out because that would cause bloat on all machines and not only on some.

Then I had a look at the usage statistics for the affected table:

dbname=> SELECT * FROM pg_stat_sys_tables
dbname->          WHERE relname = 'pg_attribute';

-[ RECORD 1 ]-------+-------------
relid               | 1249
schemaname          | pg_catalog
relname             | pg_attribute
seq_scan            | 167081
seq_tup_read        | 484738
idx_scan            | 1506941
idx_tup_fetch       | 4163837
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

This confirms that autovacuum has never run. But more interesting is that we get a hint why it didn’t run:

PostgreSQL thinks that the number of dead tuples (row versions that could be removed) is 0, so it didn’t even try to remove them.

The statistics collector

A suspicion became certainty after I found the following message in the server logs:

using stale statistics instead of current ones because stats collector is
not responding

The statistics collector process is the PostgreSQL backend process that collects usage statistics.

After each activity, PostgreSQL backends send statistics about their activity. These statistics updates are sent through a UDP socket on localhost; that is created at PostgreSQL startup time. The statistics collector reads from the socket and aggregates the collected statistics.

Closing in on the problem

The statistics collector was running:

918     1 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
947   918 postgres: logger process   
964   918 postgres: checkpointer process   
965   918 postgres: writer process   
966   918 postgres: wal writer process   
967   918 postgres: autovacuum launcher process   
968   918 postgres: stats collector process   
969   918 postgres: bgworker: logical replication launcher

To see what the statistics collector was doing and to spot any problems it had, I traced its execution:

# strace -p 968
strace: Process 968 attached

The statistics collector was waiting for messages on the UDP socket, but no messages were coming through!

I had a look at the UPD socket:

# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State       PID/Program name    
udp6       0      0 ::1:59517       ::1:59517       ESTABLISHED 918/postmaster

Nothing suspicious so far.

But when I tried the same thing on a machine that didn’t have the problem, I got something different:

# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State       PID/Program name    
udp        0      0 ESTABLISHED 9303/postmaster

It turned out that on all systems that experienced table bloat, the statistics collector socket was created on the IPv6 address for localhost, while all working systems were using the IPv4 address!

But all machines had IPv6 disabled for the loopback interface:

# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet  netmask
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6897  bytes 2372420 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6897  bytes 2372420 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Nailing the bug to the wall

PostgreSQL uses the POSIX function getaddrinfo(3) to resolve localhost.

Since PostgreSQL knows quite well that it is important to have a working statistics collection, it will loop through all the addresses returned by that call, create a UDP socket and test it until it has a socket that works.

So we know that IPv6 must have worked when PostgreSQL was started!

Further analysis revealed that IPv6 was disabled during the boot sequence, but there was a race condition:
Sometimes IPv6 would be disabled before PostgreSQL started, sometimes afterwards. And these latter machines were the ones where the statistics collector stopped working and tables got bloated!

After changing the boot sequence to always disable IPv6 before starting PostgreSQL, the problem was fixed.


This shows (again) how the cause of a computer problem can be in an entirely different place than you would suspect at first glance.

It also shows why a good database administrator needs to know the operating system well.