a multiple-choice test asks for the meaning of pg_wal and gives silly alternative answers
© Laurenz Albe 2023

 

“Why does my pg_wal keep growing?” That’s a question I keep hearing again and again. It is an urgent question, since the PostgreSQL server will crash if pg_wal runs out of disk space. I decided to answer the question once and for all.

What is pg_wal and why is it growing?

You can skip ahead if you are already familiar with the basics.

The PostgreSQL database logs all modifications to the database in the transaction redo log. PostgreSQL calls the transaction redo log “write-ahead log” (WAL), because we must log all data modifications before applying them to the data. This serves three purposes:

  • to recover from a crash without losing any data
  • to recover an online file system backup after it was restored
  • to replay the data modifications on a standby server

PostgreSQL writes WAL sequentially to WAL segment files in the pg_wal subdirectory of the data directory. These WAL segments are 16MB in size (but you can choose a different size during initdb). A WAL segment can be in one of three states:

  • one of the WAL segments is active, that is, PostgreSQL currently writes transaction redo information to this file
  • some WAL segments are empty and ready for future use, to avoid stalls when the active WAL segment is full
  • some WAL segments are completed and are no longer used in normal operation

Clearly, it is the completed WAL segments that concern us here. PostgreSQL will delete them automatically as soon as it no longer needs them. We have to understand this process better, so that we can address our original problem. One important warning at this point:

Never manually delete any files in pg_wal.

PostgreSQL needs these files to recover from a crash. If you are out of disk space, you’ll have to increase the disk space.

When will PostgreSQL delete a WAL segment from pg_wal?

PostgreSQL deletes completed WAL segments that it does not need any more at the end of a checkpoint. There are three criteria that PostgreSQL checks before it can delete a WAL segment:

The WAL segment in pg_wal has been archived successfully

This only applies if you have configured archive_mode to on. Then the archiver executes archive_command (or invokes the archive_library) to archive the WAL segment. If the archiver fails, it keeps retrying to archive the same WAL segment over and over, until it succeeds. This is necessary, because PostgreSQL cannot recover past a missing WAL segment. To see if the archiver is stuck, use the following query:

SELECT last_failed_wal, last_failed_time
FROM pg_stat_archiver
WHERE last_failed_time > coalesce(last_archived_time, '-infinity');

If that query returns a result, look at the PostgreSQL log file, which will show all error messages from the archiver. Fix the problem, and PostgreSQL will start removing old WAL segments after the next checkpoint.

Sometimes the problem is not that the archiver is stuck, but that it is too slow: it archives WAL segments slower than PostgreSQL generates them. One cause for that can be that your archive_command uses a slow compression method. In that case, you have to speed up archive_command to fix the problem.

The WAL segment in pg_wal contains no data more recent than the position of any replication slot

Standby servers need WAL information from the primary. If the data are no longer in WAL buffers, the WAL sender process will read them from the WAL segment files. If the WAL information needed by the standby is no longer there, replication is broken. A replication slot is a data structure that marks a position in the WAL. Whenever the standby server has consumed WAL, it sends feedback to the primary, which advances the replication slot associated with the replication. The primary never deletes WAL that is more recent than any replication slot.

You can find out how many bytes the oldest replication slot is behind:

SELECT slot_name,
       pg_wal_lsn_diff(
          pg_current_wal_lsn(),
          restart_lsn
       ) AS bytes_behind,
       active,
       wal_status
FROM pg_replication_slots
WHERE wal_status <> 'lost'
ORDER BY restart_lsn;

If you find a replication slot that is way behind, you have found the problem. Typically, such replication slots are no longer active, because the standby server no longer connects to the primary. Use the function pg_drop_replication_slot() to drop the problematic replication slots, and PostgreSQL will automatically delete old WAL segments after the next checkpoint.

To prevent replication slots that fell behind or have been abandoned from breaking your database, you can set the parameter max_slot_wal_keep_size. PostgreSQL won’t retain more WAL than that for a replication slot. If a replication slot exceeds the limit, it will eventually show up with wal_status = 'lost' in pg_replication_slots.

wal_keep_size is too high

The parameter wal_keep_size (wal_keep_segments in older PostgreSQL versions) determines how much old WAL PostgreSQL will keep around, no matter what. It can be an alternative to using replication slots, and it is useful to make pg_rewind succeed. So check to see if you’ve set that parameter to an excessive value:

SHOW wal_keep_size;

Reduce the value if that is the problem, and PostgreSQL will delete old WAL segments after the next checkpoint.

Conclusion

If your pg_wal is overflowing, check for archiver problems, replication slots that have fallen behind and excessive wal_keep_size. That should take care of the problem. If you don’t want to wait for the next checkpoint to delete files from pg_wal, you can run the CHECKPOINT command manually.

If your problem is not pg_wal, but you are experiencing table bloat, you might want to read my article about the four reasons why VACUUM cannot remove dead rows.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.