“Why does my pg_wal keep growing?” That’s a question I keep hearing again and again. It is an urgent question, since the PostgreSQL server will crash if
pg_wal runs out of disk space. I decided to answer the question once and for all.
What is pg_wal, and why is it growing?
You can skip ahead if you are already familiar with the basics.
The PostgreSQL database logs all modifications to the database in the transaction redo log. PostgreSQL calls the transaction redo log “write-ahead log” (WAL), because we must log all data modifications before applying them to the data. This serves three purposes:
- to recover from a crash without losing any data
- to recover an online file system backup after it was restored
- to replay the data modifications on a standby server
PostgreSQL writes WAL sequentially to WAL segment files in the
pg_wal subdirectory of the data directory. These WAL segments are 16MB in size (but you can choose a different size during
initdb). A WAL segment can be in one of three states:
- one of the WAL segments is active, that is, PostgreSQL currently writes transaction redo information to this file
- some WAL segments are empty and ready for future use, to avoid stalls when the active WAL segment is full
- some WAL segments are completed and are no longer used in normal operation
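If you want to see how much space the WAL segments currently occupy, one simple way is the pg_ls_waldir() function (a sketch; the function is available since PostgreSQL 10 and by default restricted to superusers and members of pg_monitor):
SELECT count(*) AS wal_files, pg_size_pretty(sum(size)) AS total_size FROM pg_ls_waldir();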
Clearly, it is the completed WAL segments that concern us here. PostgreSQL will delete them automatically as soon as it no longer needs them. We have to understand this process better, so that we can address our original problem. One important warning at this point:
Never manually delete any files in pg_wal!
PostgreSQL needs these files to recover from a crash. If you are out of disk space, you'll have to add more disk space.
When will PostgreSQL delete a WAL segment from pg_wal?
PostgreSQL deletes completed WAL segments that it does not need any more at the end of a checkpoint. There are three criteria that PostgreSQL checks before it can delete a WAL segment:
The WAL segment in
pg_wal has been archived successfully
This only applies if you have configured archive_mode = on. Then the archiver executes
archive_command (or invokes the
archive_library) to archive the WAL segment. If the archiver fails, it keeps retrying to archive the same WAL segment over and over, until it succeeds. This is necessary, because PostgreSQL cannot recover past a missing WAL segment. To see if the archiver is stuck, use the following query:
SELECT last_failed_wal, last_failed_time FROM pg_stat_archiver WHERE last_failed_time > coalesce(last_archived_time, '-infinity');
If that query returns a result, look at the PostgreSQL log file, which will show all error messages from the archiver. Fix the problem, and PostgreSQL will start removing old WAL segments after the next checkpoint.
Sometimes the problem is not that the archiver is stuck, but that it is too slow: it archives WAL segments slower than PostgreSQL generates them. One cause for that can be that your
archive_command uses a slow compression method. In that case, you have to speed up
archive_command to fix the problem.
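To get a rough idea of how far archiving lags behind, you can compare the most recently archived segment with the WAL file that PostgreSQL is currently writing, for example:
SELECT last_archived_wal, pg_walfile_name(pg_current_wal_lsn()) AS current_wal FROM pg_stat_archiver;
If the two file names are many segments apart, the archiver cannot keep up.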
The WAL segment in
pg_wal contains no data more recent than the position of any replication slot
Standby servers need WAL information from the primary. If the data are no longer in the WAL buffers, the WAL sender process reads them from the WAL segment files. If the WAL information needed by the standby is no longer there, replication is broken. A replication slot is a data structure that marks a position in the WAL. Whenever the standby server has consumed WAL, it sends feedback to the primary, which advances the replication slot associated with that standby. The primary never deletes WAL that is more recent than the position of any replication slot.
You can find out how many bytes the oldest replication slot is behind:
SELECT slot_name, pg_wal_lsn_diff( pg_current_wal_lsn(), restart_lsn ) AS bytes_behind, active, wal_status FROM pg_replication_slots WHERE wal_status <> 'lost' ORDER BY restart_lsn;
If you find a replication slot that is way behind, you have found the problem. Typically, such replication slots are no longer
active, because the standby server no longer connects to the primary. Use the function
pg_drop_replication_slot() to drop the problematic replication slots, and PostgreSQL will automatically delete old WAL segments after the next checkpoint.
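For example, to get rid of an abandoned slot (the slot name standby1_slot is just a placeholder, and the slot must not be active):
SELECT pg_drop_replication_slot('standby1_slot');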
To prevent replication slots that fell behind or have been abandoned from breaking your database, you can set the parameter
max_slot_wal_keep_size. PostgreSQL won’t retain more WAL than that for a replication slot. If a replication slot exceeds the limit, it will eventually show up with
wal_status = 'lost' in pg_replication_slots.
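A sketch of how you could set such a limit (the 10GB value is arbitrary; the parameter exists since PostgreSQL 13 and can be changed with a configuration reload):
ALTER SYSTEM SET max_slot_wal_keep_size = '10GB';
SELECT pg_reload_conf();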
wal_keep_size is too high
The parameter wal_keep_size (or wal_keep_segments in older PostgreSQL versions) determines how much old WAL PostgreSQL will keep around, no matter what. It can be an alternative to using replication slots, and it is useful to make
pg_rewind succeed. So check to see if you’ve set that parameter to an excessive value:
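For example (assuming PostgreSQL 13 or later, where the parameter is called wal_keep_size):
SHOW wal_keep_size;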
Reduce the value if that is the problem, and PostgreSQL will delete old WAL segments after the next checkpoint.
If pg_wal is overflowing, check for archiver problems, replication slots that have fallen behind, and excessive
wal_keep_size. That should take care of the problem. If you don’t want to wait for the next checkpoint to delete files from
pg_wal, you can run the
CHECKPOINT command manually.
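For example (as a superuser, or from PostgreSQL 15 on as a member of the pg_checkpoint role):
CHECKPOINT;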
If your problem is not
pg_wal, but you are experiencing table bloat, you might want to read my article about the four reasons why
VACUUM cannot remove dead rows.