I want an exclusive backup
© Laurenz Albe 2019

 

The “exclusive backup” method of calling pg_start_backup('label') before a backup and pg_stop_backup() afterwards is deprecated and scheduled for removal in a future release.

This article describes the problems with the old method and discusses the options for those who still use this backup method.

The “exclusive” backup method

Before pg_basebackup was invented, there was only one online file-system level backup method:

  • call “SELECT pg_start_backup('label')”, where 'label' is an arbitrary string
  • back up all the files in the PostgreSQL data directory with an arbitrary backup method
  • call “SELECT pg_stop_backup()”
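The three steps above can be sketched in Python. This is a minimal illustration, not a backup tool: it assumes a psycopg2-style DB-API cursor (the %s parameter style), and copy_data_directory is a hypothetical hook standing in for whatever copy method you use.

```python
from typing import Any, Callable


def exclusive_backup(cursor: Any,
                     copy_data_directory: Callable[[], None],
                     label: str = "nightly") -> None:
    """Deprecated exclusive low-level backup (sketch).

    `cursor` is assumed to be a psycopg2-style DB-API cursor;
    `copy_data_directory` is a hypothetical hook for your copy tool.
    """
    # Step 1: enter backup mode; PostgreSQL writes backup_label
    # into the data directory.
    cursor.execute("SELECT pg_start_backup(%s)", (label,))
    try:
        # Step 2: copy all files in the data directory.
        copy_data_directory()
    finally:
        # Step 3: leave backup mode; PostgreSQL removes backup_label again.
        # This runs if the copy raises, but not if the server or OS crashes.
        cursor.execute("SELECT pg_stop_backup()")
```

The finally clause ends backup mode if the copy itself fails, but no code runs at all when PostgreSQL or the operating system crashes, which is the weakness discussed in the next section.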

This method is called exclusive because only one such backup can be running at any given time.

pg_start_backup creates a file backup_label in the data directory that contains the location of the checkpoint starting the backup. This ensures that, during startup, PostgreSQL does not recover from the latest checkpoint registered in pg_control. Doing so would cause data corruption, since the backup may contain data files that were copied before that checkpoint. Don’t forget that database activity, including checkpointing, continues normally in backup mode!
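To make the role of backup_label concrete, here is a small sketch that reads the checkpoint location from it and picks the checkpoint recovery would start from. The sample file contents and LSNs are made-up values, the file is abridged to the lines the code needs, and the function names are hypothetical. The selection logic is a simplification of what the startup process does.

```python
from typing import Optional


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'high/low' in hex (e.g. '0/9000060') to an int."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def backup_label_checkpoint(contents: str) -> int:
    """Return the CHECKPOINT LOCATION recorded in a backup_label file."""
    for line in contents.splitlines():
        key, _, value = line.partition(": ")
        if key == "CHECKPOINT LOCATION":
            return parse_lsn(value.strip())
    raise ValueError("backup_label has no CHECKPOINT LOCATION line")


def recovery_checkpoint(backup_label: Optional[str], pg_control_checkpoint: str) -> int:
    """Pick the checkpoint recovery starts from (simplified).

    If backup_label is present, its checkpoint wins over the (possibly
    later) checkpoint recorded in pg_control.
    """
    if backup_label is not None:
        return backup_label_checkpoint(backup_label)
    return parse_lsn(pg_control_checkpoint)


# Made-up, abridged backup_label contents.
SAMPLE_LABEL = (
    "START WAL LOCATION: 0/9000028 (file 000000010000000000000009)\n"
    "CHECKPOINT LOCATION: 0/9000060\n"
    "LABEL: nightly\n"
)

print(hex(recovery_checkpoint(SAMPLE_LABEL, "0/A000098")))  # prints 0x9000060
```

Passing None instead of SAMPLE_LABEL yields the later pg_control checkpoint 0/A000098, so replay would skip the WAL between the two checkpoints that the older data files in the backup still need.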

The problem with the exclusive backup method

This backup method can cause trouble if PostgreSQL or the operating system crashes while in backup mode.
When PostgreSQL starts up after such a crash, it finds the backup_label file and deduces that it is recovering from a backup. There is no way to distinguish the data directory of a server that crashed while in backup mode from a backup!

Consequently, PostgreSQL will try to recover from the checkpoint in backup_label. Lacking a recovery.conf file with a restore_command, it will resort to the transaction log (WAL) files in pg_wal (pg_xlog on older versions).

But the database might have been in backup mode for a long time before the crash. If there was enough data modification activity in that time, the WAL segment containing the starting checkpoint may already have been archived and removed.

The startup process will then fail with this error message:

ERROR:  could not find redo location referenced by checkpoint record
HINT:  If you are not restoring from a backup, try removing the file "backup_label".

You have to manually remove the backup_label file left behind by the failed backup before you can restart PostgreSQL.
Today, in the age of automated provisioning, manual intervention is even less tolerated than it used to be, so this behavior is unacceptable in many cases.
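The failure can be traced by hand: the START WAL LOCATION in backup_label (the redo location of the backup's starting checkpoint) lies in exactly one WAL segment file, and if that file is no longer in pg_wal, recovery cannot begin. The sketch below computes that file name. It assumes the default segment size of 16MB and uses timeline 1 in the example; the function names are hypothetical.

```python
from typing import Iterable

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default segment size: 16MB


def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/9000028' to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(timeline: int, lsn: int, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file containing `lsn` (timeline, log id, segment)."""
    segno = lsn // seg_size
    segs_per_xlogid = 0x100000000 // seg_size
    return f"{timeline:08X}{segno // segs_per_xlogid:08X}{segno % segs_per_xlogid:08X}"


def redo_segment_present(pg_wal_files: Iterable[str], timeline: int, start_lsn: str) -> bool:
    """True if the segment holding `start_lsn` is still among `pg_wal_files`.

    `pg_wal_files` would typically be os.listdir() of the pg_wal directory.
    """
    return wal_file_name(timeline, parse_lsn(start_lsn)) in set(pg_wal_files)


# START WAL LOCATION 0/9000028 lives in segment 000000010000000000000009;
# if only later segments are left, startup fails with the error shown above.
print(redo_segment_present(["00000001000000000000000A"], 1, "0/9000028"))  # prints False
```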

Overcoming the problem with pg_basebackup

PostgreSQL 9.1 introduced pg_basebackup, which provides a much simpler way to create an online file-system backup.

It introduced the “non-exclusive” backup method, meaning that several such backups can be performed at the same time. backup_label is not written to the data directory but added only to the backup. Consequently, pg_basebackup is not vulnerable to the problem described above.
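For comparison, a typical pg_basebackup run can be scripted as below. This is a sketch assuming PostgreSQL 10 or later and connection settings taken from the usual libpq environment variables (PGHOST, PGUSER and so on); the helper names are hypothetical, while -D, -l, -X stream, -c fast and -P are regular pg_basebackup options.

```python
import subprocess
from typing import List


def pg_basebackup_command(target_dir: str, label: str = "nightly") -> List[str]:
    """Build the argument list for a plain pg_basebackup run (hypothetical helper)."""
    return [
        "pg_basebackup",
        "-D", target_dir,   # write the backup into this (empty or new) directory
        "-l", label,        # label recorded in the backup's backup_label
        "-X", "stream",     # stream the WAL needed for a consistent backup
        "-c", "fast",       # request an immediate checkpoint
        "-P",               # report progress
    ]


def run_base_backup(target_dir: str) -> None:
    """Run pg_basebackup and raise CalledProcessError if it fails."""
    subprocess.run(pg_basebackup_command(target_dir), check=True)
```

The backup_label file ends up inside target_dir only, never in the server's data directory.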

pg_basebackup makes backups simple, but since it copies all data files via a single database connection, it can take too long to back up a large database.
To deal with such databases, you still had to resort to the “low-level backup API” provided by pg_start_backup and pg_stop_backup with all its problems.

The improved “low-level backup API”

Version 9.6 brought the non-exclusive backup to pg_start_backup and pg_stop_backup.
Backups can now be performed like this:

  • call “SELECT pg_start_backup('label', FALSE, FALSE)” to start the backup, and keep the database session open
  • back up all the files in the PostgreSQL data directory with an arbitrary backup method
  • call “SELECT * FROM pg_stop_backup(FALSE)” in the same session where you started the backup to end backup mode
    This returns the contents of the backup_label file (and of a tablespace_map file, if the database uses tablespaces), which you have to add to the backup yourself.
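The non-exclusive steps above map to code like the following. It is a sketch assuming a psycopg2-style DB-API connection (the autocommit attribute and %s parameters), with copy_data_directory a hypothetical hook for your copy method. It reads the labelfile and spcmapfile columns returned by pg_stop_backup(false) and writes them to backup_label and, if not empty, tablespace_map in the backup directory.

```python
import os
from typing import Any, Callable


def non_exclusive_backup(conn: Any,
                         backup_dir: str,
                         copy_data_directory: Callable[[str], None],
                         label: str = "nightly") -> str:
    """Non-exclusive low-level backup (PostgreSQL 9.6 or later), sketched.

    `conn` is assumed to be a psycopg2-style connection; the copy hook
    is hypothetical. Returns the backup_label contents.
    """
    # Keep the session idle rather than idle in a transaction during the copy.
    conn.autocommit = True
    cur = conn.cursor()
    # Arguments: label, fast checkpoint (no), exclusive (no).
    cur.execute("SELECT pg_start_backup(%s, false, false)", (label,))
    # The session must stay open until pg_stop_backup runs in it.
    copy_data_directory(backup_dir)
    cur.execute("SELECT labelfile, spcmapfile FROM pg_stop_backup(false)")
    labelfile, spcmapfile = cur.fetchone()
    # Add backup_label (and tablespace_map, if any) to the backup ourselves;
    # newline="" writes the text exactly as returned.
    with open(os.path.join(backup_dir, "backup_label"), "w", newline="") as f:
        f.write(labelfile)
    if spcmapfile:
        with open(os.path.join(backup_dir, "tablespace_map"), "w", newline="") as f:
            f.write(spcmapfile)
    return labelfile
```

Because backup mode is tied to the session, a script that dies halfway does not leave the server in backup mode: when the connection goes away, PostgreSQL aborts the non-exclusive backup.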

Deprecation of the exclusive backup method

Since version 9.6, the documentation contains the following sentence:

The non-exclusive method is recommended and the exclusive one is deprecated and will eventually be removed.

If you are still using the exclusive backup method, DON’T PANIC.

PostgreSQL releases are supported for 5 years after their release date, and that is also the customary time for a feature to be deprecated before it is removed. So you probably have until 2021 to adjust your backup scripts if you are using the exclusive backup method.

Using a pre-backup and post-backup script

Your backup may be driven by company-wide backup software, or maybe you use snapshots on the storage subsystem to back up a large database.

In both cases, it is not unusual that the backup software offers to run a “pre-backup” and a “post-backup” command on the target machine. The pre-backup script prepares the machine for being backed up, and the post-backup script resumes normal operation.

In such a situation it is difficult to switch from exclusive backup to non-exclusive backup: You cannot easily keep the database session where you ran pg_start_backup open, because the backup will only start once the pre-backup script has ended. But you need to keep that session open, so that you can run pg_stop_backup in the same session to complete the backup!

People with such a backup scenario will probably find it hardest to move away from the exclusive backup method.

Pre-backup and post-backup scripts for non-exclusive backups

To overcome this problem, I have written pre- and post-backup scripts that use non-exclusive backups. They are available here.

They work by creating a table in the database postgres and a “co-process” that stays around after the pre-backup script is done. The post-backup script tells the co-process to complete the backup and write the contents of the backup_label file to the database table. You can get that information either from the standard output of the post-backup script or from the database table.
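The co-process idea can be illustrated with a small Python sketch. This is not the code of the scripts mentioned above; it only models the mechanism. The worker is a thread rather than a separate process, connect is assumed to return a psycopg2-style connection, and the table name backup_result and its column are hypothetical.

```python
import queue
import threading
from typing import Any, Callable


class BackupCoProcess:
    """Hypothetical sketch: one worker owns the database session for the whole backup.

    Modeled as a thread for brevity; a real setup needs a separate process
    that outlives the pre-backup script.
    """

    def __init__(self, connect: Callable[[], Any], label: str = "snapshot") -> None:
        self._connect = connect
        self._label = label
        self._in_backup = threading.Event()
        self._finish = threading.Event()
        self._result = queue.Queue(maxsize=1)
        self._worker = threading.Thread(target=self._run, daemon=True)

    def _run(self) -> None:
        conn = self._connect()
        conn.autocommit = True
        cur = conn.cursor()
        cur.execute("SELECT pg_start_backup(%s, false, false)", (self._label,))
        self._in_backup.set()   # the pre-backup step may return now
        self._finish.wait()     # session stays open while the backup runs
        cur.execute("SELECT labelfile FROM pg_stop_backup(false)")
        labelfile = cur.fetchone()[0]
        # Store the label in a (hypothetical) table, as the scripts do.
        cur.execute("INSERT INTO backup_result (labelfile) VALUES (%s)", (labelfile,))
        self._result.put(labelfile)

    def pre_backup(self, timeout: float = 60.0) -> None:
        """Start the worker and wait until backup mode is active."""
        self._worker.start()
        if not self._in_backup.wait(timeout):
            raise TimeoutError("backup mode was not entered in time")

    def post_backup(self, timeout: float = 60.0) -> str:
        """Tell the worker to end backup mode; return the backup_label contents."""
        self._finish.set()
        return self._result.get(timeout=timeout)
```

pre_backup() returns as soon as backup mode is active, so the backup software can start copying; post_backup() lets the same session run pg_stop_backup(false) and hands back the backup_label contents.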

There is one last thing you have to do: you have to store the backup_label file with the checkpoint information along with the backup. The file must be present after the backup has been restored. Remember that if you start PostgreSQL on a restored data directory without the correct backup_label file, the result will be data corruption. This is because the pg_control file in the backup usually contains a later checkpoint than the one taken during pg_start_backup.
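A restore script can guard against forgetting that step. The following sketch, with a hypothetical function name, writes the saved backup_label contents into a restored data directory before PostgreSQL is started. The PG_VERSION check and the refusal to overwrite a different existing backup_label are illustrative safety checks, not PostgreSQL requirements.

```python
import os


def install_backup_label(data_dir: str, label_text: str) -> None:
    """Write saved backup_label contents into a restored data directory.

    Run this before starting PostgreSQL on the restored directory.
    """
    if not os.path.isfile(os.path.join(data_dir, "PG_VERSION")):
        raise FileNotFoundError(f"{data_dir} does not look like a PostgreSQL data directory")
    if "CHECKPOINT LOCATION:" not in label_text:
        raise ValueError("label_text does not look like backup_label contents")
    path = os.path.join(data_dir, "backup_label")
    if os.path.exists(path):
        # Re-running the restore step is fine; a different file is suspicious.
        with open(path, newline="") as f:
            if f.read() != label_text:
                raise RuntimeError(f"{path} exists with different contents")
        return
    # newline="" writes the text exactly as saved, without newline translation.
    with open(path, "w", newline="") as f:
        f.write(label_text)
```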