CYBERTEC PostgreSQL Logo

"Exclusive backup" method is deprecated - what now?

02.2019 / Category: / Tags: | |
exclusive backup deprecated - updated for Postgresql v15
© Laurenz Albe 2019

 

UPDATE 12.07.2022: The “exclusive backup” method of calling pg_start_backup('label') before backup and pg_stop_backup() afterwards has been removed in PostgreSQL v15.

This article describes the problems with the old method and discusses the options for those who still use this backup method. I include scripts below to help you.

The “exclusive” backup method

Before pg_basebackup was invented, there was only one online file-system level backup method:

  • call “SELECT pg_start_backup('label')”, where 'label' is an arbitrary string
  • backup all the files in the PostgreSQL data directory with an arbitrary backup method
  • call “SELECT pg_stop_backup()

This method is called exclusive because only one such backup can be performed simultaneously.

pg_start_backup creates a file backup_label in the data directory that contains the location of the checkpoint starting the backup. This makes sure that during startup, PostgreSQL does not recover from the latest checkpoint registered in pg_control. Doing so would cause data corruption, since the backup may contain data files from before that checkpoint. Don't forget that database activity, including checkpointing, continues normally in backup mode!

The problem with the exclusive backup method

This backup method can cause trouble if PostgreSQL or the operating system crash during backup mode.
When PostgreSQL starts up after such a crash, it will find the backup_label file and deduce that it is recovering a backup. There is no way to distinguish the data directory of a server crashed while in backup mode from a backup!

Consequently, PostgreSQL will try to recover from the checkpoint in backup_label. Lacking a recovery.conf file with a restore_command, it will resort to the transaction log (=WAL) files in pg_wal (pg_xlog on older versions).

But the database might have been in backup mode for a longer time before the crash. If there has been enough data modification activity in that time, the WAL segment with the starting checkpoint may already have been archived and removed.

The startup process will then fail with this error message:

You have to manually remove the backup_label file left behind from the failed backup to be able to restart PostgreSQL.
Today, in the age of automated provisioning, requiring manual intervention is even less tolerated than it used to be. So this behavior is not acceptable in many cases.

Overcoming the problem with pg_basebackup

In PostgreSQL 9.1, pg_basebackup was introduced, which provides a much simpler method to create an online file-system backup.

It introduced the “non-exclusive” backup method, meaning that several such backups can be performed at the same time. backup_label is not written to the data directory but added only to the backup. Consequently, pg_basebackup is not vulnerable to the problem described above.

pg_basebackup makes backups simple, but since it copies all data files via a single database connection, it can take too long to back up a large database.
To deal with such databases, you still had to resort to the “low-level backup API” provided by pg_start_backup and pg_stop_backup with all its problems.

The improved “low-level backup API”

Version 9.6 brought the non-exclusive backup to pg_start_backup and pg_stop_backup.
Backups can now be performed like this:

  • call “SELECT pg_start_backup('label', FALSE, FALSE)” (or “SELECT pg_backup_start('label', FALSE)” from v15 on) to start the backup and keep the database session open
  • backup all the files in the PostgreSQL data directory with an arbitrary backup method
  • call “SELECT * FROM pg_stop_backup(FALSE)” ( or “SELECT * FROM pg_backup_stop()” FROM v15 on) in in the same session where you started the backup to end backup mode
    This will return the contents of the backup_label file, which you have to add to the backup yourself.

Removal of the exclusive backup method

Since version 9.6, the documentation contained the following sentence:

The non-exclusive method is recommended and the exclusive one is deprecated and will eventually be removed.

If you are still using the exclusive backup method, DON'T PANIC.

PostgreSQL releases are supported for 5 years after their release date. So you have until 2026 (the end-of-life date for PostgreSQL v14) to adjust your backup scripts if you are using the exclusive backup method.

Using a pre-backup and post-backup script

Your backup may be driven by a company-wide backup software, or maybe you use snapshots on the storage subsystem to back up a large database.

In both cases, it is not unusual that the backup software offers to run a “pre-backup” and a “post-backup” command on the target machine. The pre-backup script prepares the machine for being backed up, and the post-backup script resumes normal operation.

In such a situation it is difficult to switch from exclusive backup to non-exclusive backup: You cannot easily keep the database session where you ran pg_start_backup open, because the backup will only start once the pre-backup script has ended. But you need to keep that session open, so that you can run pg_stop_backup in the same session to complete the backup!

People with such a backup scenario will probably find it hardest to move away from the exclusive backup method.

Pre-backup and post-backup scripts for non-exclusive backups

To overcome this problem, I have written pre- and post-backup scripts that use non-exclusive backups. They are available here.

They work by creating a table in the database postgres and a “co-process” that stays around when the pre-backup script is done. The post-backup script notifies the co-process to complete the backup and write the contents of the backup_label file to the database table. You can get that information either from the standard output of the post-backup script or from the database table.

There is one last thing you have to do: you have to store the backup_label file with the checkpoint information along with the backup. The file must be present after the backup has been restored. Remember that if you start PostgreSQL on a restored data directory without the correct backup_label file, the result will be data corruption. This is because the pg_control file in the backup usually contains a later checkpoint than the one taken during pg_start_backup.

9 responses to “"Exclusive backup" method is deprecated - what now?”

  1. Hello

    Nice article, and nice scripts, they can save the day.
    I recently started dealing with PostgreSQL (v11.0) and I jumped into the issue.

    I was thinking that if the company-wide backup software backup command is available , it could be executed within a single psql script with the shell execution !
    This way the session would stay open until the backup is finished and continue with the script commands
    Does this make sense?

    Regarding pg_basebackup, unfortunately I cannot find a way to use it since
    - We use additional tablespaces
    - We don't have enough space to keep the backup in the diks (even with zip) so we need to streem it to the tape
    Is that someway possible with pg_pagebackup?

    I was wondering if these are resolved with any of the backup restore tools like barman, I will start looking at them as well

    Any response will be appreciated

    Thanks

    Kostas Yiannakos

    • Using a psql script with ! to run some backup software is a fine option.

      Have you tried the tar format of pg_basebackup? That doesn't require you to have directories in the same place as the tablespaces in the cluster being backed up, and it allowed backup compression.

      I know too little about the available third-party tools to answer your last question.

  2. Hello,

    Excellent article!

    We have this situation:

    We use a backup tool that makes backups to tape.

    We have 2 types of backup:

    1 - full cluster backup (pg_start_backup () and pg_stop_backup () exclusive)
    2 - backup of the wals archived by "archive_command" (exclusive pg_start_backup () and pg_stop_backup (), to obtain the start and end point, and to be able to eliminate the wals backed up with pg_archivecleanup)

    We have a 30 TB database, the backup takes 2 days to run.

    The problem we have is that we cannot run backup 2 while backup 1 is running. How can we do this? There are times when the partition for the wals is not enough to keep 2 days of archived wals.

    Thanks,

    David.

    • First, get enough disk storage; running out of space for WAL archives is not an option. You want to keep more WAL archives than is necessary to restore the latest backup. What if the latest backup has failed, or you need older data?

      Second, make sure that WAL archives are stored on different storage than your database, not just a different partition. Otherwise, you will be left without a backup if the storage breaks.

      Third, move to a storage that allows you to do faster backups with snapshots or so.

      Finally, as the article recommends, start using the non-exclusive backup method.

  3. Thank you for this project and the corresponding blog post.

    The necessity of this code is further evidence that Postgresql dev team has lost their way. They "solved" the problem of the most obscure edge-case imaginable, and solved it by forcing us to use very complicated or fragile solutions. The same thing with the "recovery.signal" file.

    I am migrating my 9.x servers to 12, and began adapting my backup scripts. The jaw-droping realization that you now need to keep the connection to the database was met with a vocal "what the fu-" that woke my neighbor's dog. "How the hell do they expect us to script this, with a co-process?" I consider myself a bash expert, but co-processes are quite obscure because they are difficult to debug and their use-cases very rare. They are usually an extreme workaround for when nothing else is suitable. This is such a case. We have a goddamn service already in postgresql. Why do we need a second just to connect to it? It's non-sense.

    Thus your scripts are very much welcome.

    Most of my criticism of your work is not toward it per se, but the fact that it's a solution that should not have been necessary. The postgresql non-exclusive mode should not need a continuous connection, and it should report on the pg_wal files that it actually wrote, and not leave us to create the backup label file -- which by their own admission is very dangerous not to include in a restore state, and list the WALs that also need to be backed up instead of a cryptic range. By trying to solve an edge case, they architected a monstrosity that will fail most of the time unless utter care is involved. Care It would be better if it were one sourceable script that provides two different functions.

    My other criticism of your work -- which is specific to it -- is that it's (1) overly complicated, which is probably because (2) it uses the database itself to store state,; and most critically (3) doesn't create a labelfile. Yes, you say that's an exercise left up to the user, but why?

    On the latter point, your scripts don't actually support multiple invocations, so why not write the label in the pg_wal directory, with the timestamp appended to the filename. These must be backed up separately anyway.

    I think points (1) and (2) are tied together because you assume that these scripts must be run separately and independently of each other. But Why?

    Let's assume you have some backup software such as TSM which supports node-specific "pre" and "post" backup commands. Here's the problem: the "post" command only runs after the backup is complete. But now you must back up a label and WALs. How do you do that in this scenario? All scenarios that come to mind involved manually backing up files within a script. Soooo... what's the point of relying on these trigger scripts? I just can't see one.

    If you had everything in one script, using functions, it would be much simpler: you wouldn't need the database, you could rely on only one co-process. Also, if there's a failure with the main script, and post never runs, you don't end up with a hung process in the background. (The timeout is good, but a "trap cleanup EXIT" is better.)

  4. Thank you Laurenz, it is amaing and exactly what I have looked for many hours!

    I want that the name of backup_label output file includes LABEL date and time. I'm very-very new in the databases and SQL, so how to select LABEL only value ?

    Anton

      • Thank you Laurenz

        I'm going to backup Postgresql data dir based on storage snapshot functionality. The Postgresql database size is ~500TB.
        So your pre/post scripts are very useful.
        But in this case, how to manage WAL archive files, removing unnecessary old WAL files ?

        Anton

        • Base backups create a special .backup file in the WAL archives. You can use that to run

          pg_archivecleanup /path/to/archive 000000010000003700000010.00000020.backup

          That will remove all WAL files that are older than what you need to restore that base backup.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram