
How to corrupt your PostgreSQL database

08.2022

Of course, most people do not want to corrupt their databases, and they will benefit from avoiding the techniques shown in this article. But for some, it might be useful to corrupt a database on purpose, for example to test a tool or procedure that is meant to detect or repair data corruption.

Prerequisites

We need a database with some data in it, and for some of our experiments, we will need to have some ongoing activity. For that, we can use the built-in PostgreSQL benchmark pgbench. We use scale factor 100, so that the largest table contains 10 million rows:
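(A sketch of the initialization; the database name test is an assumption on my part.)

    createdb test
    pgbench -i -s 100 test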

Load will be generated with 5 concurrent client sessions:
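(The run time of an hour is an arbitrary choice.)

    pgbench -c 5 -T 3600 test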

Creating a corrupt database by setting fsync = off

Let's set fsync = off in postgresql.conf and power off the server while it is under load.
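(A sketch for Linux; forcing an instant reboot via the Magic SysRq trigger is my stand-in for cutting power, and the data directory name data is an assumption.)

    # add to postgresql.conf (never do this in production!)
    fsync = off

    # make the setting effective, then simulate a power failure
    # while pgbench is running (requires root and an enabled SysRq;
    # nothing is flushed to disk)
    pg_ctl -D data reload
    echo b > /proc/sysrq-trigger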

After a few attempts, we can detect data corruption with the amcheck extension:
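(With heapallindexed = true, bt_index_check() also verifies that every table row is found in the index.)

    CREATE EXTENSION amcheck;
    SELECT bt_index_check('pgbench_accounts_pkey', true);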

What happened? With fsync = off, data were no longer flushed to disk in the correct order, so data modifications could hit the disk before the corresponding WAL records did. That violates the basic promise of write-ahead logging, and crash recovery, which relies on that promise, produces a corrupted database.

Creating a corrupt database from a backup

While pgbench is running, we create a base backup:
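(The backup label is arbitrary; the session must stay open until pg_backup_stop().)

    SELECT pg_backup_start('corrupt me');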

Note that since I am using PostgreSQL v15, the function to start backup mode is pg_backup_start() rather than pg_start_backup(). This is because the exclusive backup API, which had been deprecated since PostgreSQL 9.6, was finally removed in v15. To find out more, read my updated post on that topic.

Let's figure out the object IDs of the database and of the primary key index of pgbench_accounts:
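(relfilenode is the name of the file on disk; it usually starts out equal to the object ID.)

    SELECT oid FROM pg_database WHERE datname = 'test';

    SELECT relfilenode FROM pg_class
    WHERE relname = 'pgbench_accounts_pkey';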

We create a backup by copying the data directory. Afterwards, we copy the primary key index of pgbench_accounts and the commit log again to make sure that they are more recent than the rest:
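(A sketch; 16385 and 16396 are placeholders for the database OID and the index relfilenode found above.)

    cp -r data backup
    # copy the index file and the commit log a second time,
    # so they are newer than the rest of the backup
    cp data/base/16385/16396 backup/base/16385/
    cp data/pg_xact/* backup/pg_xact/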

The crucial part: do not create backup_label

Now we exit backup mode, but ignore the contents of the backup_label file returned from pg_backup_stop():
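(This must run in the same session as pg_backup_start().)

    SELECT labelfile FROM pg_backup_stop();
    -- normally, you would store this output as backup_label
    -- in the backup directory; we deliberately don't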

Then let's make sure that the last checkpoint in the backup's control file is different from the checkpoint recorded in backup_label, so that recovery will start in the wrong place.
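A sketch of the verification; if the location still matches the CHECKPOINT LOCATION line of the labelfile from pg_backup_stop(), force a new checkpoint and copy the control file again:

    pg_controldata backup | grep 'checkpoint location'
    # only if the locations still match:
    psql -d test -c 'CHECKPOINT'
    cp data/global/pg_control backup/global/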

Great! Let's start the server:
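(This assumes the original server was stopped, or that the backup's postgresql.conf specifies a different port.)

    pg_ctl -D backup start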

Now an index scan on pgbench_accounts fails, because the index contains more recent data than the table:
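(A sketch; disabling sequential scans forces the planner to use the broken index, and the key value is arbitrary.)

    SET enable_seqscan = off;
    SELECT * FROM pgbench_accounts WHERE aid = 42;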

What happened? By omitting the backup_label file from the backup, we recovered from the wrong checkpoint, so the data in the table and its index were no longer consistent. Note that we can get the same effect without pg_backup_start() and pg_backup_stop(); I only used them to emphasize the importance of backup_label.

Creating a corrupt database with pg_resetwal

While the database is under load from pgbench, we crash it with an immediate shutdown, which skips the shutdown checkpoint and is effectively a simulated crash.
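For example (the data directory name is an assumption):

    pg_ctl -D data stop -m immediate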

Then we run pg_resetwal:
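(The -f option forces pg_resetwal to proceed even though the server was not shut down cleanly.)

    pg_resetwal -f data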

Then we start the server and use amcheck like before to check the index for integrity:
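(A sketch, reusing the assumed names from above.)

    pg_ctl -D data start
    psql -d test -c "SELECT bt_index_check('pgbench_accounts_pkey', true)"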

What happened? pg_resetwal is only safe to use on a cluster that was shut down cleanly. The option -f is intended as a last-ditch effort to get a corrupted server started and salvage some data. Only experts should use it.

Creating a corrupt database with pg_upgrade --link

We create a second cluster with initdb:
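(The directory name data2 is an assumption.)

    initdb -D data2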

Then we edit postgresql.conf and choose a different port number. After shutting down the original cluster, we run an “upgrade” in link mode:
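(A sketch; the binary directory is an assumption, and both clusters are the same major version here, which pg_upgrade allows.)

    pg_upgrade -b /usr/pgsql-15/bin -B /usr/pgsql-15/bin \
               -d data -D data2 --link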

pg_upgrade renamed the control file of the old cluster, so that it cannot get started by accident. We'll undo that:
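(pg_upgrade appends .old to the control file's name, so we rename it back.)

    mv data/global/pg_control.old data/global/pg_control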

Now we can start both clusters and run pgbench on both. Soon we will see error messages indicating data corruption.
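A sketch of running both; port 5433 for the second cluster is an assumption from its edited postgresql.conf:

    pg_ctl -D data start
    pg_ctl -D data2 start
    pgbench -T 60 test
    pgbench -p 5433 -T 60 test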

What happened? Because link mode leaves both clusters pointing at the same data files, we managed to run two servers on the same data files at the same time. This leads to data corruption.

Creating a corrupt database by manipulating data files

For that, we figure out the file name that belongs to the table pgbench_accounts:
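(This returns the path relative to the data directory.)

    SELECT pg_relation_filepath('pgbench_accounts');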

Now we stop the server and write some garbage into the first data block:
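(A sketch; the path is a placeholder for the result of pg_relation_filepath() above.)

    pg_ctl -D data stop
    # overwrite the first 8kB block in place
    dd if=/dev/urandom of=data/base/16385/16390 \
       bs=8192 count=1 conv=notrunc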

Then we start the server and try to select from the table:
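(Any query that touches the first block will do.)

    pg_ctl -D data start
    psql -d test -c 'SELECT count(*) FROM pgbench_accounts'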

What happened? We tampered with the data files, so it's unsurprising that the table is corrupted.

Creating a corrupt database with catalog modifications

Who needs ALTER TABLE to drop a table column? We can simply delete the column's row from the pg_attribute catalog.
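A sketch; the choice of the column bid is arbitrary, and you need superuser privileges:

    DELETE FROM pg_attribute
    WHERE attrelid = 'pgbench_accounts'::regclass
      AND attname = 'bid';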

After that, an attempt to query the table will result in an error:
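(Any query on the table now triggers the problem.)

    SELECT * FROM pgbench_accounts LIMIT 1;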

What happened? We ignored that dropping a column sets attisdropped to TRUE in pg_attribute rather than actually removing the entry. Moreover, we didn't check for dependencies in pg_depend, nor did we properly lock the table against concurrent access. Modifying catalog tables is unsupported, and if it breaks the database, you get to keep both pieces.

Conclusion

We have seen a number of ways to corrupt a PostgreSQL database. Some of these were obvious, some might surprise the beginner. If you don't want a corrupted database,

  • don't mess with the system catalogs
  • never modify anything in the data directory (with the exception of configuration files)
  • don't run with fsync = off
  • don't call pg_resetwal -f on a crashed server
  • remove the old cluster after an upgrade with pg_upgrade --link
  • don't delete or omit backup_label
  • run a supported version of PostgreSQL to avoid known software bugs
  • run on reliable hardware

I hope you can save some databases with this information! If you'd like to know more about troubleshooting PostgreSQL performance, read my post on join strategies.
