PostgreSQL is very good at keeping your data safe so it doesn’t disappear by itself. Unfortunately, the same holds in reverse–if data has been deleted, it stays unrecoverable.
In this article, we’ll explore options for recovering deleted data from PostgreSQL tables.

Option 1: cautious people use transactions

If you never make any mistakes when working with data, there won’t be any need for emergency procedures. But since we are all human, things like this happen:

-- remove an entry from our contact list
db=> DELETE FROM addressbook WHERE name = 'Heinz Schmidt';
DELETE 3
-- PANIC:  WHY WERE THERE 3 ROWS AND NOT JUST ONE

A good habit to acquire is to always use transactions, and only COMMIT them after checking everything is okay.

-- remove an entry from our contact list
db=> BEGIN;
BEGIN
db=*> DELETE FROM addressbook WHERE name = 'Heinz Schmidt';
DELETE 3
-- NOTICE:  huh?
db=*> ROLLBACK;
ROLLBACK

With a transaction, the unexpected row count was much less stressful.

Option 2: careful people have backups

If you have working backups, you can recover from any data handling mistakes by fetching everything from backup again.

-- remove an entry from our contact list
db=> DELETE FROM addressbook WHERE name = 'Heinz Schmidt';
DELETE 3
-- WARNING:  oh no, not again
db=> SELECT now();
             now
-----------------------------
 2024-03-11 16:22:25.1679+01

You could now perform a Point In Time Recovery (PITR) to a timestamp just before the undesirable DELETE command.

Option 3: quick people use pg_dirtyread

Let’s say options 1 and 2 failed, and we really have to recover the data from the running PostgreSQL instance. The good news is that DELETE does not actually delete data, it just marks it as invisible for subsequent transactions. This is done to allow concurrent transactions to still read the data. The actual removal of rows occurs only when VACUUM (or autovacuum) cleans up the table. (For those interested in more details about that mechanism, see the MVCC chapter in the PostgreSQL documentation.)

There is no built-in way in PostgreSQL to get at the deleted-but-still-present rows, but there is a PostgreSQL extension that I am maintaining that allows this: pg_dirtyread.

-- remove an entry from our contact list
db=> DELETE FROM addressbook WHERE name = 'Heinz Schmidt';
DELETE 3
-- NOTICE:  sigh
db=> SELECT * FROM addressbook;
        name         |     city
---------------------+-------------
 Christoph Berg      | Krefeld
 Hans-Jürgen Schönig | Wöllersdorf
(2 rows)
-- WARNING:  put on safety goggles now, we'll need superuser privileges

We’ll have to install pg_dirtyread from a package (or compile it from source) and create the extension in the database where the accident happened:

$ sudo apt install postgresql-16-dirtyread

db=# CREATE EXTENSION pg_dirtyread;
CREATE EXTENSION

The extension provides a function pg_dirtyread('tablename') that reads a table like PostgreSQL itself does, but ignoring any markers for row deletion. SQL insists that we supply a column list with datatype annotations when calling it, so we first check the table definition:

db=# \d addressbook 
           Table "public.addressbook"
 Column | Type | Collation | Nullable | Default 
--------+------+-----------+----------+---------
 name   | text |           |          | 
 city   | text |           |          | 

From this we can compile the pg_dirtyread invocation:

db=# SELECT * from pg_dirtyread('addressbook') addressbook(name text, city text);
        name         |     city
---------------------+-------------
 Christoph Berg      | Krefeld
 Heinz Schmidt       | Berlin
 Heinz Schmidt       | Wien
 Heinz Schmidt       | Basel
 Hans-Jürgen Schönig | Wöllersdorf
(5 rows)
-- NOTICE:  phew

Our data is there! We can copy the missing rows to a new table, and inject that into the original table:

db=# CREATE TABLE addressbook_recover AS
     SELECT * from pg_dirtyread('addressbook') addressbook(name text, city text)
     WHERE name = 'Heinz Schmidt';
SELECT 3
db=# SELECT * from addressbook_recover ;
     name      |  city
---------------+--------
 Heinz Schmidt | Berlin
 Heinz Schmidt | Wien
 Heinz Schmidt | Basel
(3 rows)
db=# INSERT INTO addressbook SELECT * FROM addressbook_recover;
INSERT 3

Using pg_dirtyread works as long as VACUUM has not done its garbage-collection duties. VACUUM is triggered by the autovacuum launcher every minute on tables that have changed by at least 20%. If your faulty DELETE hit more than that (or it raised the accumulated bloat over that threshold), you have less than 60 seconds to shut down the database and turn off autovacuum before that happens. (Please do leave autovacuum=on by default in your database. While turning it off makes undeleting rows easier, bad things will happen if tables are not periodically cleaned from bloat.)

Option 4: desperate people use full page writes

If pg_dirtyread arrives too late because rows have already been garbage-collected, there is still hope. PostgreSQL keeps track of all changes in the write-ahead log (WAL). While these change records only contain the data from after the changes, the first time each page (the 8kB unit in which PostgreSQL processes data on disk) is touched, an image of the whole page is written to the WAL. These full page writes (FPW) can be collected to extract the deleted rows.

First, we need some low-level info about where to look:

db=# select oid from pg_database where datname = current_database();
 oid
-----
   5

db=# select relfilenode from pg_class where relname = 'addressbook';
 relfilenode
-------------
      125616

db=# select pg_walfile_name(pg_current_wal_lsn());
     pg_walfile_name      
--------------------------
 000000010000000700000037

We can the use pg_waldump to decode the WAL:

$ /usr/lib/postgresql/16/bin/pg_waldump --relation=1663/5/125616 16/main/pg_wal/000000010000000700000037 | grep DELETE
rmgr: Heap        len (rec/tot):     59/   359, tx:       1894, lsn: 7/373798E0, prev 7/373798A8, desc: DELETE xmax: 1894, off: 2, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx:       1894, lsn: 7/37379A48, prev 7/373798E0, desc: DELETE xmax: 1894, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0
rmgr: Heap        len (rec/tot):     54/    54, tx:       1894, lsn: 7/37379A80, prev 7/37379A48, desc: DELETE xmax: 1894, off: 4, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0

We can see our 3 DELETEd rows, and the first WAL record is marked as containing a FPW.

With PG16’s pg_waldump, we can extract the FPW to a file:

$ /usr/lib/postgresql/16/bin/pg_waldump --save-fullpage=fpw --relation=1663/5/125616 16/main/pg_wal/000000010000000700000037 | grep DELETE
rmgr: Heap        len (rec/tot):     59/   359, tx:       1894, lsn: 7/373798E0, prev 7/373798A8, desc: DELETE xmax: 1894, off: 2, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0 FPW
rmgr: Heap        len (rec/tot):     54/    54, tx:       1894, lsn: 7/37379A48, prev 7/373798E0, desc: DELETE xmax: 1894, off: 3, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0
rmgr: Heap        len (rec/tot):     54/    54, tx:       1894, lsn: 7/37379A80, prev 7/37379A48, desc: DELETE xmax: 1894, off: 4, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/5/125616 blk 0
$ ls -l fpw
-rw-r--r-- 1 postgres postgres 8192  5. Mär 17:18 00000001-00000007-373798E0.1663.5.125616.0_main
-rw-r--r-- 1 postgres postgres 8192  5. Mär 17:18 00000001-00000007-37379E00.1663.5.125616.0_main
$ rm fpw/00000001-00000007-37379E00.1663.5.125616.0_main

It actually extracted two FPWs, but looking at the LSN, we are only interested in the first one, so I deleted the second one.

Let’s feed that back to PostgreSQL by creating a new table, and concatenating the FPW files to form the table contents:

db=# create table addressbook_fpw (like addressbook);
CREATE TABLE
db=# select relfilenode from pg_class where relname = 'addressbook';
 relfilenode
-------------
      125628

$ sudo systemctl stop postgresql
$ cat fpw/* > base/5/125628
$ sudo systemctl start postgresql

Since some of the rows in the FPWs are already marked as deleted, we still have to use pg_dirtyread:

db=# SELECT * FROM pg_dirtyread('addressbook_fpw') addressbook(name text, city text);
        name         |    city
---------------------+-------------
 Christoph Berg      | Krefeld
 Heinz Schmidt       | Berlin
 Heinz Schmidt       | Wien
 Heinz Schmidt       | Basel
 Hans-Jürgen Schönig | Wöllersdorf
(5 Zeilen)

db=# INSERT into addressbook
     SELECT * FROM pg_dirtyread('addressbook_fpw') addressbook(name text, city text)
     WHERE name = 'Heinz Schmidt';
INSERT 3
-- NOTICE:  hopefully for the last time

This method is fragile, it works best when no commands other than the problematic DELETE have touched the table since the last CHECKPOINT. If other rows had been deleted after the last VACUUM, they might reappear as well. If your PostgreSQL version is older than 16, the --save-fullpage switch functionality needs to be backported.

Option 5: smart people have support contracts

Most of the options listed here require a fair amount of knowledge about the PostgreSQL internals. If you want to be on the safe side, consider getting a support contract from CYBERTEC where PostgreSQL experts help you with running PostgreSQL in your company or organization. Notably, the last customer I helped with data recovery was excited:

Customer Feedback_1  Customer Feedback_2