UPDATED August 2023 – Table bloat in PostgreSQL: Many people wonder why deleting data from a table in a PostgreSQL database does not shrink files on disk. You would expect storage consumption to go down when data is deleted. This is not always the case.

To show how this works, I have compiled some examples.

Let’s get started with a basic table and add 100k rows to it:

test=# CREATE TABLE t_test (id int4);
CREATE TABLE
test=# INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

The size of the table will be around 3.5 MB on disk. We can find out about the size of a table using pg_relation_size:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
3544 kB
(1 row)

Now let’s delete half of the rows:

test=# DELETE FROM t_test WHERE id > 50000;
DELETE 50000

Many users would expect this table to shrink to half the size. This is not the case:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
3544 kB
(1 row)

The reason why the storage space doesn’t decrease is that when you delete some rows, it does not necessarily mean that nobody else can see the deleted rows anymore. There might still be transactions around which might at some point need those row versions you are about to delete. In other words: PostgreSQL does not delete a row instantly, but marks it invisible, so that certain transactions cannot see it.

To clean up those rows which are really dead, we can run VACUUM:

test=# VACUUM t_test;
VACUUM

In our example the table has been reduced in size:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
1776 kB
(1 row)

How shrinking works

VACUUM is able to reclaim space and return it to the filesystem, but this is not always possible. It is only possible to shrink a file by cutting off stuff at the end. Cutting out data at the end of a data file is only possible if (and only if) there are no valid rows anymore AFTER a certain position in the file. In our example we have loaded data in ascending order into our PostgreSQL table (1, 2, 3, etc.). If we delete everything larger than 50.000 it means that the second half of the table will be gone. VACUUM will go through the table and figure out that the second half of the data file contains nothing but chunk and truncate the data file to return space to the filesystem.

When shrinking does not reduce table bloat

In the first example, we arranged things in a way that we can see VACUUM returning space to the file system.

Now, let’s construct an example where this is not possible:

test=# DROP TABLE t_test;
DROP TABLE

First of all we drop the table and create a new one:

test=# CREATE TABLE t_test (id int4);
CREATE TABLE

Then we load 100.000 rows just like before:

test=# INSERT INTO t_test SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

In this example we will delete the first half of the data:

test=# DELETE FROM t_test WHERE id < 50000;
DELETE 49999

Just like before we can run VACUUM:

test=# VACUUM t_test;
VACUUM

This time the size of the underlying table is the same size:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
 pg_size_pretty
----------------
3544 kB
(1 row)

Remember what we said about cleanup and VACUUM? VACUUM can only shrink a file when free space is at the end. However, our example has been set up in a way that this is not the case.

CTIDs and their role in understanding PostgreSQL table bloat

In PostgreSQL there is a thing called CTID, which says where a certain row is. ctid=(0, 14) would mean that we are talking about the 14th row in the first data block of the table.

Take a look at our table and check out the highest 5 CTIDs:

test=# SELECT ctid, * FROM t_test ORDER BY ctid DESC LIMIT 5;
      ctid | id
-----------+--------
 (442,108) | 100000
 (442,107) | 99999
 (442,106) | 99998
 (442,105) | 99997
 (442,104) | 99996
(5 rows)

As you can see, the last row is in block 442. Do the math: 442 * 8192 = 3.5 MB. This is why the table has not changed in size.

Let’s check the lowest CTID:

test=# SELECT min(ctid) FROM t_test;
    min
----------
(221,54)
(1 row)

We can see that the first valid row is somewhere in the middle of the table. Therefore the table has not been truncated by VACUUM.

UPDATE AUGUST 2023:

Postgres version 12 introduced a new table storage parameter vacuum_truncate that controls the functionality to truncate off any empty pages at the end of a table. For example, you might want to avoid vacuum truncation for the following reasons:

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.