If your PostgreSQL database is purely maintained (lack of VACUUM) or badly structured, you might face some table bloat.

Table bloat is somewhat nasty because it slows down your database and eats up more storage than needed. Therefore it can be important to figure out, if a table is bloated or not. The PostgreSQL contrib package provides a package called pgstattuple to diagnose table bloat. 

To use the module we have to activate it:

test=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION

For a test we create a table and add some 10.000 rows to it on the fly:

test=# CREATE TABLE t_test AS SELECT * FROM generate_series(1, 10000);
SELECT 10000

We can already see, how pgstattuple works. To make sure the psql output stays readable, we have called …

test=# \x
Expanded display is on.

In our example pgstattuple will return the following data:

test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 10000
tuple_len          | 280000
tuple_percent      | 75.95
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 7380
free_percent       | 2

As you can see the size of the table is somewhat over 368k. Our table has a fill grade of around 76%. Note that those numbers don’t add up to 100% completely. This is due to some overhead. In reality the fill grade of a freshly loaded table will be a lot higher than in our trivial single column case.

To demonstrate table bloat we can delete some data. In this example we delete one third of those rows:

test=# DELETE FROM t_test WHERE generate_series % 3 = 0;
DELETE 3333

Note, the size of the table on disk is still the same. Some of those valid rows have just been moved to the “dead tuple” section of the output:

test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 6667
tuple_len          | 186676
tuple_percent      | 50.64
dead_tuple_count   | 3333
dead_tuple_len     | 93324
dead_tuple_percent | 25.32
free_space         | 7380
free_percent       | 2

The first lesson here is that DELETE does not shrink a table on disk. It merely marks rows as dead. This is highly important – many people are misled by this behavior.

To reclaim the space occupied by those dead rows we can call VACUUM:

test=# VACUUM t_test;
VACUUM

What we see here is that those dead rows have been converted to free space. And yes, the size of the table is still the same:

test=# SELECT * FROM pgstattuple('t_test');
-[ RECORD 1 ]------+-------
table_len          | 368640
tuple_count        | 6667
tuple_len          | 186676
tuple_percent      | 50.64
dead_tuple_count   | 0
dead_tuple_len     | 0
dead_tuple_percent | 0
free_space         | 114036
free_percent       | 30.93

This free space can now be used to store new rows inside your table.

Checking many tables

In our example we have checked one table. But what if you want to check your entire database to see who is to blame?

In this case you have to make use of the PostgreSQL type system and a system catalog. Let us discuss the type system first: In the previous example we have seen that pgstattuple can be called in the FROM-clause because it returns an entire row. But we have to do this for all tables – not just for some. To make sure that we can put the list of objects we want to check into the FROM-clause (in the form of the pg_class system table) we have to move the function call to the SELECT clause. The call will return a composite type consisting of all fields:

test=# SELECT relname, pgstattuple(oid) FROM pg_class WHERE relkind = 'r' LIMIT 3;
      relname | pgstattuple
--------------+----------------------------------------------------
 pg_statistic | (147456,387,115800,78.53,27,8161,5.53,19840,13.45)
      pg_type | (73728,357,61965,84.05,20,3440,4.67,4828,6.55)
    pg_authid | (8192,3,324,3.96,0,0,0,7816,95.41)
(3 rows)

This is not really readable so we have to break up those fields again. Some brackets and a star will do the job:

test=# SELECT relname, (pgstattuple(oid)).* FROM pg_class WHERE relkind = 'r' LIMIT 3;
-[ RECORD 1 ]-------+-------------
 relname            | pg_statistic
 table_len          | 147456
 tuple_count        | 387
 tuple_len          | 115800
 tuple_percent      | 78.53
 dead_tuple_count   | 27
 dead_tuple_len     | 8161
 dead_tuple_percent | 5.53
 free_space         | 19840
 free_percent       | 13.45

-[ RECORD 2 ]-------+------------- 
 relname            | pg_type
 table_len          | 73728
 tuple_count        | 357
 tuple_len          | 61965
 tuple_percent      | 84.05
 dead_tuple_count   | 20
 dead_tuple_len     | 3440
 dead_tuple_percent | 4.67
 free_space         | 4828
 free_percent       | 6.55

-[ RECORD 3 ]-------+-------------
 relname            | pg_authid
 table_len          | 8192
 tuple_count        | 3
 tuple_len          | 324
 tuple_percent      | 3.96
 dead_tuple_count   | 0
 dead_tuple_len     | 0
 dead_tuple_percent | 0
 free_space         | 7816
 free_percent       | 95.41

We can now sort and filter at will to figure out, which table is to blame.

———-
We welcome your feedback on our brand new Facebook page: www.fb.com/cybertec.postgresql