Estimating table bloat in PostgreSQL

Recently there have been a couple of very exhaustive and rather technical posts on Planet Postgresql on the topic of bloat. Bloat in short, for those new to Postgres, is an unescapable by-product of Postgres' excellent concurrency-enablement model called MVCC (multi-version concurrency control), that creates new copies of rows when you update data in a concurrent access scenario. In the long run this causes your tables (and also indexes) to use more disk space than they actually need to, especially when running busy multi-user applications where at times "autovacuum" cannot keep up with the amount of data changes.

Table of Contents

But ok, let's leave out the deeper bloat and bloat fighting details (VACUUM FULL, CLUSTER, pg_repack) this time and concentrate on how can you actually quickly estimate if you indeed have a "bloat problem" on one of your tables?

Estimation approaches

There are 2 most common approaches here - to use a Postgres extension called "pgstattuple" (part of the official contrib modules) which includes a function named pgstattuple_approx() available since Postgres version 9.5 or to rely on a community developed pure SQL query where no setup is needed, and which works also for older versions. Community scripts can be found from Postgresql Wiki bloat page. There are couple of slightly different scripts around, I chose here the one redirecting us ultimately to a GitHub page here (thanks ioguix!). Both estimation methods do some educated guessing based on available statistics and have very low cost. For exact information on bloatedness the "pgstattuple" extension also has a pgstattuple() method that we will throw in for comparison, but this method possibly comes at high cost as it performs a full table scan.

So let's go forward and try to exemplify the process of bloat generation and then detection with our different estimation methods.

Generating bloat artificially

The big question first is - how to artificially generate bloat for our test purposes? Turns out it's not trivial.
For test data generation itself (UPDATE statements) we'll use the helpful command line performance testing tool "pgbench" that's included with standard Postgres installation. The tricky part though is that we need to ensure that old versions of updated rows wouldn't be cleaned up and marked for re-use during our test, thus not producing our wished bloat. One way to achieve that would be to turn off autovacuum in the postgresql.conf or per table basis with "ALTER TABLE SET (autovacuum_enabled = false)", but the easiest way in my opinion is to open a new parallel transaction in REPEATABLE READ mode beforehand and execute a data reading query under that session to make the transaction really active. This ensures that all the rows visible at that moment would be kept around until our "repeatable" transaction finishes. This method would also counter any possible HOT pruning (on-the-fly page level mini-vacuuming) scenarios for cases where updated fields have no indexes.

First, we'll set up the test schema with a scaling factor of 10, meaning 1 million bank accounts will be initialized in our selected test table called "pgbench_accounts". Then, we'll run 1 million account updates, which should double the table size and create a bloat ratio of 50%.

Running the test

krl@r840:~$ pgbench -i -s10 pgbench
creating tables...
100000 of 1000000 tuples (10%) done (elapsed 0.04 s, remaining 0.32 s)
........
1000000 of 1000000 tuples (100%) done (elapsed 0.44 s, remaining 0.00 s)
vacuum...
set primary keys...
done.

pgbench=# dt+
 List of relations
 Schema | Name | Type | Owner | Size | Description
--------+------------------+-------+--------+---------+-------------
 public | pgbench_accounts | table | krl | 128 MB |
 public | pgbench_branches | table | krl | 40 kB |
 public | pgbench_history | table | krl | 0 bytes |
 public | pgbench_tellers | table | krl | 40 kB |
(4 rows)

krl@r840:~$ pgbench -i -s10 pgbench

creating tables...

100000 of 1000000 tuples (10%) done (elapsed 0.04 s, remaining 0.32 s)

........

1000000 of 1000000 tuples (100%) done (elapsed 0.44 s, remaining 0.00 s)

vacuum...

set primary keys...

done.

pgbench=# dt+

List of relations

--------+------------------+-------+--------+---------+-------------

(4 rows)

Now let's execute our "vacuum stopper" query in another session, before we start our data update process.

krl@pgbench=# begin transaction isolation level repeatable read ;
BEGIN
krl@pgbench=#* select * from pgbench_accounts limit 1;
 aid │ bid │ abalance │ filler
─────┼─────┼──────────┼─────────────────────────────────────────────────────────
 3 │ 1 │ 0 │

krl@pgbench=# begin transaction isolation level repeatable read ;

BEGIN

krl@pgbench=#* select * from pgbench_accounts limit 1;

aid │ bid │ abalance │ filler

─────┼─────┼──────────┼─────────────────────────────────────────────────────────

3 │ 1 │ 0 │

Now let's launch our bloat generation test.

Here I'm using the 'prepared transaction' protocol for best performance and note that the --transactions flag is per client.

krl@r840:~$ pgbench --no-vacuum --client=2 --jobs=2 --transactions=500000 --protocol=prepared 
pgbench
transaction type: TPC-B (sort of)
scaling factor: 10
query mode: prepared
number of clients: 2
number of threads: 2
number of transactions per client: 500000
number of transactions actually processed: 1000000/1000000
latency average: 0.000 ms
tps = 1903.859294 (including connections establishing)
tps = 1904.357546 (excluding connections establishing)

krl@pgbench=# dt+
                        List of relations
 Schema │       Name       │ Type  │ Owner │  Size  │ Description
────────┼──────────────────┼───────┼───────┼────────┼─────────────
 public │ pgbench_accounts │ table │ krl   │ 256 MB │
 public │ pgbench_branches │ table │ krl   │ 35 MB  │
 public │ pgbench_history  │ table │ krl   │ 50 MB  │
 public │ pgbench_tellers  │ table │ krl   │ 42 MB  │
(4 rows)

krl@r840:~$ pgbench --no-vacuum --client=2 --jobs=2 --transactions=500000 --protocol=prepared

pgbench

transaction type: TPC-B (sort of)

scaling factor: 10

query mode: prepared

number of clients: 2

number of threads: 2

number of transactions per client: 500000

number of transactions actually processed: 1000000/1000000

latency average: 0.000 ms

tps = 1903.859294 (including connections establishing)

tps = 1904.357546 (excluding connections establishing)

krl@pgbench=# dt+

List of relations

Schema │ Name │ Type │ Owner │ Size │ Description

────────┼──────────────────┼───────┼───────┼────────┼─────────────

public │ pgbench_accounts │ table │ krl │ 256 MB │

public │ pgbench_branches │ table │ krl │ 35 MB │

public │ pgbench_history │ table │ krl │ 50 MB │

public │ pgbench_tellers │ table │ krl │ 42 MB │

(4 rows)

Finished...and seems that our test table "pgbench_accounts" has indeed doubled in size.

Now, let's install the "pgstattuple" extension and try to get some estimates.

For all provided functions take a look into the documentation - https://www.postgresql.org/docs/9.5/static/pgstattuple.html
For updating the internal statistics and freeing up old version of rows to help with the guesswork we also need to close our parallel session and run VACUUM ANALYZE.

krl@pgbench=#* rollback ;
ROLLBACK
krl@pgbench=# VACUUM ANALYZE pgbench_accounts;
VACUUM
krl@pgbench=# CREATE EXTENSION pgstattuple ;
CREATE EXTENSION

krl@pgbench=#* rollback ;

ROLLBACK

krl@pgbench=# VACUUM ANALYZE pgbench_accounts;

VACUUM

krl@pgbench=# CREATE EXTENSION pgstattuple ;

CREATE EXTENSION

Let's look at the estimate numbers.

I've only selected most pertinent columns here and also added our table name to the WHERE clause of the community query to reduce noise.
NB! Note that in case of pgstattuple_approx() the scanned_percent value of 0 indicates that we indeed have an estimated value, thus under unfavourable conditions this function can still do table scans!

krl@pgbench=# select table_len, scanned_percent, approx_free_space, approx_free_percent 
from pgstattuple_approx('pgbench_accounts');
 table_len │ scanned_percent │ approx_free_space │ approx_free_percent
───────────┼─────────────────┼───────────────────┼─────────────────────
 268607488 │ 0 │ 131164800 │ 48.8314011558754
(1 row)

krl@pgbench=# i ~/temp/bloat_investigation/table_bloat_ioguix.sql
 real_size │ extra_size │ extra_ratio
───────────┼────────────┼──────────────────
 268607488 │ 136126464 │ 50.6785812315106
(1 row)

pgbench=# select table_len, free_space, free_percent from pgstattuple('pgbench_accounts');
 table_len │ free_space │ free_percent
───────────┼────────────┼──────────────
 268607488 │ 131689396 │ 49.03
(1 row)

krl@pgbench=# select table_len, scanned_percent, approx_free_space, approx_free_percent

from pgstattuple_approx('pgbench_accounts');

table_len │ scanned_percent │ approx_free_space │ approx_free_percent

───────────┼─────────────────┼───────────────────┼─────────────────────

268607488 │ 0 │ 131164800 │ 48.8314011558754

(1 row)

krl@pgbench=# i ~/temp/bloat_investigation/table_bloat_ioguix.sql

real_size │ extra_size │ extra_ratio

───────────┼────────────┼──────────────────

268607488 │ 136126464 │ 50.6785812315106

(1 row)

pgbench=# select table_len, free_space, free_percent from pgstattuple('pgbench_accounts');

table_len │ free_space │ free_percent

───────────┼────────────┼──────────────

268607488 │ 131689396 │ 49.03

(1 row)

Making sense of the results

Now we have the estimates and the truthful number. What we see is that the pgstattuple_approx(approx_free_space=131164800) and also the pure SQL estimation (extra_size=136126464) are very close to the real value of 131689396 (pgstattuple.free_space), with according deviations of ~0.5% and ~3%.
So to finish with a conclusion - estimation methods for simple tables with mostly fixed data-type columns work surprisingly well and one can really build a bloat removal check or alert on top of them.

In case you need any assistance, please feel free to contact us.

Estimating table bloat in PostgreSQL

Estimation approaches

Generating bloat artificially

Running the test

Now let's launch our bloat generation test.

Now, let's install the "pgstattuple" extension and try to get some estimates.

Let's look at the estimate numbers.

Making sense of the results

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts