If you are running PostgreSQL in production you might have already noticed that adjusting checkpoints can be very beneficial for tuning the server and improving overall database performance. However, there is more to it: increasing checkpoint distances can also reduce the amount of WAL that is created in the first place. That is right: it is not just a performance issue, it also has an impact on the volume of log created.
You want to know why? Let us take a closer look!
For my test I am using a normal PostgreSQL 12 installation. To do that I am running initdb by hand:
% initdb -D /some_where/db12
Then I have adjusted two parameters in postgresql.conf:
max_wal_size = 128MB
synchronous_commit = off
and restarted the server:
% pg_ctl -D /some_where/db12 restart
First of all I have reduced max_wal_size to make sure that PostgreSQL checkpoints a lot more often than it would in the default installation. Performance will drop like a stone. The second parameter tells PostgreSQL to use asynchronous commits. This does not impact WAL generation but simply speeds up the benchmark so we get results faster.
Finally I am using pgbench to create a test database containing 10 million rows (= 100 x 100,000):
% createdb test
% pgbench -s 100 -i test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 10000000 tuples (1%) done (elapsed 0.13 s, remaining 12.90 s)
200000 of 10000000 tuples (2%) done (elapsed 0.28 s, remaining 13.79 s)
…
9700000 of 10000000 tuples (97%) done (elapsed 12.32 s, remaining 0.38 s)
9800000 of 10000000 tuples (98%) done (elapsed 12.46 s, remaining 0.25 s)
9900000 of 10000000 tuples (99%) done (elapsed 12.59 s, remaining 0.13 s)
10000000 of 10000000 tuples (100%) done (elapsed 12.68 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.
Let us run a test now: we want 10 clients to run 1 million transactions each. If you want to repeat the test on your own database, make sure it runs long enough to produce meaningful results. Let us take a look at the test!
% pgbench -c 10 -j 5 -t 1000000 test
starting vacuum...end.
transaction type:
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 5
number of transactions per client: 1000000
number of transactions actually processed: 10000000/10000000
latency average = 0.730 ms
tps = 13699.655677 (including connections establishing)
tps = 13699.737288 (excluding connections establishing)
As you can see we managed to run roughly 13,700 transactions per second (on my personal laptop). Let us take a look at the amount of WAL created:
test=# SELECT pg_current_wal_insert_lsn();
 pg_current_wal_insert_lsn
---------------------------
 13/522921E8
(1 row)
The way to figure out the current WAL position is to call the pg_current_wal_insert_lsn() function. To most people this format is incomprehensible, so it is easier to transform the value into a readable number:
test=# SELECT pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn;
  ?column?
-------------
 82982805992
(1 row)
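The arithmetic behind this subtraction is simple: a pg_lsn such as 13/522921E8 is just a 64-bit byte position in the WAL stream, written as two 32-bit hex numbers. As a quick illustration (my own sketch, not PostgreSQL code), the conversion can be done in Python:

```python
# Sketch: a pg_lsn like "13/522921E8" encodes a 64-bit WAL byte position
# as <high 32 bits in hex>/<low 32 bits in hex>.
def lsn_to_bytes(lsn: str) -> int:
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Subtracting '0/00000000' therefore yields the absolute byte offset:
print(lsn_to_bytes("13/522921E8") - lsn_to_bytes("0/00000000"))  # 82982805992
```

This reproduces exactly the number the server returned above.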
The pg_size_pretty() function will return the number in a readily understandable format:
test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
 pg_size_pretty
----------------
 77 GB
(1 row)
Let us stop the database, delete the data directory and initialize it again. This time my postgresql.conf parameters will be set to way higher values:
max_wal_size = 150GB
checkpoint_timeout = 1d
synchronous_commit = off
Then we start the database again, populate it with data and run pgbench with the exact same parameters. Keep in mind that to make the runs comparable it is necessary to run a fixed number of transactions instead of running the test for a certain amount of time. This way we can fairly compare the amount of WAL created.
Let us take a look and see how much WAL has been created this time:
test=# SELECT pg_current_wal_insert_lsn();
 pg_current_wal_insert_lsn
---------------------------
 1/4F210328
(1 row)

test=# SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
 pg_size_pretty
----------------
 5362 MB
(1 row)
Wow, only 5.3 GB of WAL. That is a major improvement, but why did it happen?
The idea of the transaction log is to make sure that in case of a crash the system will be able to recover nicely. However, PostgreSQL cannot keep the transaction log around forever. The purpose of a checkpoint is to ensure that all data is safely in the data files before the WAL can be removed from disk.
Here comes the trick: if a block is modified for the first time after a checkpoint, it has to be logged completely. The entire block is written to the WAL (a so-called full-page write). Every additional change does not require the entire block to be written: only the changes have to be logged, which is, of course, a lot less.
This is the reason for the major difference we see here. By increasing the checkpoint distances to a really high value, we reduced checkpoints to an absolute minimum, which in turn reduces the number of "first-time" full-page writes.
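To build intuition for why fewer checkpoints mean fewer full-page writes, here is a deliberately simplified simulation (my own illustration with made-up numbers, not PostgreSQL internals): each transaction touches one random block; the first touch after a checkpoint costs a full 8 kB page image, while later touches cost only a small record.

```python
import random

# Simplified model (illustration only, not PostgreSQL internals):
# the first modification of a block after a checkpoint writes a full
# 8 kB page image to the WAL; later modifications write only a small record.
PAGE = 8192     # full-page image, bytes
RECORD = 100    # assumed size of an ordinary WAL record, bytes (made up)

def wal_volume(n_txn: int, n_blocks: int, checkpoint_every: int, seed: int = 42) -> int:
    rng = random.Random(seed)
    dirty_since_checkpoint = set()
    total = 0
    for i in range(n_txn):
        if i % checkpoint_every == 0:
            dirty_since_checkpoint.clear()   # checkpoint: all blocks flushed
        block = rng.randrange(n_blocks)
        if block in dirty_since_checkpoint:
            total += RECORD                  # only the change is logged
        else:
            total += PAGE + RECORD           # full-page write on first touch
            dirty_since_checkpoint.add(block)
    return total

frequent = wal_volume(200_000, 5_000, checkpoint_every=5_000)
rare = wal_volume(200_000, 5_000, checkpoint_every=200_000)
print(frequent / rare)  # frequent checkpoints generate many times more WAL
```

The exact ratio depends on the made-up constants, but the direction matches what we measured above: the same workload with frequent checkpoints produces vastly more WAL, because almost every checkpoint interval re-triggers full-page writes for the same blocks.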
Increasing checkpoint distances is not just good for performance in the long run: it also reduces the volume of WAL created.
If you want to know more about PostgreSQL and want to learn something about VACUUM check out my blog about this important topic.
Also, if you like our posts you might want to share them on LinkedIn or on Facebook. We are looking forward to your comments.
Once again, a very interesting article. Many thanks.
Great article!
I didn't know that you can calculate how much WAL data is generated by using the pg_current_wal_insert_lsn() function. This query:
SELECT pg_size_pretty(pg_current_wal_insert_lsn() - '0/00000000'::pg_lsn);
looks like something that should be tracked by a monitoring system (its rate, at least).
Two small suggestions for simple improvements to the article:
⚫ Changing max_wal_size and synchronous_commit doesn't require a restart; a reload will do the job.
⚫ It would be great if the article mentioned that setting max_wal_size = 150GB and checkpoint_timeout = 1d means that a checkpoint will be created after 150GB of WAL has been generated, or after 1 day, whichever comes first. This means that in case of a PG crash it may need to replay up to 150GB (or 1 day) of WAL before it becomes ready to accept connections.

In case the function pg_wal_lsn_diff() never came to your attention:
SELECT pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_insert_lsn(), '0/00000000'));