By Kaarel Moppel – The “backend_flush_after” PostgreSQL server configuration parameter was introduced some time ago, in version 9.6. It has been flying under the radar and had not caught my attention previously. However, I was recently sent (not being on Twitter myself) a tweet from one of the Postgres core developers, Andres Freund. The tweet basically said: if your workload is bigger than shared_buffers, you should enable the backend_flush_after parameter for improved throughput and reduced jitter. Hmm, who wouldn’t like an extra performance boost for free? FOMO kicked in… but before adding this parameter to my “standard setup toolbox”, I hurried to test things out – my own eye is king! So here’s a quick test and my conclusion on the effects of enabling backend_flush_after (it is not enabled by default!).
What does this parameter actually do?
Trying to interpret the documentation in my own words: backend_flush_after is basically designed to send “hints” to the OS. Once a backend has written more than X bytes (configurable from 0 to a maximum of 2MB), it would be very nice if the kernel could already start flushing the recently changed data files in the background. That way, when the “checkpointer” comes around or the kernel’s “dirty” limit is reached, there is less bulk “fsyncing” left to do, meaning less IO contention (spikes) for our user sessions and thus smoother response times.
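To make the “hint after X bytes” idea more concrete, here is a simplified Python model of the bookkeeping involved. It is only loosely based on how the server batches writeback requests and is not PostgreSQL’s actual C code. It assumes the default 8 kB block size (so the 2MB maximum corresponds to 256 blocks), and the names flush_after_blocks and WritebackTracker are hypothetical. Where the model merely returns the merged ranges, the real server would ask the kernel (on Linux via sync_file_range()) to start writing them out.

```python
from typing import List, Tuple

BLOCK_SIZE = 8192  # assumption: default 8 kB PostgreSQL block size


def flush_after_blocks(setting: str) -> int:
    """Convert a setting like '512kB' or '2MB' into a number of 8 kB blocks (0 = disabled)."""
    if setting == "0":
        return 0
    for suffix, factor in (("kB", 1024), ("MB", 1024 * 1024)):
        if setting.endswith(suffix):
            return int(setting[: -len(suffix)]) * factor // BLOCK_SIZE
    raise ValueError(f"unsupported unit in {setting!r}")


class WritebackTracker:
    """Toy model: remember written blocks and hand back merged ranges once a threshold is hit."""

    def __init__(self, max_pending: int) -> None:
        self.max_pending = max_pending
        self.pending: List[Tuple[str, int]] = []

    def block_written(self, relfile: str, blockno: int) -> List[Tuple[str, int, int]]:
        """Record one written block; return (file, first_block, n_blocks) ranges to hint, if any."""
        if self.max_pending == 0:
            return []  # feature disabled: leave all flushing to the kernel and checkpointer
        self.pending.append((relfile, blockno))
        if len(self.pending) < self.max_pending:
            return []
        return self._issue_hints()

    def _issue_hints(self) -> List[Tuple[str, int, int]]:
        """Sort pending blocks and merge neighbours, so each hint covers a contiguous range."""
        ranges: List[Tuple[str, int, int]] = []
        for relfile, blockno in sorted(set(self.pending)):
            if ranges and ranges[-1][0] == relfile and ranges[-1][1] + ranges[-1][2] == blockno:
                _, start, count = ranges[-1]
                ranges[-1] = (relfile, start, count + 1)
            else:
                ranges.append((relfile, blockno, 1))
        self.pending.clear()
        return ranges
```

With backend_flush_after=512kB, for instance, WritebackTracker(flush_after_blocks("512kB")) would hand back a batch of ranges after every 64 written blocks; sorting and merging first means the kernel gets a few larger contiguous ranges instead of many scattered single blocks.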
Be warned though – unlike most Postgres settings, this one is not guaranteed to function. It currently only works on Linux systems which have sync_file_range() functionality available, which in turn depends on the kernel version and the file system used. In short, this explains why the parameter has not gotten much attention. Actually, it’s a similar story with the “sister” parameters – wal_writer_flush_after… – with the difference that those are already enabled by default!
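Because everything hinges on sync_file_range() being available, one quick way to see whether a given Linux box accepts such a hint is to call it directly. The following sketch does that through Python’s ctypes. It assumes a Linux C library that exports sync_file_range and uses the SYNC_FILE_RANGE_WRITE flag value (2) from the Linux headers; writeback_hint is a hypothetical helper name. A False result only means the call was unavailable or rejected in that environment, not that PostgreSQL itself is misconfigured.

```python
import ctypes
import sys
import tempfile

# Flag value from the Linux/glibc headers: start writeback, but do not wait for it.
SYNC_FILE_RANGE_WRITE = 2


def writeback_hint(fd: int, offset: int, nbytes: int) -> bool:
    """Ask the kernel to start writeback of [offset, offset + nbytes); False if unsupported."""
    if not sys.platform.startswith("linux"):
        return False  # sync_file_range() is Linux-only
    libc = ctypes.CDLL(None, use_errno=True)
    func = getattr(libc, "sync_file_range", None)
    if func is None:
        return False  # the C library does not export the wrapper
    func.argtypes = [ctypes.c_int, ctypes.c_int64, ctypes.c_int64, ctypes.c_uint]
    func.restype = ctypes.c_int
    return func(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0


if __name__ == "__main__":
    with tempfile.TemporaryFile() as tmp:
        tmp.write(b"x" * 1024 * 1024)  # 1 MB of data destined for the page cache
        tmp.flush()  # hand Python's buffer over to the kernel so there is dirty data
        print("hint accepted:", writeback_hint(tmp.fileno(), 0, 1024 * 1024))
```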
NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence the kernel IO subsystem when using some managed / cloud PostgreSQL service!
Test setup for backend_flush_after
- Hardware: 4 vCPUs, 8 GB RAM, 160 GB local SSD, Ubuntu 18.04 (dirty_ratio=20, dirty_background_ratio=10, no swap) droplet on DigitalOcean (Frankfurt)
- Software: PostgreSQL 11.4 at defaults, except for checkpoint_completion_target=0.9 (quite a typical setting to “smooth” IO) and shared_buffers='2GB'
- Test case: standard “pgbench” OLTP runs with 2 clients per CPU, 2h runs, i.e.: “pgbench -T 7200 -c 8 -M prepared --random-seed=5432”
- Test 1 settings: workload fitting into shared buffers (--scale=100)
- Test 2 settings: workload 4x bigger than RAM (--scale=2200). FYI – to calculate the needed “scale factor” I used a calculator
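The setup numbers above can be roughly reproduced with a bit of arithmetic. This sketch assumes a rule of thumb of about 15 MB of tables plus indexes per pgbench scale unit (an approximation, not an exact figure, and not necessarily what the calculator uses), which puts 4x of the 8 GB RAM at a scale of about 2185, close to the 2200 used here. The helper names are hypothetical; the run flags are the ones from the test case, plus pgbench’s standard -i/-s initialization flags.

```python
import math
import shlex
from typing import List

# Rough rule of thumb (assumption): one pgbench scale unit is ~15 MB of tables plus indexes.
MB_PER_SCALE_UNIT = 15


def scale_for_size(target_mb: float) -> int:
    """Smallest scale factor whose estimated database size reaches target_mb."""
    return math.ceil(target_mb / MB_PER_SCALE_UNIT)


def pgbench_init_args(scale: int) -> List[str]:
    """Arguments to initialize the pgbench tables at the given scale factor."""
    return ["pgbench", "-i", "-s", str(scale)]


def pgbench_run_args(vcpus: int, seconds: int, seed: int) -> List[str]:
    """Arguments for a test run: 2 clients per CPU, prepared statements, fixed random seed."""
    return ["pgbench", "-T", str(seconds), "-c", str(2 * vcpus),
            "-M", "prepared", f"--random-seed={seed}"]


if __name__ == "__main__":
    ram_mb = 8 * 1024
    print("estimated scale for 4x RAM:", scale_for_size(4 * ram_mb))
    print(shlex.join(pgbench_init_args(2200)))
    print(shlex.join(pgbench_run_args(vcpus=4, seconds=7200, seed=5432)))
```

By the same estimate, scale 100 comes to roughly 1.5 GB, which is consistent with Test 1 fitting into the 2GB of shared_buffers.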
As you might have noticed – although the tweet mentioned workloads bigger than shared_buffers, in the spirit of good old “doubt everything”, I still decided to test both cases 🙂
Test results for backend_flush_after
With Test 1, where the workload fit into shared_buffers, there’s actually nothing worthwhile to mention – my radars picked up no remotely significant difference, so Andres was right! Test #2 also basically confirmed what was claimed – see the table below for the numbers. NB! The numbers were measured on the server side from pg_stat_statements. During the tests, system CPU utilization averaged around 55% and IO-wait (vmstat “wa” column) was about 25%, which is much more than a typical system would exhibit. However, that makes the effect of the backend_flush_after IO optimizations easier to see. Also note that the results table only includes numbers for the main pgbench_accounts SQL statement; differences for the other mini-tables (which get fully cached) were at the “noise level”.
UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
|Test|Mean time (ms)|Change (%)|Stddev time (ms)|Change (%)|
|---|---|---|---|---|
|Workload=4x mem., backend_flush_after=0 (default)|0.637|–|0.758|–|
|Workload=4x mem., backend_flush_after=512kB|0.632|-0.8|0.606|-20.0|
|Workload=4x mem., backend_flush_after=2MB|0.609|-4.4|0.552|-27.2|
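For anyone wanting to repeat the measurement, here is a small sketch of the two calculations behind the table: the pg_stat_statements query (column names as of PostgreSQL 11; version 13 renamed them to mean_exec_time and stddev_exec_time) and the relative change against the default run. It assumes the statistics were reset with pg_stat_statements_reset() before each run; stats_query and pct_change are hypothetical helpers. Recomputing from the rounded table values gives -20.1% for the 512kB stddev, so the last digit can differ from the table due to rounding.

```python
def stats_query(server_version_num: int) -> str:
    """pg_stat_statements query for the UPDATE statement; columns were renamed in PostgreSQL 13."""
    col = "exec_time" if server_version_num >= 130000 else "time"
    return (
        f"SELECT calls, mean_{col} AS mean_ms, stddev_{col} AS stddev_ms "
        "FROM pg_stat_statements "
        "WHERE query LIKE 'UPDATE pgbench_accounts%'"
    )


def pct_change(baseline: float, value: float) -> float:
    """Relative change versus the baseline, in percent, rounded to one decimal."""
    return round((value - baseline) / baseline * 100, 1)


if __name__ == "__main__":
    print(stats_query(110004))  # 110004 = server_version_num of PostgreSQL 11.4
    baseline_mean, baseline_stddev = 0.637, 0.758  # backend_flush_after=0 row
    for label, mean, stddev in [("512kB", 0.632, 0.606), ("2MB", 0.609, 0.552)]:
        print(label, pct_change(baseline_mean, mean), pct_change(baseline_stddev, stddev))
```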
First off – as it was a very simple test, I wouldn’t assign too much importance to the numbers themselves. But it did show that the backend_flush_after setting indeed makes things a bit better when using the biggest “chunk size”. This is especially visible in the standard deviations of statement execution times… and, more importantly, it doesn’t make things worse! For heavily loaded setups I’ll use it without fear in the future, especially with spinning disks (if anyone still uses them), where the difference should be even more pronounced. Bear in mind, though, that when it comes to the Linux kernel disk subsystem, there are a bunch of other relevant parameters, like dirty_background_ratio, “swappiness” and the type of IO scheduler: the effects of tuning those could be even more pronounced!
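Before tuning any of those kernel knobs, it helps to know their current values. Here is a small read-only sketch that checks them from the standard Linux locations /proc/sys/vm/<knob> and /sys/block/<device>/queue/scheduler, where the kernel shows the active scheduler in square brackets. The function names and the example device name are hypothetical; actually changing the values is a job for sysctl and your distribution’s tooling.

```python
from pathlib import Path
from typing import Dict, Optional

VM_KNOBS = ("dirty_ratio", "dirty_background_ratio", "swappiness")


def read_vm_settings(root: str = "/proc/sys/vm") -> Dict[str, Optional[int]]:
    """Read a few Linux VM writeback/swap knobs; None where a value is unavailable."""
    values: Dict[str, Optional[int]] = {}
    for knob in VM_KNOBS:
        try:
            values[knob] = int(Path(root, knob).read_text().strip())
        except (OSError, ValueError):
            values[knob] = None  # not Linux, missing file or unreadable value
    return values


def parse_scheduler(text: str) -> Optional[str]:
    """Return the active scheduler, shown in brackets, e.g. '[mq-deadline] kyber none'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def active_scheduler(device: str) -> Optional[str]:
    """Active IO scheduler for a block device such as 'sda' or 'vda', or None if unknown."""
    try:
        return parse_scheduler(Path("/sys/block", device, "queue", "scheduler").read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print(read_vm_settings())
    print(active_scheduler("vda"))  # hypothetical device name; adjust to your disk
```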