The above-mentioned PostgreSQL server configuration parameter was introduced quite some time ago, in version 9.6, but has been flying under the radar, so to say, and hadn't caught my attention previously. That changed when someone recently pasted me (I'm not on Twitter) a tweet from Andres Freund, one of the Postgres core developers, that basically said: if your workload is bigger than Shared Buffers, you should enable the “backend_flush_after” parameter for improved throughput and reduced jitter. Hmm, who wouldn't like an extra performance boost for free? FOMO kicked in… but before adding this parameter to my “standard setup toolbox” I hurried to test things out – one's own eye is king! So here is a small test and my conclusion on the effects of enabling “backend_flush_after” (not enabled by default!).
What does this parameter actually do?
Trying to put the documentation (link here) into my own words: “backend_flush_after” is basically designed to send “hints” to the OS that, once a backend has written more than X bytes (configurable from 0 up to a maximum of 2MB), it would be very nice if the kernel could already do some flushing of the recently changed data files in the background, so that when the “checkpointer” comes around, or the kernel's “dirty” limit is reached, there is less bulk “fsyncing” to do – meaning fewer IO contention spikes for our user sessions, and thus smoother response times.
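For illustration, enabling it instance-wide could look something like the following (the 512kB value here is just an example, not a recommendation):

```sql
-- enable background flushing hints for regular backends;
-- accepts values from 0 (disabled, the default) up to 2MB
ALTER SYSTEM SET backend_flush_after = '512kB';
SELECT pg_reload_conf();

-- being a normal user-settable parameter, it can also be tried out
-- for a single session only:
SET backend_flush_after = '512kB';
```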
Be warned though – unlike most Postgres settings, this one is actually not guaranteed to function at all, and currently it can only work on Linux systems that have sync_file_range() functionality available – which again depends on the kernel version and the file system used. In short, this explains why the parameter hasn't gotten too much attention. The story is actually similar with its “sister” parameters – “bgwriter_flush_after”, “checkpoint_flush_after”, “wal_writer_flush_after”… with the difference that those are already enabled by default!
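The current values and boot-time defaults of the whole “flush_after” family are easy to inspect from pg_settings:

```sql
-- list the whole *_flush_after family; note that the "setting" values
-- are expressed in units of 8kB pages
SELECT name, setting, unit, boot_val
FROM pg_settings
WHERE name LIKE '%flush_after'
ORDER BY name;
```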
NB! Also note that this parameter, being controlled and initiated by Postgres, might be the only way to influence the kernel IO subsystem at all when using a managed / cloud PostgreSQL service!
Test setup
- Hardware: 4 vCPU, 8GB RAM, 160 GB local SSD, Ubuntu 18.04 (dirty_ratio=20, dirty_background_ratio=10, no swap) droplet on DigitalOcean (Frankfurt)
- Software: PostgreSQL 11.4 at defaults, except – checkpoint_completion_target=0.9 (which is quite a typical setting to “smooth” IO), shared_buffers=’2GB’
- Test case: standard “pgbench” OLTP runs with 2 clients per CPU, 2h runs, i.e.: “pgbench -T 7200 -c 8 -M prepared --random-seed=5432”
- Test 1 settings: Workload fitting into Shared Buffers (--scale=100)
- Test 2 settings: Workload 4x bigger than RAM (--scale=2200). FYI – to calculate the needed “scale factor” I use this calculator
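Lacking a calculator, a back-of-the-envelope estimate also gets you in the ballpark – pgbench_accounts takes very roughly 15MB per scale unit including its index (my rough assumption here), so for a dataset about 4x the 8GB of RAM:

```sql
-- rough estimate only: assuming ~15MB of table + index data per pgbench scale unit
-- target dataset size: 4 x 8GB of RAM = 32GB
SELECT ceil(4 * 8 * 1024 / 15.0) AS approx_scale_factor;  -- ~2185, rounded up to 2200
```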
As you might have noticed – although the tweet mentioned workloads bigger than Shared Buffers, in the spirit of the good old “doubt everything”, I decided to still test both cases :)
With Test 1, where the workload fit into shared_buffers, there is actually nothing worthwhile to mention – my radars picked up no remotely significant difference, so Andres was right! And test #2 basically also confirmed what was declared – see the table below for the numbers. NB! The numbers were measured on the server side from “pg_stat_statements”; during the tests, average system CPU utilization was around ~55% and IO-wait (the vmstat “wa” column) around 25%, which is much more than a typical system would exhibit, but which should highlight the “backend_flush_after” IO optimizations all the better. Also note that the results table includes numbers only for the main UPDATE on “pgbench_accounts”, as the differences for the other mini-tables (which get fully cached) were at “noise level”.
UPDATE pgbench_accounts SET abalance = abalance + $1 WHERE aid = $2
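The mean and stddev timings in the table below were extracted roughly like this (column names as of Postgres 11's pg_stat_statements; they were renamed in later versions):

```sql
-- per-statement timing stats for the main pgbench UPDATE
SELECT substring(query, 1, 50)         AS query,
       round(mean_time::numeric, 3)    AS mean_ms,
       round(stddev_time::numeric, 3)  AS stddev_ms
FROM pg_stat_statements
WHERE query LIKE 'UPDATE pgbench_accounts%';
```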
| Test | Mean time (ms) | Change (%) | Stddev time (ms) | Change (%) |
|------|----------------|------------|------------------|------------|
| Workload=4x mem., backend_flush_after=0 (default) | 0.637 | – | 0.758 | – |
| Workload=4x mem., backend_flush_after=512kB | 0.632 | -0.8 | 0.606 | -20.0 |
| Workload=4x mem., backend_flush_after=2MB | 0.609 | -4.4 | 0.552 | -27.2 |
First off – as it was a very simple test, I wouldn't assign too much importance to the numbers themselves. But it showed that the “backend_flush_after” setting does indeed make things a bit better when using the biggest “chunk size”, which is especially visible in the transaction time standard deviations… and, more importantly – it doesn't make things worse either! For heavily loaded setups I'll use it without fear in the future, especially with spinning disks (if anyone is still using those), where the difference should be even more pronounced. Bear in mind though that when it comes to the Linux kernel's disk subsystem, there is a bunch of other relevant parameters – “dirty_ratio”, “dirty_background_ratio”, “swappiness” and the type of IO scheduler – and the effects of tuning those could be even bigger!