September 2016

By Kaarel Moppel - If you're not yet familiar with the Postgres "synchronous_commit" parameter, you should definitely keep reading. It's one of the important parameters and contains an above-average number of options, which could be a bit daunting for the casual DBA. And in the soon to be released PostgreSQL 9.6, another new option called "remote_apply" was added. So, I thought I'll take it for a spin out of curiosity, while also trying to explain the other options in simple terms. I also wanted to perform some testing in a streaming replication scenario, with and without synchronous replication.

Options available

Allowed values up to Postgres version 9.5 were - "on", "remote_write", "local" and "off" and now with 9.6 "remote_apply" will join the party. Short descriptions of all of these in plain-text could be something like that of below. NB! Latter 3 values are effective only in synchronous streaming replication mode and fall back to "on" value when no replicas have been listed in the "synchronous_standby_names" parameter.

* on - Transaction commit always waits until the data is really flushed to the transaction log (aka WAL or XLOG) making sure the transaction is really persisted. In synchronous streaming replication mode also the replica needs to do the same.

* off - Commit can be acknowledged to the calling client before the transaction is actually flushed to the transaction log, making it possible to lose some recent (<1s by default) allegedly-committed transactions in case of a server crash. Synchronous replication is ignored. This is one of the most known Postgres performance tweaks. * local - Forces the "on" behavior (guaranteed data flush) only on the primary node. Usually used as user set session parameter for speeding up non-critical data imports on primary for example.

* remote_write - Provides a weaker guarantee than "on", transaction counts as commited if primary does a guaranteed flush and the replica just gets a write confirmation from the operating system (prone to replica corruption if replica machine crashes before hitting storage).

* remote_apply - Provides the strongest replica consistency - commit doesn't return before replica flushes and also applies the data. Clients could either query the primary or the standby, they would have exactly the same view on the data.

Performance impact of different "synchronous_commit" settings

Now what is the performance impact of these different values in a streaming replication scenario? Logically we could assume that we could just order them based on given guarantees (which should translate to amount of work done) like that:

off (async) > on (async) > remote_write (sync) > on|local (sync) > remote_apply (sync)

But what would the penalty in numbers be when using higher consistency levels? How much slower would the transactions get? Let's do a set of quick tests again with our good old buddy pgbench!

For testing, I created a primary-replica setup on AWS and a script running through different parameter values, re-starting and re-initializing the test schema for every parameter. The script can be found here so I will skip details for brevity and just present you the TPS (transactions per second) results.

Results and summary

On a pair of AWS EC2 i2.xlarge instances (good starting point for a busy application as its I/O optimized, 4 vCPU, 30.5 GB RAM, 800 GB SSD) I got the following numbers for a shortish 10 minute test period on a "pgbench" scale 100 size dataset (dataset fits in RAM) with 8 concurrent clients:

“on” async - 4256 TPS (FYI – here 1 transaction means 3 updates, 1 insert, 1 select)

“off” async - 6211 TPS (+45% compared to default "async on")

“on” sync - 3329 TPS (-22% compared to default "async on")

“remote_write” sync - 3720 TPS (+12% compared to "sync on")

“remote_apply” sync - 3055 TPS (-8% compared to "sync on")

First, as always – performance numbers rely on a lot of things, with synchronous replication especially a lot on network performance/distance, so your mileage will definitely vary, but I think we can still note a few things from here:

by turning off "synchronous_commit" you get a very significant TPS boost and it indeed can be used as a temporary performance crutch
when going for synchronous replication you will lose ca 1/4 of your performance
when going for synchronous replication, the differences between “remote_apply” and the default "on" are < 10% so if not already against the wall performance wise and using the replica also for read queries then it would make sense to enable this safest setting to get a mirrored view on the data.

“Our database is slow. What if we just buy more disks? Is it going to fix things?”. I think every PostgreSQL database consultant in the world has heard this kind of question already more than once. Of course the very same questions are asked by our customers here at CYBERTEC. While more disks are surely a nice thing to have, it is not always economical to buy more hardware to fix problems, which are in many cases not caused by bad disk performance.

pg_stat_statements: Digging into the details of disk performance

To answer the question whether additional disks make sense or not, it is important to extract statistics from the system. In my opinion, the best tool to do that pg_stat_statements, which is currently part of the PostgreSQL contrib module.

It will give you deep insights into what is going on inside the server and it will also give a clue, what happens on the I/O side. In short: It is possible to measure “disk wait”. Therefore, it is always a good idea to enable this module by default. The overhead is minimal, so it is definitely worth to add this thing to the server.

Here is how it works:

First of all you have to set …

shared_preload_libraries = 'pg_stat_statements'

1	shared_preload_libraries = 'pg_stat_statements'

… postgresql.conf and restart the server.

Then you can run …

CREATE EXTENSION pg_stat_statements;

1	CREATE EXTENSION pg_stat_statements;

… in your database. This will create a view, which contains most of the information you will need. This includes, but is not limited to, how often a query was called, the total runtime of a certain type of query, caching behavior and so on.

The pg_stat_statements view will contain 4 fields, which will be vital to our investigation: query, total_time, blk_read_time and blk_write_time.

The blk_* fields will tell us how much time a certain query has spent on reading and writing. We can then compare this to total_time to see if I/O time is relevant or not. In case you've got enough memory, data will reside in RAM anyway, and so the disk might only be needed to store changes.

There is one important aspect, which is often missed: blk_* is by default empty as PostgreSQL does not sum up I/O time by default due to potentially high overhead.

`pg_test_timing`: Measuring overhead

To sum up I/O times, set track_io_timing to true in postgresql.conf. In this case, pg_stat_statements will start to show the data you need.

However, before you do that, consider running pg_test_timing: Remember, if you want to measure I/O timing you have to check time twice per block to determine runtime. This can cause overhead. pg_test_timing will show how much overhead there is:

iMac:~ hs$ pg_test_timing
Testing timing overhead for 3 seconds.
Per loop time including overhead: 37.97 nsec

Histogram of timing durations:
< usec   % of total      count
1       96.25135   76053866
2        3.74404    2958388
4        0.00022        172
8        0.00004         31
16       0.00410       3238
32       0.00019        150
64       0.00004         32
128      0.00001          6
256      0.00001          6
512      0.00000          2
1024     0.00001          4

iMac:~ hs$ pg_test_timing

Testing timing overhead for 3 seconds.

Per loop time including overhead: 37.97 nsec

Histogram of timing durations:

< usec % of total count

1 96.25135 76053866

2 3.74404 2958388

4 0.00022 172

8 0.00004 31

16 0.00410 3238

32 0.00019 150

64 0.00004 32

128 0.00001 6

256 0.00001 6

512 0.00000 2

1024 0.00001 4

On my iMac, the average overhead for a call is 37.97 nanoseconds. On a good Intel server you can maybe reach 14-15 nsec. If you happen to run bad virtualization solutions this number can easily explode to 1400 or even 1900 nsec. The value pg_test_timing will return will also depend on the operating system you are using. It seems to be the case that Windows 8.1 was the first version of Windows, which managed to come close to what Linux is able to deliver.

Drawing your conclusions concerning disk performance

Having good data is really the key to making good decisions. Buying more and better disks really only makes sense if you are able to detect a disk bottleneck using pg_stat_statements. However, before you do that: Try to figure out if the queries causing the problems can actually be improved. More hardware is really just the last thing you want to try.

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.