If you’re not yet familiar with the Postgres “synchronous_commit” parameter you should definitely keep reading. It’s one of the important-ish parameters with an above average amount of options, which could be a bit daunting for the casual DBA. And in soon to be released PostgreSQL 9.6 another new option called “remote_apply” was added so I thought I’ll take it for a spin out of curiosity, while also trying to explain the other options in simple terms and to perform some testing in a streaming replication scenario, with and without synchronous replication.
Allowed values up to Postgres version 9.5 were – “on”, “remote_write”, “local” and “off” and now with 9.6 “remote_apply” will join the party. Short descriptions of all of these in plain-text could be something like that of below. NB! Latter 3 values are effective only in synchronous streaming replication mode and fall back to “on” value when no replicas have been listed in the “synchronous_standby_names” parameter.
* on – Transaction commit always waits until the data is really flushed to the transaction log (aka WAL or XLOG) making sure the transaction is really persisted. In synchronous streaming replication mode also the replica needs to do the same.
* off – Commit can be acknowledged to the calling client before the transaction is actually flushed to the transaction log, making it possible to lose some recent (<1s by default) allegedly-committed transactions in case of a server crash. Synchronous replication is ignored. This is one of the most known Postgres performance tweaks.
* local – Forces the “on” behavior (guaranteed data flush) only on the primary node. Usually used as user set session parameter for speeding up non-critical data imports on primary for example.
* remote_write – Provides a weaker guarantee than “on”, transaction counts as commited if primary does a guaranteed flush and the replica just gets a write confirmation from the operating system (prone to replica corruption if replica machine crashes before hitting storage).
* remote_apply – Provides the strongest replica consistency – commit doesn’t return before replica flushes and also applies the data. Clients could either query the primary or the standby, they would have exactly the same view on the data.
Performance impact of different “synchronous_commit” settings
Now what is the performance impact of these different values in a streaming replication scenario? Logically we could assume that we could just order them based on given guarantees (which should translate to amount of work done) like that:
off (async) > on (async) > remote_write (sync) > on|local (sync) > remote_apply (sync)
But what would the penalty in numbers be when using higher consistency levels? How much slower would the transactions get? Let’s do a set of quick tests again with our good old buddy pgbench!
For testing, I created a primary-replica setup on AWS and a script running through different parameter values, re-starting and re-initializing the test schema for every parameter. The script can be found here so I will skip details for brevity and just present you the TPS (transactions per second) results.
Results and summary
On a pair of AWS EC2 i2.xlarge instances (good starting point for a busy application as its I/O optimized, 4 vCPU, 30.5 GB RAM, 800 GB SSD) I got the following numbers for a shortish 10 minute test period on a “pgbench” scale 100 size dataset (dataset fits in RAM) with 8 concurrent clients:
“on” async – 4256 TPS (FYI – here 1 transaction means 3 updates, 1 insert, 1 select)
“off” async – 6211 TPS (+45% compared to default “async on”)
“on” sync – 3329 TPS (-22% compared to default “async on”)
“remote_write” sync – 3720 TPS (+12% compared to “sync on”)
“remote_apply” sync – 3055 TPS (-8% compared to “sync on”)
First, as always – performance numbers rely on a lot of things, with synchronous replication especially a lot on network performance/distance, so your mileage will definitely vary, but I think we can still note a few things from here:
- by turning off “synchronous_commit” you get a very significant TPS boost and it indeed can be used as a temporary performance crutch
- when going for synchronous replication you will lose ca 1/4 of your performance
- when going for synchronous replication, the differences between “remote_apply” and the default “on” are < 10% so if not already against the wall performance wise and using the replica also for read queries then it would make sense to enable this safest setting to get a mirrored view on the data.