CYBERTEC PostgreSQL Logo

The "synchronous_commit" parameter and streaming replication

09.2016 / Category: / Tags:

By Kaarel Moppel - If you're not yet familiar with the Postgres "synchronous_commit"  parameter, you should definitely keep reading. It's one of the important parameters and contains an above-average number of options, which could be a bit daunting for the casual DBA. And in the soon to be released PostgreSQL 9.6, another new option called "remote_apply" was added. So, I thought I'll take it for a spin out of curiosity, while also trying to explain the other options in simple terms. I also wanted to perform some testing in a streaming replication scenario, with and without synchronous replication.

Options available

Allowed values up to Postgres version 9.5 were - "on", "remote_write", "local" and "off" and now with 9.6 "remote_apply" will join the party. Short descriptions of all of these in plain-text could be something like that of below. NB! Latter 3 values are effective only in synchronous streaming replication mode and fall back to "on" value when no replicas have been listed in the "synchronous_standby_names" parameter.

* on - Transaction commit always waits until the data is really flushed to the transaction log (aka WAL or XLOG) making sure the transaction is really persisted. In synchronous streaming replication mode also the replica needs to do the same.

* off - Commit can be acknowledged to the calling client before the transaction is actually flushed to the transaction log, making it possible to lose some recent (<1s by default) allegedly-committed transactions in case of a server crash. Synchronous replication is ignored. This is one of the most known Postgres performance tweaks. * local -  Forces the "on" behavior (guaranteed data flush) only on the primary node. Usually used as user set session parameter for speeding up non-critical data imports on primary for example.

* remote_write - Provides a weaker guarantee than "on", transaction counts as commited if primary does a guaranteed flush and the replica just gets a write confirmation from the operating system (prone to replica corruption if replica machine crashes before hitting storage).

* remote_apply - Provides the strongest replica consistency - commit doesn't return before replica flushes and also applies the data. Clients could either query the primary or the standby, they would have exactly the same view on the data.

Performance impact of different "synchronous_commit" settings

Now what is the performance impact of these different values in a streaming replication scenario? Logically we could assume that we could just order them based on given guarantees (which should translate to amount of work done) like that:

off (async) > on (async) > remote_write (sync)  > on|local (sync)  > remote_apply (sync)

But what would the penalty in numbers be when using higher consistency levels? How much slower would the transactions get? Let's do a set of quick tests again with our good old buddy pgbench!

For testing, I created a primary-replica setup on AWS and a script running through different parameter values, re-starting and re-initializing the test schema for every parameter. The script can be found here so I will skip details for brevity and just present you the TPS (transactions per second) results.

Results and summary

On a pair of AWS EC2 i2.xlarge instances (good starting point for a busy application as its I/O optimized, 4 vCPU, 30.5 GB RAM, 800 GB SSD) I got the following numbers for a shortish 10 minute test period on a "pgbench" scale 100 size dataset (dataset fits in RAM) with 8 concurrent clients:

“on” async - 4256 TPS (FYI – here 1 transaction means 3 updates, 1 insert, 1 select)

“off” async - 6211 TPS (+45% compared to default "async on")

“on” sync - 3329 TPS (-22% compared to default "async on")

“remote_write” sync - 3720 TPS (+12% compared to "sync on")

“remote_apply” sync - 3055 TPS (-8% compared to "sync on")

First, as always – performance numbers rely on a lot of things, with synchronous replication especially a lot on network performance/distance, so your mileage will definitely vary, but I think we can still note a few things from here:

  • by turning off "synchronous_commit" you get a very significant TPS boost and it indeed can be used as a temporary performance crutch
  • when going for synchronous replication you will lose ca 1/4 of your performance
  • when going for synchronous replication, the differences between “remote_apply” and the default "on" are < 10% so if not already against the wall performance wise and using the replica also for read queries then it would make sense to enable this safest setting to get a mirrored view on the data.
0 0 votes
Article Rating
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Darshan Shah
Darshan Shah
4 years ago

Hello Kaarel,
I have set up pg_dump on standby server. I have streaming replication configured betwen primary and standby. Sometimes backup fails with below error.
ERROR: canceling statement due to conflict with recovery
Detail: User was holding a relation lock for too long.
Statement: COPY public.tablename (id, queuetime, reexecuteafter, queuedata, jsonbody, iteration, isexecuted, lastexecuted) TO stdout;

Is synchronous_commit=on the root cause for this ? I checked same time no other process running on primary server. At 1:00 am vacuum analyze and reindex processes run which takes around 30 minutes.
backup using pg_dump starts at 2:00 am.

Any hint ?

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram