replication conflict?
© Laurenz Albe 2020

Streaming replication in PostgreSQL is a well-established master-slave replication technique. It is simple to set up, stable and performs well. So many people are surprised when they learn about replication conflicts — after all, the standby server is read-only.

This article describes replication conflicts and tells you how to deal with them.

What is a replication conflict?

A replication conflict occurs whenever the recovery process cannot apply WAL information from the primary server to the standby, because the change would disrupt query processing there. These conflicts cannot happen with queries on the primary server, but they happen on the streaming replication standby server because the primary server has limited knowledge about what is going on on the standby.

There are several kinds of replication conflicts:

Snapshot replication conflicts

This is the most frequent replication conflict.

Snapshot conflicts can occur if VACUUM processes a table and removes dead tuples. This removal is replayed on the standby. Now a query on the standby may have started before VACUUM on the primary (it has an older snapshot), so it can still see the tuples that should be removed. This constitutes a snapshot conflict.

Lock replication conflicts

The queries on a standby server take an ACCESS EXCLUSIVE lock on the tables they are reading. So any ACCESS EXCLUSIVE lock on the primary (which conflicts with ACCESS SHARE) must be replayed on the standby to keep incompatible operations on the table from happening. PostgreSQL takes such a lock for operations that conflict with SELECT, for example DROP TABLE, TRUNCATE and many ALTER TABLE statements. If the standby should replay such a lock on a table that a query uses, we have a lock conflict.

Buffer pin replication conflicts

One way to reduce the need for VACUUM is to use HOT updates. Then any query on the primary that accesses a page with dead heap-only tuples and can get an exclusive lock on it will prune the HOT chains. PostgreSQL always holds such page locks for a short time, so there is no conflict with processing on the primary. There are other causes for page locks, but this is perhaps the most frequent one.

When the standby server should replay such an exclusive page lock and a query is using the page (“has the page pinned” in PostgreSQL jargon), you get a buffer pin replication conflict. Pages can be pinned for a while, for example during a sequential scan of a table on the outer side of a nested loop join.

HOT chain pruning can of course also lead to snapshot replication conflicts.

Rare kinds of replication conflicts

The following types of conflict are rare and will not bother you:

  • Deadlock replication conflicts: A query on the standby blocks while using the shared buffer that is needed to replay WAL from the primary. PostgreSQL will cancel such a query immediately.
  • Tablespace replication conflicts: A tablespace is in temp_tablespaces on the standby server, and a query has temporary files there. When a DROP TABLESPACE occurs from the primary, we get a conflict. PostgreSQL cancels all queries on the standby in that case.
  • Database replication conflicts: Replication of DROP DATABASE causes a conflict if the standby has active sessions on the database. PostgreSQL terminates all connections to the database on the standby in that case.

Monitoring replication conflicts

The statistics view pg_stat_database_conflicts contains a detailed account of all replication conflicts that happened since the last statistics reset. You have to look at that view on the standby server, not the primary, because that is where replication conflicts occur.

Note that this view does not show all replication conflicts that occurred, it only shows the ones that led to a canceled query on the standby (see the next section).

How does the standby server resolve replication conflicts?

The parameter max_standby_streaming_delay determines what happens when WAL replay encounters a replication conflict (there is a similar parameter max_standby_archive_delay that does the same thing for archive recovery). PostgreSQL suspends replay of the WAL information for at most max_standby_streaming_delay milliseconds. If the conflicting query is still running after that time, PostgreSQL cancels it with an error message like:

ERROR:  canceling statement due to conflict with recovery
DETAIL:  User query might have needed to see row versions that must be removed.

The detail message shows that this was from a snapshot replication conflict.

max_standby_streaming_delay has a default value of 30 seconds, so queries on the standby get a “grace time” of half a minute to finish before they get canceled if they cause a replication conflict. This is a middle ground between the extreme settings of 0 (PostgreSQL cancels queries immediately, no delay for replay) and the special value -1 (PostgreSQL never cancels queries, arbitrarily long replay delay).

To understand how to best configure PostgreSQL to deal with replication conflicts, we have to take a look at the use cases for streaming replication.

Use cases for streaming replication

High availability

Streaming replication is the basis for most high availability solutions. Together with a software like Patroni that manages failover, it provides a robust shared-nothing architecture to build a fault tolerant system.

Clearly, the main objective with high availability is to have as little replication delay as possible. This way, standby promotion is fast and little information is lost during the failover. In this case, you want to set max_standby_streaming_delay low.

Note that you need not lose more data during a failover if the standby is behind with applying WAL – the WAL information is still streamed to the standby and written to pg_wal. But it will take longer for the standby to catch up, so the failover time increases.

Off-loading big queries

Big queries for reporting or data analysis can generate load that may overload the productive system. The best solution for that is a data warehouse specifically designed for such queries. But often a standby server can serve as a “poor man’s data warehouse”.

Another example for off-loading are database backups: backing up a standby server puts no stress on the primary system.

The main objective in this case is to let the queries (or the backup) complete without interruption.

In this case, you want to set max_standby_streaming_delay to a value greater than the lifetime of the longest query, and it is no problem if there is a delay replaying WAL.

Horizontal scaling

You can use standby servers to distribute database workload across several machines. In practice, there are several limitations to the usefulness of this approach:

  • all writing statements have to go to the primary server, so only reading can be scaled
  • the application has to be able to direct queries and data modifications to different databases
  • the application has to cope with the problem that data modifications may not immediately be visible to queries (synchronous replication avoids this, but the performance impact on writing transactions is staggering)

Another difficulty you face is that there is no good setting for max_standby_streaming_delay: a low value will make queries on the standby fail, while a high value will cause queries on the standby to see stale data.

How to deal with conflicting requirements

Ideally, a standby server serves only a single purpose, so that you can adjust max_standby_streaming_delay or hot_standby accordingly. So the king’s way it to have dedicated standby servers for failover and off-loading work.

But sometimes you cannot afford a second standby server, or you may be stuck in a situation like in the “horizontal scaling” scenario above. Then your only option is to reduce the number of replication conflicts as much as possible.

Avoiding replication conflicts

Avoiding all conflicts by disabling hot standby

Obviously there can be no replication conflicts if there are no queries on the standby server. So if you set hot_standby = off on the standby, you don’t have to worry about this at all.

But while this method is simple and effective, it will only be feasible if the standby server is exclusively used for high availability. If you are not in that fortunate position, read on.

Avoiding lock conflicts

The obvious measure to avoid lock conflicts is not to issue statements that take an ACCESS EXCLUSIVE lock on the table. The most important statements that do so are:

  • DROP TABLE
  • TRUNCATE
  • LOCK
  • DROP INDEX
  • DROP TRIGGER
  • ALTER TABLE

But there is one kind of ACCESS EXCLUSIVE lock that you cannot avoid that way: locks from VACUUM truncation. When VACUUM has finished processing a table, and the pages at the end of the table have become empty, it tries to get a short ACCESS EXCLUSIVE lock on the table. If that succeeds, it will truncate the empty pages and immediately release the lock. While such locks don’t disrupt processing on the primary, they can cause replication conflicts on the standby.

There are two ways to avoid VACUUM truncation:

  • From PostgreSQL v12 on, you can disable the feature for individual tables with
    ALTER TABLE some_table SET (vacuum_truncate = off);
    
  • You can set old_snapshot_threshold on the primary to a value other than -1. This disables VACUUM truncation as an undocumented side effect.

Avoiding snapshot conflicts

The way to reduce such conflicts is to keep the primary from removing dead tuples that might still be visible on the standby. There are two parameters that help with this:

  • Set hot_standby_feedback on the primary to on. Then the feedback messages from standby to primary server will contain the snapshot xmin of the oldest active transaction on the standby, and the primary will not remove any tuples that this transaction still could see.
    That will get rid most of these replication conflicts, but now long running queries on the standby can lead to table bloat on the primary, which is why that setting is not enabled by default. Consider the risk carefully.
  • Set vacuum_defer_cleanup_age on the primary to a value greater than 0. Then VACUUM will not clean up dead tuples unless they are more than vacuum_defer_cleanup_age transactions old. This is less specific than hot_standby_feedback and can also lead to table bloat.

Note that while hot_standby_feedback = on will get rid of most of the snapshot replication conflicts, it will not necessarily eliminate buffer pin conflicts, since the page that the standby is using could contain some very old tuples. Moreover, I have seen snapshot conflicts in databases even when hot_standby_feedback was on, although after consulting the source I don’t understand how that could happen. Maybe a reader can enlighten me :^)

Avoiding buffer pin conflicts

There is no very good way to avoid these conflicts. Perhaps you can reduce the number of HOT updates, but that would harm performance on the primary.

Conclusion

The best way to avoid replication conflicts is to have dedicated standby servers: one for high availability and one for off-loading queries or backups. Then you can easily configure each to avoid replication conflicts.

If you cannot afford that expense, or you want to use a standby for horizontal scaling, you will have to adjust hot_standby_feedback, max_standby_streaming_delay and the vacuum_truncate storage parameter to get as few canceled queries as possible while avoiding excessive table bloat and long replication delays.