© Laurenz Albe 2020
Streaming replication in PostgreSQL is a well-established master-slave replication technique. It is simple to set up, stable and performs well. So many people are surprised when they learn about replication conflicts — after all, the standby server is read-only.
This article describes replication conflicts and tells you how to deal with them.
What is a replication conflict?
A replication conflict occurs whenever the recovery process cannot apply WAL information from the primary server to the standby, because the change would disrupt query processing there. These conflicts cannot happen with queries on the primary server, but they happen on the streaming replication standby server because the primary server has limited knowledge about what is going on on the standby.
There are several kinds of replication conflicts:
Snapshot replication conflicts
This is the most frequent replication conflict.
Snapshot conflicts can occur if
VACUUM processes a table and removes dead tuples. This removal is replayed on the standby. Now a query on the standby may have started before
VACUUM on the primary (it has an older snapshot), so it can still see the tuples that should be removed. This constitutes a snapshot conflict.
Lock replication conflicts
The queries on a standby server take an
ACCESS SHARE lock on the tables they are reading. So any
ACCESS EXCLUSIVE lock on the primary (which conflicts with
ACCESS SHARE) must be replayed on the standby to keep incompatible operations on the table from happening. PostgreSQL takes such a lock for operations that conflict with
SELECT, for example
TRUNCATE and many
ALTER TABLE statements. If the standby should replay such a lock on a table that a query uses, we have a lock conflict.
Buffer pin replication conflicts
One way to reduce the need for
VACUUM is to use HOT updates. Then any query on the primary that accesses a page with dead heap-only tuples and can get an exclusive lock on it will prune the HOT chains. PostgreSQL always holds such page locks for a short time, so there is no conflict with processing on the primary. There are other causes for page locks, but this is perhaps the most frequent one.
When the standby server should replay such an exclusive page lock and a query is using the page (“has the page pinned” in PostgreSQL jargon), you get a buffer pin replication conflict. Pages can be pinned for a while, for example during a sequential scan of a table on the outer side of a nested loop join.
HOT chain pruning can of course also lead to snapshot replication conflicts.
Rare kinds of replication conflicts
The following types of conflict are rare and will not bother you:
- Deadlock replication conflicts: A query on the standby blocks while using the shared buffer that is needed to replay WAL from the primary. PostgreSQL will cancel such a query immediately.
- Tablespace replication conflicts: A tablespace is in
temp_tablespaceson the standby server, and a query has temporary files there. When a
DROP TABLESPACEoccurs from the primary, we get a conflict. PostgreSQL cancels all queries on the standby in that case.
- Database replication conflicts: Replication of
DROP DATABASEcauses a conflict if the standby has active sessions on the database. PostgreSQL terminates all connections to the database on the standby in that case.
Monitoring replication conflicts
The statistics view
pg_stat_database_conflicts contains a detailed account of all replication conflicts that happened since the last statistics reset. You have to look at that view on the standby server, not the primary, because that is where replication conflicts occur.
Note that this view does not show all replication conflicts that occurred, it only shows the ones that led to a canceled query on the standby (see the next section).
How does the standby server resolve replication conflicts?
max_standby_streaming_delay determines what happens when WAL replay encounters a replication conflict (there is a similar parameter
max_standby_archive_delay that does the same thing for archive recovery). PostgreSQL suspends replay of the WAL information for at most
max_standby_streaming_delay milliseconds. If the conflicting query is still running after that time, PostgreSQL cancels it with an error message like:
ERROR: canceling statement due to conflict with recovery DETAIL: User query might have needed to see row versions that must be removed.
The detail message shows that this was from a snapshot replication conflict.
max_standby_streaming_delay has a default value of 30 seconds, so queries on the standby get a “grace time” of half a minute to finish before they get canceled if they cause a replication conflict. This is a middle ground between the extreme settings of 0 (PostgreSQL cancels queries immediately, no delay for replay) and the special value -1 (PostgreSQL never cancels queries, arbitrarily long replay delay).
To understand how to best configure PostgreSQL to deal with replication conflicts, we have to take a look at the use cases for streaming replication.
Use cases for streaming replication
Streaming replication is the basis for most high availability solutions. Together with a software like Patroni that manages failover, it provides a robust shared-nothing architecture to build a fault tolerant system.
Clearly, the main objective with high availability is to have as little replication delay as possible. This way, standby promotion is fast and little information is lost during the failover. In this case, you want to set
Note that you need not lose more data during a failover if the standby is behind with applying WAL – the WAL information is still streamed to the standby and written to
pg_wal. But it will take longer for the standby to catch up, so the failover time increases.
Off-loading big queries
Big queries for reporting or data analysis can generate load that may overload the productive system. The best solution for that is a data warehouse specifically designed for such queries. But often a standby server can serve as a “poor man’s data warehouse”.
Another example for off-loading are database backups: backing up a standby server puts no stress on the primary system.
The main objective in this case is to let the queries (or the backup) complete without interruption.
In this case, you want to set
max_standby_streaming_delay to a value greater than the lifetime of the longest query, and it is no problem if there is a delay replaying WAL.
You can use standby servers to distribute database workload across several machines. In practice, there are several limitations to the usefulness of this approach:
- all writing statements have to go to the primary server, so only reading can be scaled
- the application has to be able to direct queries and data modifications to different databases
- the application has to cope with the problem that data modifications may not immediately be visible to queries (synchronous replication avoids this, but the performance impact on writing transactions is staggering)
Another difficulty you face is that there is no good setting for
max_standby_streaming_delay: a low value will make queries on the standby fail, while a high value will cause queries on the standby to see stale data.
How to deal with conflicting requirements
Ideally, a standby server serves only a single purpose, so that you can adjust
hot_standby accordingly. So the king’s way it to have dedicated standby servers for failover and off-loading work.
But sometimes you cannot afford a second standby server, or you may be stuck in a situation like in the “horizontal scaling” scenario above. Then your only option is to reduce the number of replication conflicts as much as possible.
Avoiding replication conflicts
Avoiding all conflicts by disabling hot standby
Obviously there can be no replication conflicts if there are no queries on the standby server. So if you set
hot_standby = off on the standby, you don’t have to worry about this at all.
But while this method is simple and effective, it will only be feasible if the standby server is exclusively used for high availability. If you are not in that fortunate position, read on.
Avoiding lock conflicts
The obvious measure to avoid lock conflicts is not to issue statements that take an
ACCESS EXCLUSIVE lock on the table. The most important statements that do so are:
- DROP TABLE
- DROP INDEX
- DROP TRIGGER
- ALTER TABLE
But there is one kind of
ACCESS EXCLUSIVE lock that you cannot avoid that way: locks from
VACUUM truncation. When
VACUUM has finished processing a table, and the pages at the end of the table have become empty, it tries to get a short
ACCESS EXCLUSIVE lock on the table. If that succeeds, it will truncate the empty pages and immediately release the lock. While such locks don’t disrupt processing on the primary, they can cause replication conflicts on the standby.
There are two ways to avoid
- From PostgreSQL v12 on, you can disable the feature for individual tables with
ALTER TABLE some_table SET (vacuum_truncate = off);
- You can set
old_snapshot_thresholdon the primary to a value other than -1. This disables
VACUUMtruncation as an undocumented side effect.
Avoiding snapshot conflicts
The way to reduce such conflicts is to keep the primary from removing dead tuples that might still be visible on the standby. There are two parameters that help with this:
hot_standby_feedbackon the primary to
on. Then the feedback messages from standby to primary server will contain the snapshot
xminof the oldest active transaction on the standby, and the primary will not remove any tuples that this transaction still could see.
That will get rid most of these replication conflicts, but now long running queries on the standby can lead to table bloat on the primary, which is why that setting is not enabled by default. Consider the risk carefully.
vacuum_defer_cleanup_ageon the primary to a value greater than 0. Then
VACUUMwill not clean up dead tuples unless they are more than
vacuum_defer_cleanup_agetransactions old. This is less specific than
hot_standby_feedbackand can also lead to table bloat.
Note that while
hot_standby_feedback = on will get rid of most of the snapshot replication conflicts, it will not necessarily eliminate buffer pin conflicts, since the page that the standby is using could contain some very old tuples. Moreover, I have seen snapshot conflicts in databases even when
on, although after consulting the source I don’t understand how that could happen. Maybe a reader can enlighten me :^)
Avoiding buffer pin conflicts
There is no very good way to avoid these conflicts. Perhaps you can reduce the number of HOT updates, but that would harm performance on the primary.
The best way to avoid replication conflicts is to have dedicated standby servers: one for high availability and one for off-loading queries or backups. Then you can easily configure each to avoid replication conflicts.
If you cannot afford that expense, or you want to use a standby for horizontal scaling, you will have to adjust
max_standby_streaming_delay and the
vacuum_truncate storage parameter to get as few canceled queries as possible while avoiding excessive table bloat and long replication delays.