CYBERTEC PostgreSQL Logo

What hot_standby_feedback in PostgreSQL really does

08.2018 / Category: / Tags:

Many of you out there using PostgreSQL streaming replication might wonder what this hot_standby_feedback parameter in postgresql.conf really does. Support customers keep asking this question, so it might be useful to share this knowledge with a broader audience of PostgreSQL users out there.

What VACUUM does in PostgreSQL

VACUUM is an essential command in PostgreSQL its goal is to clean out dead rows, which are not needed by anyone anymore. The idea is to reuse space inside a table later as new data comes in. The important thing is: The purpose of VACUUM is to reuse space inside a table - this does not necessarily imply that a relation will shrink. Also: Keep in mind that VACUUM can only clean out dead rows, if they are not need anymore by some other transaction running on your PostgreSQL server.

Consider the following image:

hot_standby_feedback VACUUM

As you can see we have two connections here. The first connection on the left side is running a lengthy SELECT statement. Now keep in mind: An SQL statement will basically “freeze” its view of the data. Within an SQL statement the world does not “change” - the query will always see the same set of data regardless of changes made concurrently. That is really important to understand.

Let us take a look at the second transaction. It will delete some data and commit. The question that naturally arises is: When can PostgreSQL really delete this row from disk? DELETE itself cannot really clean the row from disk because there might still be a ROLLBACK instead of a COMMIT. In other words a rows must not be deleted on DELETE. PostgreSQL can only mark it as dead for the current transaction. As you can see other transactions might still be able to see those deleted rows.
However, even COMMIT does not have the right to really clean out the row. Remember: The transaction on the left side can still see the dead row because the SELECT statement does not change its snapshot while it is running. COMMIT is therefore too early to clean out the row.

This is when VACUUM enters the scenario. VACUUM is here to clean rows, which cannot be seen by any other transaction anymore. In my image there are two VACUUM operations going on. The first one cannot clean the dead row yet because it is still seen by the left transaction.
However, the second VACUUM can clean this row because it is not used by the reading transaction anymore.

On a single server the situation is therefore pretty clear. VACUUM can clean out rows, which are not seen anymore.

Replication conflicts in PostgreSQL

What happens in a primary/ standby scenario? The situation is slightly more complicated because how can the primary know that some strange transaction is going on one of the standbys?

Here is an image showing a typical scenario:

PostgreSQL VACUUM and table bloat
Prevent table bloat with VACUUM in PostgreSQL

In this case a SELECT statement on the replica is running for a couple of minutes. In the meantime, a change is made on the primary (UPDATE, DELETE, etc.). This is still no problem. Remember: DELETE does not really delete the row - it simply marks it as dead, but it is still visible to other transactions, which are allowed to see the “dead” row. The situation becomes critical if a VACUUM on the primary is allowed to really delete row from disk. VACUUM is allowed to do that because it has no idea that somebody on a standby is still going to need the row. The result is a replication conflict. By default a replication conflict is resolved after 30 seconds:

If you have ever seen a message like that - it's exactly the kind of problem we are talking about here.

hot_standby_feedback can prevent replication conflicts

To solve this kind of problem, we can teach the standby to periodically inform the primary about the oldest transaction running on the standby. If the primary knows about old transactions on the standby, it can make VACUUM keep rows until the standbys are done.
This is exactly what hot_standby_feedback does. It prevents rows from being deleted too early from a standby's point of view. The idea is to inform the primary about the oldest transaction ID on the standby so that VACUUM can delay its cleanup action for certain rows.

The benefit is obvious: hot_standby_feedback will dramatically reduce the number of replication conflicts. However, there are also downsides: Remember, VACUUM will delay its cleanup operations. If the standby never terminates a query, it can lead to table bloat on the primary, which can be dangerous in the long run.

5 1 vote
Article Rating
Subscribe
Notify of
guest
32 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
choose
choose
1 year ago

Thanks for the great explanation.
You told me that the replication conflicts were caused by Vacuum.
I'm having the same problem, and when I checked on the master when the problem occurred, last_autovacuum was in the distant past and the manual vacuum time is null...
How can I understand it?

laurenz
laurenz
1 year ago
Reply to  choose

Perhaps after reading this article it will be clearer.

Cihan Baran
Cihan Baran
2 years ago

Hi,
Is hot_standby_feedback parameter disable on all replicas a good choose or not?
Thanks a lot

laurenz
laurenz
2 years ago
Reply to  Cihan Baran

It is a safe choice, since now queries on the standby cannot affect the primary.
If it is a good choice depends on your requirements...

Cihan Baran
Cihan Baran
2 years ago
Reply to  laurenz

Thanks a lot I will try and the result will be write here by me.

automatic vacuum of table "table_name": index scans: 0
pages: 0 removed, 163281 remain, 0 skipped due to pins, 154596 skipped frozen
tuples: 0 removed, 14631559 remain, 198814 are dead but not yet removable, oldest xmin: 210406736
buffer usage: 16664 hits, 90 misses, 0 dirtied
avg read rate: 4.366 MB/s, avg write rate: 0.000 MB/s
system usage: CPU: user: 0.08 s, system: 0.00 s, elapsed: 0.16 s

A vacuum problem like the above is solved but I'm not sure. I will monitor the system after that it may be query conflicted with the replica server. I don't know the system how reacts under the heavy load.

Regards

crackerplace
crackerplace
2 years ago

Assume I have users table in primary and there are some updates for this table.
And in secondary I am running select *from users where user_id = 123 and there are no updates or deletes for this user in primary ?
So will snapshot conflicts still occur as I am seeing in my case.

laurenz
laurenz
2 years ago
Reply to  crackerplace

Yes, there will still be conflicts, because the database cannot know in advance that you won't access any removed data. For example, you could run a second query in the same (repeatable read) transaction.

In a manner of speaking, there can be false positive replication conflicts.

You my want to read my article.

crackerplace
crackerplace
2 years ago
Reply to  laurenz

Thanks for the response.Yeah I read it.
For the above example of users table and for the sake of this discussion, the data for user with id 123 in primary is not modified and now I am selecting the same users data from in replica.
So lets assume the select on replica sees the database snapshot as s1.
So if I see a replica conflict, does it happen when any modification(delete, update (so inserts don't affect ?)) happens to just users table
after s1 or any modification happens to whole db after s1 in primary ?

laurenz
laurenz
2 years ago
Reply to  crackerplace

Perhaps there is a misunderstanding.
Concurrent updates or deletes will never directly cause a replication conflict.
Those conflicts are caused by old row versions being removed by VACUUM or HOT chain pruning (which happen as a consequence of updates or deletes, but at some later time).

crackerplace
crackerplace
2 years ago
Reply to  laurenz

Sure, I understand the dead tuples cleanup on slave when in conflict with running query causes the conflict error.
Sorry for prolonging this discussion.I will break my question into 2.

user table and account table have no relation.

a)A vacuum runs on master for removing dead tuples in user table due to some activity on the table and which is streamed to replica.Somewhere during this time I am querying for a record with id 123 from user table in replica(also assume there are no changes to this specific record in master as such).In this case there is a chance for replication conflict as the replica query is on the same table which had some dead row cleanup.

a)Vacuum runs master for removing dead tuples in account table due to some activity on the table and which is streamed to replica.Somewhere during this time I am querying for a record with id 123 from user table in replica(also assume there are no changes to this record in master as such).In this case is there a chance for replication conflict as both the tables are different i.e the one which had dead tuples removed and one which we queried.

Thanks.

laurenz
laurenz
2 years ago
Reply to  crackerplace

Thanks for the clarification.
Your case a) can result in a replication conflict, while case b) cannot.

crackerplace
crackerplace
2 years ago
Reply to  laurenz

Thanks for the clarification.
Surprising that select *from user where id = 123 has a where clause which selects only one row but just because the user table has some other row updates or deletes, it can cause a replication conflict on the replica query.
Thanks once again.

laurenz
laurenz
2 years ago
Reply to  crackerplace

Actually, I take back what I said above. Case b) can cause conflicts as well. The tables are not considered, only the transaction numbers and the query snapshot.

crackerplace
crackerplace
2 years ago
Reply to  crackerplace

Thanks for responding

crackerplace
crackerplace
2 years ago
Reply to  laurenz

True, I get that.Sorry for prolonging this discussion.
The dead tuples when pruned in master and which streamed to replica cause the replication conflicts(confl_snapshot).

So I will break down my question into 2.

a) If I am querying user table for id 123 on replica and there is some vacuum running for removing dead tuples in a non related table in master (assume audit table) and this event(wal) is streamed to the replica.So is there a possibility of a replication conflict here considering the select query on replica is for user table and the dead tuples were removed from audit table in master(which got streamed to replica).

b) If I am querying user table for id 123 on replica and there is vacuum running for removing dead tuples in the same user table in master and this event(wal) is streamed to replica,So is there a possibility of a conflict here.

I am for sure b) can happen, but will a) happen is something am not sure.

Thanks for your time.

Talal Majali
Talal Majali
3 years ago

Thank you for the great article, one thing that i cannot understand, even without vacuum running on the primary database you still encounter replication lag in the replica, any explanation for this?

laurenz
laurenz
3 years ago
Reply to  Talal Majali

There are other sources of replication conflicts, see this article.

JC
JC
4 years ago

Great examples. The two images needs to still be swapped though!

Raveesh Sharma
Raveesh Sharma
4 years ago

Thank you .. Great explanation

Pablo Luna
Pablo Luna
5 years ago

Awesome explanation. Thanks a lot!

Dharshan
5 years ago

Thanks for the great explanation. I think the two images are interchanged though 🙂

Daniel Gallo
Daniel Gallo
5 years ago

Dear, I have a query to make you
As much as you enable hot_standby_feedback = on, queries continue to be canceled, in addition to enabling this parameter you have to modify any of these?
#max_standby_archive_delay = 30s
#max_standby_streaming_delay = 30s
#wal_receiver_status_interval = 10s
The idea that we have is to have a primary, a standby for high availability and a third standby base in cascade from the replica to be able to execute reports and not interfere with the other two bases

laurenz
laurenz
5 years ago
Reply to  Daniel Gallo

There are replication conflicts other than by tuples removed by VACUUM on the primary. These are caused by ACCESS EXCLUSIVE locks.

Such locks are taken by commands like TRUNCATE, DROP TABLE, many ALTER TABLE variants and CREATE INDEX, to name a few.

Another notorious cause of such locks and the resulting query cancelation is the truncation of empty blocks at the end of a table during VACUUM.

You can set "max_standby_streaming_delay" to -1 to avoid the problem, but then the application of changes streamed from the primary is blocked until your transaction completes.

Andres Fernando Leon
Andres Fernando Leon
5 years ago

so how do i enable disable this in an AWS RDS instance that i can only access through psql?

ebpowell
ebpowell
6 years ago

Great explanation! Really fills in some gaps in the docs...
Thank you.

reddy manjunath
reddy manjunath
1 year ago

What is the role of hot_stand_by feedback parameter in Logical replication...?
Are there any such similar conflicts in logical replication also?

laurenz
laurenz
1 year ago

hot_standby_feedback plays no role in logical replication.
Replication conflicts during logical replication manifest when the SQL statements performed by the logical replication worker on the subscriber fail. Replication then becomes stuck, and depending on your configuration retries until it succeeds or disables the subscription.

laurenz
laurenz
1 year ago

No, that does not cause a conflict. Read up about PostgreSQL's implementation of multi-version concurrency control.

reddy manjunath
reddy manjunath
1 year ago
Reply to  laurenz

i think your saying that...even update/delete logical message from publisher side will not effect the read query in subscriber side because subscriber will not process the vacuum till the read query is completed...right?

reddy manjunath
reddy manjunath
1 year ago

Thank you for your response...but consider...a situation below

If we are querying a table to read data from subscriber side....which takes almost 60 secs and in mean time if update/delete is made on the same table in publisher side....then data in subscriber table also changes due to logical replication....right?....therefore it causes conflict......how does logical replication handles this...?

thanks in advance.

laurenz
laurenz
1 year ago

It is connected to that, but there is more to it. Readers and writers never block each other in PostgreSQL. I cannot describe PostgreSQL MVCC implementation in a brief answer.

reddy manjunath
reddy manjunath
1 year ago

thank you

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram