CYBERTEC Logo

Why are my PostgreSQL updates getting slower?

An unusual cause of slow updates in a PostgreSQL database
© Laurenz Albe 2022

 

Recently, a customer asked me why a bulk load into a PostgreSQL table kept slowing down as time went by. They saw that it was a trigger on the table that took longer and longer, and in that trigger, the updates were getting slower all the time. Now slow updates are a frequent problem, so I thought it would be interesting to explain what's going on here in detail.

An example that demonstrates updates getting slower in PostgreSQL

Object definitions

Our example scenario is a table that receives bulk loads. Because we frequently need to query the sum of the values in the table, we keep a summary table that is maintained by a trigger (see this article for an explanation):

We are using a deferred constraint trigger here so that the update takes place at the very end of the transaction. Then the row in the sum table is not locked any longer than absolutely necessary, and we get the best possible concurrency. Also, we are careful not to place any index on sum (which would be tempting to enforce that there is only a single row) so that we can get fast HOT updates.

How we perform bulk loading

We will bulk load the table with a script like this:

Since this script mixes a COPY statement with data, we need to execute it with psql:

On my machine, the above script takes 60 seconds, which is a terribly long time to load 100000 rows. If I drop the trigger on item, the same script runs in less than 70 milliseconds.

How to prove that the problem is updates getting slower?

The problem clearly is the trigger, and the trigger function contains only a single UPDATE statement. Why does that statement take so long? We can use the auto_explain extension to see details of the execution of the individual UPDATE statements. For that, we add the following lines to postgresql.conf:

We need to restart PostgreSQL for the changed shared_preload_libraries setting to take effect. Note that these are not settings I would want to use on a busy server. For one, auto_explain.log_analyze = on will log all statements, which can be more than your disk can handle. Also, auto_explain.log_analyze = on will slow down all statements considerably, since PostgreSQL then measures statement execution in detail.

However, if we start the experiment over, we get each of the 100000 UPDATE statements from the trigger logged, and the result is enlightening:

The first UPDATE:

Then execution gets slower and slower, until the 100000th UPDATE reads:

Explaining the cause of updates getting slower

Row versions keep accumulating

The time is clearly spent in a sequential scan of sum, which must be growing to a size of 443 8kB-pages. To understand that, remember that an UPDATE in PostgreSQL does not overwrite the existing row, but adds a new row version to the table. This is similar (but not exactly the same) as a DELETE followed by an INSERT. So the 100000 UPDATEs actually add 100000 new row versions to the table.

VACUUM cannot clean up

It is the job of VACUUM to remove these “dead row versions” produced by UPDATE statements. Normally, the autovacuum process takes care of this automatically. So why doesn't it work in our case? The reason is that VACUUM can only remove row versions that are older than any currently active transaction. But our COPY statement is a single transaction, which normally is a good thing, since it makes bulk loading fast. So autovacuum can only clean up the dead row versions in sum after the COPY statement is done, and by that time the sum table has already been bloated out of shape.

HOT pruning is no help either

But wait a minute: we carefully designed our setup to get HOT updates so that PostgreSQL does not have to run VACUUM to clean up dead row versions. Every SQL statement can perform HOT pruning to get rid of dead row versions cheaply. However, HOT pruning is subject to the same limitation as VACUUM and also cannot clean up row versions from currently open transactions.

Indexes cannot help

After a first glance at the slow execution plan, you may think that the reason for the slowdown is that PostgreSQL performs a sequential scan, and an index would speed up processing. That would not work with the sum table in its current definition, but we could certainly add a primary key column. However, that would harm performance rather than improve it: PostgreSQL would still have to follow the HOT chain within each block to the end. Also, every UPDATE that is not HOT (because there is no more room in the current page) would create a new (identical) primary key index entry, and scanning through these entries would make us visit every page of the table. So, if anything, an index would make the UPDATE still slower.

Examining the table with pageinspect

Reasoning is one thing, but a look at the actual table data is more convincing. Normally, we cannot see dead row versions, but with the functions from the pageinspect extension a superuser can see all the data in a table. If we are quick enough and run the following query after COPY is done, but before autovacuum has cleaned up the table, we can see what is in the first page of sum:

The result will be as follows:

Each entry is the replacement for the previous one, all entries are dead (xmax is set), and all UPDATEs except the last one (which created a row in the next page) were HOT.

A remedy for updates getting slower

Now that we understand the problem, the solution is clear: don't repeatedly update the same row in a single transaction. The best way to avoid that will depend on the details of the original workload. In our case, one possible solution would be a statement level trigger using a transition table:

With this trigger, only a single UPDATE on sum will be performed for each COPY statement. With that, the bulk load takes only slightly longer than 90 milliseconds on my machine!

Conclusion

The most common reason for updates getting slower over time is if you repeatedly update the same rows in a single transaction. The simple solution is to avoid such repeated updates: after all, only the last update of a row in a transaction will be visible after the transaction is done.

5 2 votes
Article Rating
Subscribe
Notify of
guest
2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Jan Heckman
Jan Heckman
1 year ago

With great respect for your contributions, I think this post of yours is useful, but sort of covers up the situation where there are no repeated updates (of the same record) and the time spent is nevertheless way too large. On PG14, updating a large table joined on a primary proved prohibitively slow and I recommend, as a work-around, to always create a new table and discard the old table, rather than updating the (old) table. This has cost me days for absolutely no good reasons and the work-around may well get in the way, too.
Being a C(++) programmer, I would enormously welcome the option to have a real update without new inserts, esp with tables of about 10000000 rows, which is fairly usual for me - I simply cannot have to wait for this for 3 or more hours, esp when noticing the database remaining passive endlessly (both cpu, disk, and updates) (pgadmin).
So I hope you will use your influence to put a stop to this over-protection of badly conceived update queries. I way prefer to suffer the consequences.

laurenz
laurenz
1 year ago
Reply to  Jan Heckman

I an sorry to hear of your problem, but as you said, it must be a different problem than the one I am talking about.
Your problem would require closer investigation. However, I don't think that, based on a single insufficiently investigated experience, you should claim that PostgreSQL v14 is generally slower than older versions when it comes to UPDATEs. I think I would have heard about this if it were a wide-spread problem.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    2
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram