An unusual cause of slow updates in a PostgreSQL database
© Laurenz Albe 2022

 

Recently, a customer asked me why a bulk load into a PostgreSQL table kept slowing down as time went by. They found that the time was spent in a trigger on the table, and that the UPDATE in that trigger was getting slower all the time. Now, slow updates are a frequent problem, so I thought it would be interesting to explain in detail what’s going on here.

An example that demonstrates updates getting slower in PostgreSQL

Object definitions

Our example scenario is a table that receives bulk loads. Because we frequently need to query the sum of the values in the table, we keep a summary table that is maintained by a trigger (see this article for an explanation):

CREATE TABLE item (
   id bigint GENERATED ALWAYS AS IDENTITY,
   value bigint NOT NULL
);

CREATE TABLE sum (
   total bigint NOT NULL
);

INSERT INTO sum VALUES (0);

CREATE FUNCTION add_to_sum() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   UPDATE sum SET total = total + NEW.value;
   RETURN NEW;
END;$$;

CREATE CONSTRAINT TRIGGER add_to_sum
   AFTER INSERT ON item
   DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW
   EXECUTE FUNCTION add_to_sum();

We are using a deferred constraint trigger here so that the update takes place at the very end of the transaction. That way, the row in the sum table is not locked any longer than absolutely necessary, and we get the best possible concurrency. Also, we are careful not to place any index on sum (which it would be tempting to add in order to enforce that there is only a single row), so that we can get fast HOT updates.
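
If you want to verify that the updates on sum are indeed HOT, you can check the cumulative statistics after a load. This is just an observation query against the standard pg_stat_user_tables view, not part of the setup:

-- compare total updates with HOT updates on "sum"
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'sum';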

How we perform bulk loading

We will bulk load the table with a script like this:

\timing on

COPY item (value) FROM STDIN;
11638
422
17165
[a total of 100000 random numbers]
\.

Since this script mixes a COPY statement with data, we need to execute it with psql:

psql -d dbname -f script.sql

On my machine, the above script takes 60 seconds, which is a terribly long time to load 100000 rows. If I drop the trigger on item, the same script runs in less than 70 milliseconds.
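
If you prefer not to prepare a data file at all, you can reproduce the effect with a single INSERT … SELECT over generate_series (a sketch with arbitrary random values; the row-level trigger still fires once per row, so the timing behaves the same way):

\timing on

-- insert 100000 random values; the trigger fires for each row
INSERT INTO item (value)
SELECT (random() * 100000)::bigint
FROM generate_series(1, 100000);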

How can we prove that the problem is updates getting slower?

The problem clearly is the trigger, and the trigger function contains only a single UPDATE statement. Why does that statement take so long? We can use the auto_explain extension to see details of the execution of the individual UPDATE statements. For that, we add the following lines to postgresql.conf:

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 0
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_wal = on
auto_explain.log_nested_statements = on

We need to restart PostgreSQL for the changed shared_preload_libraries setting to take effect. Note that these are not settings I would want to use on a busy server. For one, auto_explain.log_min_duration = 0 will log all statements, which can be more than your disk can handle. Also, auto_explain.log_analyze = on will slow down all statements considerably, since PostgreSQL then measures statement execution in detail.
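
If you would rather not restart the server or affect all sessions, you can load auto_explain only in the session that runs the script, for example by putting these lines at the top of script.sql (a per-session sketch of the same settings; changing them requires superuser privileges):

-- load auto_explain for this session only
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
SET auto_explain.log_wal = on;
SET auto_explain.log_nested_statements = on;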

However, if we start the experiment over, we get each of the 100000 UPDATE statements from the trigger logged, and the result is enlightening:

The first UPDATE:

Query Text: UPDATE sum SET total = total + NEW.value
Update on sum  (cost=0.00..38.25 rows=0 width=0) (actual time=0.016..0.017 rows=0 loops=1)
  Buffers: shared hit=3
  WAL: records=1 bytes=74
  ->  Seq Scan on sum  (cost=0.00..38.25 rows=2260 width=14) (actual time=0.005..0.005 rows=1 loops=1)
        Buffers: shared hit=1

Then execution gets slower and slower, until the 100000th UPDATE reads:

Query Text: UPDATE sum SET total = total + NEW.value
Update on sum  (cost=0.00..38.25 rows=0 width=0) (actual time=1.641..1.641 rows=0 loops=1)
  Buffers: shared hit=445
  WAL: records=1 bytes=74
  ->  Seq Scan on sum  (cost=0.00..38.25 rows=2260 width=14) (actual time=1.637..1.637 rows=1 loops=1)
        Buffers: shared hit=443

Explaining the cause of updates getting slower

Row versions keep accumulating

The time is clearly spent in the sequential scan of sum, which by the time of the last UPDATE has grown to 443 8kB pages. To understand that, remember that an UPDATE in PostgreSQL does not overwrite the existing row, but adds a new row version to the table. This is similar to (but not exactly the same as) a DELETE followed by an INSERT. So the 100000 UPDATEs actually add 100000 new row versions to the table.
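
You can watch the table grow from a second session while the COPY is still running, for example by checking its physical size (just an observation query):

-- current size of "sum" in 8kB pages
SELECT pg_relation_size('sum') / 8192 AS pages;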

VACUUM cannot clean up

It is the job of VACUUM to remove these “dead row versions” produced by UPDATE statements. Normally, the autovacuum process takes care of this automatically. So why doesn’t it work in our case? The reason is that VACUUM can only remove row versions that are older than any currently active transaction. But our COPY statement is a single transaction, which normally is a good thing, since it makes bulk loading fast. So autovacuum can only clean up the dead row versions in sum after the COPY statement is done, and by that time the sum table has already been bloated out of shape.
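
You can observe this for yourself: run a manual VACUUM from another session while the COPY is still in progress, and the VERBOSE output will report dead row versions that cannot be removed yet:

-- the output will show dead row versions that are not yet removable
VACUUM (VERBOSE) sum;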

HOT pruning is no help either

But wait a minute: we carefully designed our setup to get HOT updates so that PostgreSQL does not have to run VACUUM to clean up dead row versions. Every SQL statement can perform HOT pruning to get rid of dead row versions cheaply. However, HOT pruning is subject to the same limitation as VACUUM and also cannot clean up row versions from currently open transactions.

Indexes cannot help

After a first glance at the slow execution plan, you may think that the reason for the slowdown is the sequential scan, and that an index would speed up processing. That would not work with the sum table in its current definition, but we could certainly add a primary key column. However, that would harm performance rather than improve it: PostgreSQL would still have to follow the HOT chain within each block to the end. Also, every UPDATE that is not HOT (because there is no more room in the current page) would create a new (identical) primary key index entry, and scanning through these entries would make us visit every page of the table. So, if anything, an index would make the UPDATE even slower.

Examining the table with pageinspect

Reasoning is one thing, but a look at the actual table data is more convincing. Normally, we cannot see dead row versions, but with the functions from the pageinspect extension a superuser can see all the data in a table. If we are quick enough and run the following query after COPY is done, but before autovacuum has cleaned up the table, we can see what is in the first page of sum:

SELECT format('(0,%s)', lp) AS tid,
       t_ctid AS replacement,
       t_xmax <> 0 AS dead,
       t_infomask2 & 16384 <> 0 AS hot_updated,
       t_attrs[1] AS total
FROM heap_page_item_attrs(get_raw_page('sum', 0), 'sum');

The result will be as follows:

   tid   │ replacement │ dead │ hot_updated │       total        
═════════╪═════════════╪══════╪═════════════╪════════════════════
 (0,1)   │ (0,2)       │ t    │ t           │ \x0000000000000000
 (0,2)   │ (0,3)       │ t    │ t           │ \x214e000000000000
 (0,3)   │ (0,4)       │ t    │ t           │ \xe766000000000000
 (0,4)   │ (0,5)       │ t    │ t           │ \xeaa0000000000000
 (0,5)   │ (0,6)       │ t    │ t           │ \x9113010000000000
 (0,6)   │ (0,7)       │ t    │ t           │ \x5c23010000000000
 ...
 (0,225) │ (0,226)     │ t    │ t           │ \x1356380000000000
 (0,226) │ (1,1)       │ t    │ f           │ \xab99380000000000
(226 rows)

Each entry is the replacement for the previous one, all entries are dead (xmax is set), and all UPDATEs except the last one (which created a row in the next page) were HOT.

A remedy for updates getting slower

Now that we understand the problem, the solution is clear: don’t repeatedly update the same row in a single transaction. The best way to avoid that will depend on the details of the original workload. In our case, one possible solution is a statement-level trigger using a transition table:

DROP TRIGGER add_to_sum ON item;

CREATE OR REPLACE FUNCTION add_to_sum() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   UPDATE sum SET total = total +
      (SELECT sum(value) FROM new_rows);
   RETURN NULL;
END;$$;

CREATE TRIGGER add_to_sum
   AFTER INSERT ON item
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION add_to_sum();

With this trigger, only a single UPDATE on sum will be performed for each COPY statement. With that, the bulk load takes only slightly longer than 90 milliseconds on my machine!
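
As a quick sanity check that the statement-level trigger keeps the summary correct, you can compare it with a direct aggregate after the load; both queries should return the same number:

-- the maintained summary
SELECT total FROM sum;

-- the actual sum over the loaded rows
SELECT sum(value) FROM item;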

Conclusion

The most common reason for updates getting slower over time is repeatedly updating the same rows within a single transaction. The simple solution is to avoid such repeated updates: after all, only the last update of a row in a transaction will be visible after the transaction is done.