Recently, a customer asked me why a bulk load into a PostgreSQL table kept slowing down as time went by. They saw that the time was spent in a trigger on the table, and that the UPDATE in that trigger was getting slower all the time. Since slow updates are a frequent problem, I thought it would be interesting to explain in detail what is going on here.
Our example scenario is a table that receives bulk loads. Because we frequently need to query the sum of the values in the table, we keep a summary table that is maintained by a trigger (see this article for an explanation):
CREATE TABLE item (
   id bigint GENERATED ALWAYS AS IDENTITY,
   value bigint NOT NULL
);

CREATE TABLE sum (
   total bigint NOT NULL
);

INSERT INTO sum VALUES (0);

CREATE FUNCTION add_to_sum() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   UPDATE sum SET total = total + NEW.value;
   RETURN NEW;
END;$$;

CREATE CONSTRAINT TRIGGER add_to_sum
   AFTER INSERT ON item
   DEFERRABLE INITIALLY DEFERRED
   FOR EACH ROW
   EXECUTE FUNCTION add_to_sum();
We are using a deferred constraint trigger here so that the update takes place at the very end of the transaction. That way, the row in the sum table is not locked any longer than absolutely necessary, and we get the best possible concurrency. Also, we are careful not to place any index on sum (tempting as it would be to enforce that there is only a single row) so that we can get fast HOT updates.
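If you want to verify that the updates on sum are indeed HOT, the cumulative statistics views keep counters for that. This is just a monitoring sketch, not part of the original setup:

-- compare total updates with HOT updates on the summary table
SELECT n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
WHERE relname = 'sum';

If almost all of n_tup_upd also shows up in n_tup_hot_upd, the updates are HOT (which is easy to achieve here, since sum has no index at all).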
We will bulk load the table with a script like this:
\timing on

COPY item (value) FROM STDIN;
11638
422
17165
[a total of 100000 random numbers]
\.
Since this script mixes a COPY statement with data, we need to execute it with psql:
psql -d dbname -f script.sql
On my machine, the above script takes 60 seconds, which is a terribly long time to load 100000 rows. If I drop the trigger on item, the same script runs in less than 70 milliseconds.
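If you don't want to craft a data file, a roughly equivalent load can be reproduced entirely in SQL. This is just a sketch with random values, not the data file from the example:

\timing on

-- insert 100000 random values in a single transaction
INSERT INTO item (value)
SELECT (random() * 100000)::bigint
FROM generate_series(1, 100000);

Like the COPY, this runs as a single transaction and queues 100000 deferred trigger events, so it exhibits the same slowdown.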
The problem clearly is the trigger, and the trigger function contains only a single UPDATE statement. Why does that statement take so long? We can use the auto_explain extension to see details of the execution of the individual UPDATE statements. For that, we add the following lines to postgresql.conf:
shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = 0
auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_wal = on
auto_explain.log_nested_statements = on
We need to restart PostgreSQL for the changed shared_preload_libraries setting to take effect. Note that these are not settings I would want to use on a busy server. For one, auto_explain.log_min_duration = 0 will log all statements, which can be more than your disk can handle. Also, auto_explain.log_analyze = on will slow down all statements considerably, since PostgreSQL then measures statement execution in detail.
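If you would rather not restart the server or touch postgresql.conf at all, auto_explain can also be loaded for a single session (for example, at the top of script.sql). A sketch of that variation, which requires superuser privileges for LOAD:

-- enable auto_explain only for the current session
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;
SET auto_explain.log_analyze = on;
SET auto_explain.log_buffers = on;
SET auto_explain.log_wal = on;
SET auto_explain.log_nested_statements = on;

That way, only the statements of this one session are logged and slowed down.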
However, if we start the experiment over, we get each of the 100000 UPDATE statements from the trigger logged, and the result is enlightening.

The first UPDATE:
Query Text: UPDATE sum SET total = total + NEW.value
Update on sum  (cost=0.00..38.25 rows=0 width=0) (actual time=0.016..0.017 rows=0 loops=1)
  Buffers: shared hit=3
  WAL: records=1 bytes=74
  ->  Seq Scan on sum  (cost=0.00..38.25 rows=2260 width=14) (actual time=0.005..0.005 rows=1 loops=1)
        Buffers: shared hit=1
Then execution gets slower and slower, until the 100000th UPDATE reads:
Query Text: UPDATE sum SET total = total + NEW.value
Update on sum  (cost=0.00..38.25 rows=0 width=0) (actual time=1.641..1.641 rows=0 loops=1)
  Buffers: shared hit=445
  WAL: records=1 bytes=74
  ->  Seq Scan on sum  (cost=0.00..38.25 rows=2260 width=14) (actual time=1.637..1.637 rows=1 loops=1)
        Buffers: shared hit=443
The time is clearly spent in the sequential scan of sum, which has grown to a size of 443 8kB pages. To understand that, remember that an UPDATE in PostgreSQL does not overwrite the existing row, but adds a new row version to the table. This is similar to (but not exactly the same as) a DELETE followed by an INSERT. So the 100000 UPDATEs actually add 100000 new row versions to the table.
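To put a number on that bloat, one could look at the physical size of sum right after the load. A quick check (exact sizes will vary, but 443 pages correspond to roughly 3.6 MB):

-- physical size of the summary table in bytes and in 8kB pages
SELECT pg_relation_size('sum') AS bytes,
       pg_relation_size('sum') / 8192 AS pages;

For a table that logically contains a single row, that is a lot of space.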
VACUUM cannot clean up

It is the job of VACUUM to remove these “dead row versions” produced by UPDATE statements. Normally, the autovacuum process takes care of this automatically. So why doesn't it work in our case? The reason is that VACUUM can only remove row versions that are older than any currently active transaction. But our COPY statement is a single transaction, which normally is a good thing, since it makes bulk loading fast. So autovacuum can only clean up the dead row versions in sum after the COPY statement is done, and by that time the sum table has already been bloated out of shape.
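You can watch this from a second session while the COPY is still running. A sketch of such a check using pg_stat_activity; the backend_xmin column shows which transactions hold back cleanup:

-- list sessions whose transactions prevent dead row removal
SELECT pid, xact_start, backend_xmin, state, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;

As long as the loading transaction appears in this list, neither VACUUM nor HOT pruning can remove the row versions it created.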
But wait a minute: we carefully designed our setup to get HOT updates so that PostgreSQL does not have to run VACUUM to clean up dead row versions. Every SQL statement can perform HOT pruning to get rid of dead row versions cheaply. However, HOT pruning is subject to the same limitation as VACUUM and also cannot clean up row versions from currently open transactions.
After a first glance at the slow execution plan, you may think that the reason for the slowdown is that PostgreSQL performs a sequential scan, and that an index would speed up processing. That would not work with the sum table in its current definition, but we could certainly add a primary key column. However, that would harm performance rather than improve it: PostgreSQL would still have to follow the HOT chain within each block to the end. Also, every UPDATE that is not HOT (because there is no more room in the current page) would create a new (identical) primary key index entry, and scanning through these entries would make us visit every page of the table. So, if anything, an index would make the UPDATE still slower.
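For completeness, this is roughly what that tempting change would look like; it is shown here only to name the anti-pattern, not as a recommendation:

-- NOT recommended: every non-HOT update now also has to create an index entry
ALTER TABLE sum
   ADD COLUMN id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY;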
A look at the table with pageinspect
Reasoning is one thing, but a look at the actual table data is more convincing. Normally, we cannot see dead row versions, but with the functions from the pageinspect extension, a superuser can see all the data in a table. If we are quick enough and run the following query after COPY is done, but before autovacuum has cleaned up the table, we can see what is in the first page of sum:
SELECT format('(0,%s)', lp) AS tid,
       t_ctid AS replacement,
       t_xmax <> 0 AS dead,
       t_infomask2 & 16384 <> 0 AS hot_updated,
       t_attrs[1] AS total
FROM heap_page_item_attrs(get_raw_page('sum', 0), 'sum');
The result will be as follows:
   tid   │ replacement │ dead │ hot_updated │       total
═════════╪═════════════╪══════╪═════════════╪════════════════════
 (0,1)   │ (0,2)       │ t    │ t           │ \x0000000000000000
 (0,2)   │ (0,3)       │ t    │ t           │ \x214e000000000000
 (0,3)   │ (0,4)       │ t    │ t           │ \xe766000000000000
 (0,4)   │ (0,5)       │ t    │ t           │ \xeaa0000000000000
 (0,5)   │ (0,6)       │ t    │ t           │ \x9113010000000000
 (0,6)   │ (0,7)       │ t    │ t           │ \x5c23010000000000
 ...
 (0,225) │ (0,226)     │ t    │ t           │ \x1356380000000000
 (0,226) │ (1,1)       │ t    │ f           │ \xab99380000000000
(226 rows)
Each entry is the replacement for the previous one, all entries are dead (xmax is set), and all UPDATEs except the last one (which created a row in the next page) were HOT.
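If you prefer aggregate numbers over raw page contents, the pgstattuple extension reports live and dead tuples per table. A sketch of that check, run (like the query above) before autovacuum kicks in:

CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- summary of live and dead tuples in the summary table
SELECT table_len, tuple_count, dead_tuple_count, dead_tuple_percent
FROM pgstattuple('sum');

You should see a single live tuple and a great number of dead ones, unless cleanup has already happened.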
Now that we understand the problem, the solution is clear: don't repeatedly update the same row in a single transaction. The best way to avoid that will depend on the details of the original workload. In our case, one possible solution would be a statement-level trigger using a transition table:
DROP TRIGGER add_to_sum ON item;

CREATE OR REPLACE FUNCTION add_to_sum() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   UPDATE sum
   SET total = total + (SELECT sum(value) FROM new_rows);
   RETURN NULL;
END;$$;

CREATE TRIGGER add_to_sum
   AFTER INSERT ON item
   REFERENCING NEW TABLE AS new_rows
   FOR EACH STATEMENT
   EXECUTE FUNCTION add_to_sum();
With this trigger, only a single UPDATE on sum is performed for each COPY statement. With that, the bulk load takes only slightly longer than 90 milliseconds on my machine!
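A quick way to convince yourself that the rewritten trigger still maintains the sum correctly; the values here are made up for the test:

-- one statement, three rows, exactly one UPDATE on sum
INSERT INTO item (value) VALUES (1), (2), (3);

SELECT total FROM sum;

The SELECT should show the previous total increased by 6.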
The most common reason for updates getting slower over time is repeatedly updating the same rows in a single transaction. The simple solution is to avoid such repeated updates: after all, only the last update of a row in a transaction will be visible after the transaction is done.
Note that VACUUM does not always remove dead rows; see my blog on 4 Reasons Why VACUUM Won't Remove Dead Rows.
With great respect for your contributions, I think this post of yours is useful, but sort of covers up the situation where there are no repeated updates (of the same record) and the time spent is nevertheless way too large. On PG14, updating a large table joined on a primary proved prohibitively slow and I recommend, as a work-around, to always create a new table and discard the old table, rather than updating the (old) table. This has cost me days for absolutely no good reasons and the work-around may well get in the way, too.
Being a C( ) programmer, I would enormously welcome the option to have a real update without new inserts, esp with tables of about 10000000 rows, which is fairly usual for me - I simply cannot have to wait for this for 3 or more hours, esp when noticing the database remaining passive endlessly (both cpu, disk, and updates) (pgadmin).
So I hope you will use your influence to put a stop to this over-protection of badly conceived update queries. I way prefer to suffer the consequences.
I am sorry to hear of your problem, but as you said, it must be a different problem than the one I am talking about. Your problem would require closer investigation. However, I don't think that, based on a single insufficiently investigated experience, you should claim that PostgreSQL v14 is generally slower than older versions when it comes to UPDATEs. I think I would have heard about this if it were a widespread problem.