A few years ago, I wrote a short post on a similar topic; since then, I’ve often seen that the whole concept of suggesting to do more INSERT-s in critical parts of the code seems pretty strange to most customers. It’s even alien to those who are not new to databases in general. So I thought that the technique could use another go-round since I think it’s quite a nice design trick that can alleviate some certain problems when used correctly.
Insert-only data modelling
So what is “insert-only data modelling” about? Basically, it’s very simple – it’s about reducing the amount of UPDATE-s and doing more INSERT-s instead 🙂 Of course, there’s a limit somewhere – you cannot endlessly generate new data if your objects are mutating a lot, so the whole concept is not really well-suited for most use-cases – BUT, it can very well help to avoid specific “peaks” with PostgreSQL if applied correctly… since in the current implementation, the
UPDATE operation is a bit of an expensive one, and basically a
INSERT is happening behind the scenes. Not to forget about the much better HOT-updates though, but they’re out of reach when we, for example, change some indexed columns.
A problematic customer use case
The example at hand was a problematic application — basically, a kind of workflow engine. It had objects going through a pipeline of sorts over a week or so, changing statuses along the way — and every status change was recorded in the database. It’s a pretty typical situation.
The transaction volumes were not too high, a couple of hundred transactions per second on average — which is not a lot at all nowadays. However, given the customer’s modest hardware, it was a problem. They had already tried to upgrade the hardware, but the problem persisted due to a special nuance; their load problems only appeared for an hour or so. Most of the time, it was very quiet… so they found it difficult to justify the costs of scaling-up hardware and wanted to tune Postgres or the schema instead.
Some modest tuning “winnings” were indeed possible as is usually the case… but nevertheless, during peak hours, disk I/O has maxed out on status UPDATE-s as the cloud VM-s’ disks were just slow. The only trick that gave us some more room to breathe was “asynchronous committing” – which unfortunately has a nasty side effect; namely, that you could lose a couple of hundred milliseconds’ worth of the last transactions in the case of a crash… so it must be applied wisely. By the way, it’s better to apply such specified settings dynamically, and only apply them for the problematic hours! Luckily, Postgres allows you to change most parameters dynamically via SQL in a transparent way with
ALTER SYSTEM and
pg_reload_conf(), so that the very next transaction can make use of the new setting.
But OK – besides the standard tuning things like more aggressive Autovacuum settings, switching to better disks, using ENUM-s to reduce row size, etc. — another thing I recommended to reduce peak I/O was to move that part of the logic into an “insert-only” database flow. That meant that we reduced the (mostly) random
UPDATE traffic over the whole table! It also had quite a load of historical data, so that cache rates were not too great, either.
The above suggestion is actually the crux of this post! Additionally, here’s a short overview of some pros and cons of “insert-only” data modelling.
- More sequential I/O patterns, data more digestible in case of weak disks
- A lot fewer “full page writes” after checkpointing
- Faster single transactions
- We get action history “for free”
Note that the main benefit of more sequential I/O patterns would in real life speed up things considerably only if both of the following are true:
- we’re not using fast modern SSD disks, and
- when the “active” dataset does not fit well into Shared Buffers / RAM.
- More complex schema and SQL needed;
- Bigger total disk footprint;
- Need to schedule some maintenance work over non-peak hours;
- Can never benefit from HOT updates which are, generally, desirable.
The biggest downside from the “ops” angle here is, I think, the database growth, so that you are at some point in the future forced to set up maintenance jobs that delete or archive some historic data during the “low hours”. This should generally work nicely, though, given that there are more “low hours” than “peak hours”.
To keep from becoming too theoretical and in order to “visualize” possible performance gains a bit, I threw together a sample schema mimicking the problematic use case and did some test runs. I planted Postgres onto a relatively weak cloud instance with network-mounted HDD disks, thinking that should bring out the difference well. Such a disk setup cannot be generally recommended for OLTP databases where performance is relevant, and most cloud providers, luckily, don’t offer anything like that anymore.
For the exact test schema see here, but for the traditional update-based workload, the most critical part (the status updates) looked something like as follows:
/* 1st generate 100 million test rows: INSERT INTO standard_flow SELECT generate_series(1, 1e8)... */ /* test loop */ -- get a random object SELECT (random() * max(session_id))::int8 as session_id FROM standard_flow; -- loop 3 times for 1 object UPDATE standard_flow SET state = 'APPSTATE_1' WHERE session_id = :session_id; -- sleep a bit to simulate that per object actions are not consecutive SELECT pg_sleep(0.1 * random());
An for the “insert-only” part:
/* init test data by “copying” over and multiplying from the “standard” schema … INSERT INTO alternative_flow… INSERT INTO alternative_flow_state … generate_series(1, 3) i; */ /* test loop */ -- get a random object SELECT (random() * max(session_id))::int8 as session_id FROM alternative_flow; -- loop 3 times for 1 object INSERT INTO alternative_flow_state (session_id, state) VALUES (:session_id, 'APPSTATE_1'); -- sleep a bit to simulate that per object actions are not consecutive SELECT pg_sleep(0.1 * random());
Hardware / software infos: 4 vCPU, 8GB RAM, HDD, PostgreSQL v12.4 with defaults except shared_buffers=2GB, pgbench simultaneous clients 32, 2x more data than RAM for the standard flow, 3.5x more for the alternative flow.
Looking at the numbers
Duration of one status change operation measured via pg_stat_statements:
|Mean time (ms)
|Stdev dev time (ms)
|Shared Buffers hit %
So what does this simple test tell us? It shows that one can nicely optimize the schema to gain time in certain parts of “the whole” if needed. But quite a nice “per status update” improvement of roughly 2.5x here – this could easily get rid of your “peak”!
Don’t forget that the standard UPDATE-based test was given a nice head start when inserting the initial “old” data directly into the final state – there was no bloat, which after many UPDATE-s per row, could become a substantial factor.
To summarize – there are many ways to optimize peaks, either via hardware or software, and generally, you’ll be holding better cards if you have some idea about how the underlying implementations work.
When you get to peaks of thousands of transactions per second, you ultimately need to start thinking about how much data is being pushed around with every click and in which patterns. Keep in mind, some physical limitations, sadly, still apply; things are not yet on the quantum level, and disk access can quickly get laggy.
As always, remember that there are no free lunches – with “insert-only” data modelling you’ll still need to give some winnings back in other areas of your setup, and the total I/O amounts will increase. The “quiet hours” for maintenance are also definitely something to throw into the equation — we now basically need to do the job of the autovacuum ourselves, so you better know what you’re doing. Keep an eye on things via some monitoring etc, as is of course recommended with all more complex applications 🙂
Also note: a lot of such databases “peaks” are sometimes better rectified with some special purpose queuing solutions (ActiveMQ, RabbitMQ, pgq) in the application layer to begin with, so that we can keep our DB model simple.