First, the big question – should we be using good old triggers at all? Well, actually I’m not going to recommend anything here, as it’s an opinionated topic:) People well-versed in databases would probably see good use cases for them, whereas modern application developers would mostly say it’s an awful practice – doing some “magic stuff”™ kind of secretly. Avoiding that holy war, let’s say that we already need to use triggers – boss’s orders. But then comes the question: Should we be afraid of using triggers in PostgreSQL due to possible performance penalties? Should we plan to beef up the hardware or do some app optimizations beforehand? From my personal experience – no, mostly nothing will change in the big picture if “used moderately”. But let’s try to generate some numbers as that’s where the truth lies…
A pgbench test scenario for trigger performance
All applications are generally unique, so the most critical part of usable performance tests boils down to actually setting up a more or less plausible use case. As Postgres comes bundled with the quick benchmarking tool pgbench, I usually tend to take it’s schema as a baseline and do some modifications on that, based on the type of application that the customer has. In the case of deploying triggers, the most usual use case is probably “auditing” – making sure on database level that we store some data on the author/reason for those changes on all rows. So to simulate such basic auditing I decided to just add two audit columns for all pgbench tables receiving updates (3 of them) in the default transaction mode. So let’s create last_modified_on (as timestamp) and last_modified_by (as text).
Then fill them with triggers, basically something like this:
... ALTER TABLE pgbench_(accounts|branches|tellers) ADD COLUMN last_modified_on timestamptz, ADD COLUMN last_modified_by text; ... CREATE FUNCTION trg_last_modified_audit() RETURNS TRIGGER AS $$ IF NEW.last_modified_by IS NULL THEN NEW.last_modified_by = session_user; END IF; IF NEW.last_modified_on IS NULL THEN NEW.last_modified_on = current_timestamp; END IF; RETURN NEW; $$ LANGUAGE plpgsql; ...
Hardware / Software related to testing trigger performance
Next, I booted up a moderately specced (2 CPU, 4 GB RAM, 48 GB SSD) test machine on Linode (Ubuntu 18.04) and installed the latest Postgres (v10.4) binaries from the official Postgres project managed repository, leaving all postgresql.conf settings, except shared_buffers (which I set at 3GB), at default. For the pgbench scaling factor number, I chose 100, giving us a ~1.3GB database (see here for more how to choose those scale numbers) so that everything is basically cached and we can factor out most IO jitter – checkpoints, background writer and autovacuum are still kind of random of course, but typically they’re there also for real-life systems so not sure if removing them is a good idea.
NB! For carrying out the actual testing, I then compiled the latest (11devel) pgbench, to make use of repeatable test cases (the new –random-seed parameter!), initialized the schema and ran the simplest possible pgbench test for triggers/untriggered case for 2h with 3 loops. Basically something like what you can see below (for the full script see here).
pgbench -i -s 100 pgbench -T 7200 --random-seed=2018
Summary on simple row modification trigger performance
So the first test I did compared pgbench transactions “as is” with just 2 auditing columns, added for the 3 tables getting updates vs the “triggered” case, where for each of those tables also a trigger was installed that was setting the auditing timestamp/username. The avg. transaction latency results came back kind of as expected: 1.173 ms vs 1.178 ms i.e. 0.4% difference – meaning basically no difference in average transaction latency at all for transactions where 3 simple data checking/filling triggers are executed in the background!
: if having a typical OLTP transaction touching a couple of tables, PL/PgSQL triggers containing just simple business logic can be used without further performance considerations!
Hmm…but how many simple triggers would you then need to see some noticeable runtime difference in our use case? Probably at least a dozen! In a typical short lived OLTP transaction context we’re still mostly IO (largely disk fsync speed) and especially for multi-statement transactions also network (round-trip) bound…thus worrying on some extra CPU cycles spent in triggers can be spared.
So what could be tried more to get an idea of penalties resulting from triggers? First we could make the transactions thinner by getting rid of network round trip latency – a single UPDATE on the pgbench_accounts would be good for that in our case. Then we could also let triggers insert some data into some other tables…but that’s enough content for another blog post I believe. See you soonish!