Having recently witnessed quite an ingenious hack to implement some good old “println” style development debugging from stored procedures into a file, it prompted me to post knowledge to the Interwebs on two other ways how such a goal can be implemented more transparently. Also with help of some other good old legacy technology in one case. By the way, the main reason for them going for the hack was that being relatively new to databases they didn’t know how to name this thing that they were implementing, so Google wasn’t able to help – once again proof that naming things is one of the hardest problems of computing 🙂
What’s an “autonomous transaction”?
But to start with let’s explain what is an “autonomous transaction” as most people working with databases probably haven’t needed / heard of them and actually they’re not a standard or anything also, thus not too supported by various database engines. The phrase itself comes from the Oracle world I believe and it basically denotes “fire and forget” (sub) transactions that are not connected to the main transactions. And sadly also Postgres does not have direct built-in support for that – you’re always in a real transaction and need to invent a bit if the need arises.
And what would be the common use case? Mostly some logging / auditing / progress tracking into tables, in such a way that the information on the attempt would persist even when the main transaction is rolled back due to an error for example. Remember – in a standard transaction everything is thrown away in case no special measures (savepoints or exception handling sub-blocks in stored procedures) are taken.
The hacky way
So how did the initial implementation that wowed me a bit, looked like?
CREATE FUNCTION public.log(appid text, msg text) RETURNS void LANGUAGE plpgsql SECURITY DEFINER AS $function$ BEGIN execute $$ copy (select $$ || quote_literal(msg) || $$) to program 'tee -a /tmp/$$ ||appid|| $$.log' $$; END; $function$;
Not bad – it works and is relatively short and simple, basically a oneliner. But there are some issues:
- it’s not really self-explanatory, relying on a bit cryptic COPY structure, meant originally for backups or data import / export
- it cannot be always used due to the COPY PROGRAM’s superuser requirement (or the “pg_execute_server_program” grant as of PG 11) that is not always available in cloud / managed environments
- you need physical access to the database server to read the log
In short it could be improved a bit.
Alternative #1 – dblink
First and probably most common alternative would be to use the old “dblink” funtionality. The feature has been there since ages but as there is basically now something better with Postgres Foreign Data Wrappers (FDW) for most use cases (where there’s a fixed set of remote tables), I’m kind of considering it a legacy technology…but it’s still a good fit for this “fire and forget” purpose. To learn more about “dblink” (with random / ad-hoc queries on remote Postgres databases being the main use case) please look at the documentation here.
-- set up the "logging sink" table CREATE TABLE log( msg text not null, sender_ts timestamptz not null, local_ts timestamptz default now() ); CREATE EXTENSION dblink; -- dblink is a "contrib" extension -- define the logger function. -- Note the use of clock_timestamp() -- transactions freeze the "now()" timestamp CREATE FUNCTION log_dblink(msg text) RETURNS void LANGUAGE sql AS $function$ select dblink('host=/var/run/postgresql port=5432 user=postgres dbname=postgres', format('insert into log select %L, %L', msg, clock_timestamp()::text)) $function$;
The advantages are that now superuser is not needed for usage of dblink (installation of extension still needs it) once a password is specified in the connect string, so basically anyone can use it and also no physical access is needed – data is stored in a normal table and available via SQL.
Alternative #2 – PL/Python + some Python driver
The second approach would be something more complex again but not so hacky I would say – stored procedures in PL/Python and opening an explicit external transaction! The code (assuming PL/Python packages and the “psycopg2” Python driver have been installed already) would then look something like that:
CREATE EXTENSION plpythonu; CREATE FUNCTION public.log_python(msg text) RETURNS void LANGUAGE plpythonu AS $function$ import psycopg2 from datetime import datetime conn = psycopg2.connect(host='/var/run/postgresql', port=5432) conn.autocommit = True cur = conn.cursor() cur.execute("INSERT INTO log select %s, %s", (msg, datetime.now())) $function$;
Although being semantically cleaner to read the downside here is that PL/Pythonu still requires more setup and full superuser access (that’s becoming increasingly rare). But on the positive side it’s surprisingly quite some milliseconds faster than the “dblink”! But a tip – to gain back on speed of “dblink”, one could actually use PgBouncer to reduce connection initialization time – which toll on a single simple operation should not be underestimated! According to my quick testing PgBouncer removed those extra milliseconds and things were on par.
To close it off – sadly as we saw no really perfect solutions for autonomous transactions, but “dblink” would be my preferred one. An idea though for a better solution – would be really cool if the Postgres FDW would actually have some option to define such autonomous “fire and forget” or “out of transaction” servers / tables that ignore the main transaction so that a simple INSERT would suffice. Let’s hope something like that will pop up one day.