BY Kaarel Moppel – Having recently witnessed quite an ingenious hack to implement some good old “println” style development debugging from stored procedures into a file, it prompted me to post knowledge to the Interwebs on two other ways how such a goal can be implemented more transparently. Also, with help of some other good old legacy technology in one case. By the way, the main reason for them going for the hack was that being relatively new to databases they didn’t know how to name this thing that they were implementing, so Google wasn’t able to help – once again proof that naming things is one of the hardest problems of computing 🙂

What’s an “autonomous transaction”?

But to start with let’s explain what an “autonomous transaction” is, since most people who work with databases probably haven’t needed / heard of them and actually they’re not a standard or anything. They are not too well-supported by various database engines. The phrase itself comes from the Oracle world, I believe, and it basically denotes “fire and forget” (sub)-transactions that are not connected to the main transactions. And sadly, also Postgres does not have direct built-in support for them – you’re always in a real transaction and need to invent a bit, if the need arises.

What would be the common use cases? Mostly some logging / auditing / progress tracking into tables, in such a way that the information on the attempt would persist even when the main transaction is rolled back – for example, due to an error. Remember – in a standard transaction, everything is thrown away in case no special measures (savepoints or exception handling sub-blocks in stored procedures) are taken.

The hacky way to write autonomous transactions

So what did the initial implementation that wowed me a bit look like?

CREATE FUNCTION public.log(appid  text, msg text)
 RETURNS void
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN
execute $$ copy (select $$ || quote_literal(msg) || $$) to program 'tee -a /tmp/$$ ||appid|| $$.log' $$;
END;
$function$;

Not bad – it works and is relatively short and simple, basically a one-liner. But there are some issues:

  • it’s not really self-explanatory, relying on a bit cryptic COPY structure, meant originally for backups or data import / export
  • it cannot be always used due to the COPY PROGRAM’s superuser requirement (or the “pg_execute_server_program” grant as of PG 11) that is not always available in cloud / managed environments
  • you need physical access to the database server to read the log

In short it could be improved a bit.

Alternative #1 – dblink for autonomous transactions

The first and probably most common alternative would be to use the old “dblink” functionality. The feature has been there since ages ago, but as there is basically now something better with Postgres Foreign Data Wrappers (FDW) for most use cases (where there’s a fixed set of remote tables), I’m kind of considering it a legacy technology… however, it’s still a good fit for this “fire and forget” purpose. To learn more about “dblink” (with random / ad-hoc queries on remote Postgres databases being the main use case) please look at the documentation here.

-- set up the "logging sink" table
CREATE TABLE log(
  msg text not null,
  sender_ts timestamptz not null,
  local_ts timestamptz default now()
);
CREATE EXTENSION dblink; -- dblink is a "contrib" extension
-- define the logger function. 
-- Note the use of clock_timestamp()
-- transactions freeze the "now()" timestamp
CREATE FUNCTION log_dblink(msg text)
 RETURNS void
 LANGUAGE sql
AS $function$
   select dblink('host=/var/run/postgresql port=5432 user=postgres dbname=postgres',
	format('insert into log select %L, %L', msg, clock_timestamp()::text))
$function$;

The advantages are that now no superuser is needed for the use of dblink (the installation of the extension still needs it) once a password is specified in the connect string, so basically anyone can use it. Also, no physical access is needed – data is stored in a normal table and available via SQL.

Alternative #2 – PL/Python + some Python driver for autonomous transactions

The second approach would be something more complex again, but not so hacky I would say: Stored procedures in PL/Python and opening an explicit external transaction! The code (assuming PL/Python packages and the “psycopg2” Python driver have been installed already) would then look something like that:

CREATE EXTENSION plpythonu;

CREATE FUNCTION public.log_python(msg text)
 RETURNS void
 LANGUAGE plpythonu
AS $function$
  import psycopg2
  from datetime import datetime
  conn = psycopg2.connect(host='/var/run/postgresql', port=5432)
  conn.autocommit = True
  cur = conn.cursor()
  cur.execute("INSERT INTO log select %s, %s", (msg, datetime.now()))
$function$;

Although this is semantically cleaner to read, the downside here is that PL/Python still requires more setup and full superuser access (that’s becoming increasingly rare). But on the positive side, it’s surprisingly quite a few milliseconds faster than the “dblink”! Here’s a tip – to gain back on speed of “dblink”, one could actually use PgBouncer to reduce connection initialization time – the toll on a single simple operation should not be underestimated! According to my quick tests, PgBouncer removed those extra milliseconds and things were on par.

To close it off – sadly, as we saw, there are no really perfect solutions for autonomous transactions, but “dblink” would be my preferred one. An idea though for a better solution – it would be really cool if the Postgres FDW would actually have some option to define such autonomous “fire and forget” or “out of transaction” servers / tables that ignore the main transaction, so that a simple INSERT would suffice. Let’s hope something like that will pop up one day.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.