Implementing Autonomous Transactions in Postgres

07.2019 | Category: How To | Tags: plpgsql, transactions

07.2019

Category: How To

BY Kaarel Moppel - Having recently witnessed quite an ingenious hack to implement some good old "println" style development debugging from stored procedures into a file, it prompted me to post knowledge to the Interwebs on two other ways how such a goal can be implemented more transparently. Also, with help of some other good old legacy technology in one case. By the way, the main reason for them going for the hack was that being relatively new to databases they didn't know how to name this thing that they were implementing, so Google wasn't able to help – once again proof that naming things is one of the hardest problems of computing 🙂

Table of Contents

What's an "autonomous transaction"?

But to start with let's explain what an "autonomous transaction" is, since most people who work with databases probably haven't needed / heard of them and actually they're not a standard or anything. They are not too well-supported by various database engines. The phrase itself comes from the Oracle world, I believe, and it basically denotes "fire and forget" (sub)-transactions that are not connected to the main transactions. And sadly, also Postgres does not have direct built-in support for them – you're always in a real transaction and need to invent a bit, if the need arises.

What would be the common use cases? Mostly some logging / auditing / progress tracking into tables, in such a way that the information on the attempt would persist even when the main transaction is rolled back - for example, due to an error. Remember - in a standard transaction, everything is thrown away in case no special measures (savepoints or exception handling sub-blocks in stored procedures) are taken.

The hacky way to write autonomous transactions

So what did the initial implementation that wowed me a bit look like?

CREATE FUNCTION public.log(appid  text, msg text)
 RETURNS void
 LANGUAGE plpgsql
 SECURITY DEFINER
AS $function$
BEGIN
execute $ copy (select $ || quote_literal(msg) || $) to program 'tee -a /tmp/$ ||appid|| $.log' $;
END;
$function$;

1

2

3

4

5

6

7

8

9

CREATE FUNCTION public.log(appid text, msg text)

RETURNS void

LANGUAGE plpgsql

SECURITY DEFINER

AS $function$

BEGIN

execute $ copy (select $ || quote_literal(msg) || $) to program 'tee -a /tmp/$ ||appid|| $.log' $;

END;

$function$;

Not bad - it works and is relatively short and simple, basically a one-liner. But there are some issues:

it's not really self-explanatory, relying on a bit cryptic COPY structure, meant originally for backups or data import / export
it cannot be always used due to the COPY PROGRAM's superuser requirement (or the "pg_execute_server_program" grant as of PG 11) that is not always available in cloud / managed environments
you need physical access to the database server to read the log

In short it could be improved a bit.

Alternative #1 - dblink for autonomous transactions

The first and probably most common alternative would be to use the old "dblink" functionality. The feature has been there since ages ago, but as there is basically now something better with Postgres Foreign Data Wrappers (FDW) for most use cases (where there's a fixed set of remote tables), I'm kind of considering it a legacy technology... however, it's still a good fit for this "fire and forget" purpose. To learn more about "dblink" (with random / ad-hoc queries on remote Postgres databases being the main use case) please look at the documentation here.

-- set up the 'logging sink' table
CREATE TABLE log(
  msg text not null,
  sender_ts timestamptz not null,
  local_ts timestamptz default now()
);
CREATE EXTENSION dblink; -- dblink is a 'contrib' extension
-- define the logger function. 
-- Note the use of clock_timestamp()
-- transactions freeze the 'now()' timestamp
CREATE FUNCTION log_dblink(msg text)
 RETURNS void
 LANGUAGE sql
AS $function$
   select dblink('host=/var/run/postgresql port=5432 user=postgres dbname=postgres',
	format('insert into log select %L, %L', msg, clock_timestamp()::text))
$function$;

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

-- set up the 'logging sink' table

CREATE TABLE log(

msg text not null,

sender_ts timestamptz not null,

local_ts timestamptz default now()

);

CREATE EXTENSION dblink; -- dblink is a 'contrib' extension

-- define the logger function.

-- Note the use of clock_timestamp()

-- transactions freeze the 'now()' timestamp

CREATE FUNCTION log_dblink(msg text)

RETURNS void

LANGUAGE sql

AS $function$

select dblink('host=/var/run/postgresql port=5432 user=postgres dbname=postgres',

format('insert into log select %L, %L', msg, clock_timestamp()::text))

$function$;

The advantages are that now no superuser is needed for the use of dblink (the installation of the extension still needs it) once a password is specified in the connect string, so basically anyone can use it. Also, no physical access is needed – data is stored in a normal table and available via SQL.

Alternative #2 - PL/Python + some Python driver for autonomous transactions

The second approach would be something more complex again, but not so hacky I would say: Stored procedures in PL/Python and opening an explicit external transaction! The code (assuming PL/Python packages and the "psycopg2" Python driver have been installed already) would then look something like that:

CREATE EXTENSION plpythonu;

CREATE FUNCTION public.log_python(msg text)
 RETURNS void
 LANGUAGE plpythonu
AS $function$
  import psycopg2
  from datetime import datetime
  conn = psycopg2.connect(host='/var/run/postgresql', port=5432)
  conn.autocommit = True
  cur = conn.cursor()
  cur.execute('INSERT INTO log select %s, %s', (msg, datetime.now()))
$function$;

1

2

3

4

5

6

7

8

9

10

11

12

13

CREATE EXTENSION plpythonu;

CREATE FUNCTION public.log_python(msg text)

RETURNS void

LANGUAGE plpythonu

AS $function$

import psycopg2

from datetime import datetime

conn = psycopg2.connect(host='/var/run/postgresql', port=5432)

conn.autocommit = True

cur = conn.cursor()

cur.execute('INSERT INTO log select %s, %s', (msg, datetime.now()))

$function$;

Although this is semantically cleaner to read, the downside here is that PL/Python still requires more setup and full superuser access (that's becoming increasingly rare). But on the positive side, it's surprisingly quite a few milliseconds faster than the "dblink"! Here's a tip - to gain back on speed of "dblink", one could actually use PgBouncer to reduce connection initialization time - the toll on a single simple operation should not be underestimated! According to my quick tests, PgBouncer removed those extra milliseconds and things were on par.

To close it off - sadly, as we saw, there are no really perfect solutions for autonomous transactions, but "dblink" would be my preferred one. An idea though for a better solution – it would be really cool if the Postgres FDW would actually have some option to define such autonomous "fire and forget" or "out of transaction" servers / tables that ignore the main transaction, so that a simple INSERT would suffice. Let's hope something like that will pop up one day.

3 responses to “Implementing Autonomous Transactions in Postgres”

xzilla says:

July 12, 2019 at 3:50 pm

As part of a large Oracle migration, we built pg_jobmon, a complete framework for just this kind of thing. Check out https://github.com/omniti-labs/pg_jobmon & feel free to reach out if you have questions / need help.

Reply
Luca Ferrari says:

July 15, 2019 at 2:39 pm

The odd implementation of the log function does something different than the other twos: it logs to a file, while your proposed solutions log to a table (even if a remote one).

In this context, I think also a foreign data wrapper could help .

Also it is a little awkward the dollar quoting and its nesting in the first example!

Reply
jobinau says:

July 15, 2019 at 3:22 pm

Good blog post. love to see different options and workaround using pgbouncer 🙂

Another blog post by Yaser mentioned about overhead of dblink as "The PostgreSQL dblink is session-specific. Any dblink opened in one session cannot be used in a different session. This means that each session will have to open a new DB connection and this increases
response time." Reference : https://aws.amazon.com/blogs/database/migrating-oracle-autonomous-transactions-to-postgresql/

Reply

Implementing Autonomous Transactions in Postgres

What's an "autonomous transaction"?

The hacky way to write autonomous transactions

Alternative #1 - dblink for autonomous transactions

Alternative #2 - PL/Python + some Python driver for autonomous transactions

3 responses to “Implementing Autonomous Transactions in Postgres”

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts