CYBERTEC Logo

Simulating temporal tables with dblink and replication delay

On some rare occasions, I’ve been asked if Postgres supports system-versioned temporal tables - in other words, time travel on table data for your queries. As quite often with Postgres, thanks to its extensibility, I can answer: well, officially, it does not...but, you can make it work, nevertheless 🙂

Why are people even asking such a crazy question? Well, the idea has something to it. For example, when something fishy suddenly happens with your data, i.e. an UPDATE returns “1000 rows affected”, but you were expecting 10 rows to be affected - then it would be nice to quickly “scroll back the time” and see how the table actually looked 5 minutes ago, to maybe then reinstate the lost data quickly if it indeed was an error. Turning to real cold backups in such a case could mean hours spent - too slow!

Temporal tables options

The first possibility for achieving such a requirement on any database system is with meticulous schema design, and basically not deleting data at all - i.e. “insert only” data modelling. In some cases, it can be pretty convenient, so if the concept is new to you I’d recommend checking out the effects in the PostgreSQL context, from earlier posts here and here.

At the system-level implementation, Oracle has had something like that for years, in the form of the “AS OF” syntax. Also, MS SQL has supported something similar, since its v2016 release. I must say that I’ve never used that particular Oracle feature myself, but I’ve heard that it has some limitations in terms of huge databases with lots of changes happening - so probably something disk / WAL storage related. Which I can well believe - database engines almost always need to pick and optimize for some important attributes - can’t have them all.

In short, it’s true that PostgreSQL has no such native feature out of the box, but the idea is definitely not a new one - as can be seen here. As always with Postgres, there are some related extensions or plain schema design scripts available, which I’ve seen quite a few of. The most promising of them might be this one, but I’ve yet to try it out.

My workaround solution for today is a mishmash of two more “rareish” PostgreSQL features: dblink + recovery apply delay.

Time delayed replication

This cool Postgres feature largely lives in the shadows, but it basically allows you to have an extra regular streaming replication node, constantly time-delayed! So you can tell it to not apply the changes coming from the primary node immediately, which is the default. Instead, it will implement them after some time, like 1h.

The parameter to set the “apply delay” is called recovery_min_apply_delay (official documentation here) and can be conveniently set in human readable units. Note the “min” or minimum part - meaning, if you have some longer queries on the node, or the hardware is weak, the actual delay could be longer.

Bear in mind that for huge and very busy databases, in practice this feature can get “costly” in the sense of disk space. - as the WAL pulled in must be stored for the selected time period (and usually even a bit more) until a “restartpoint” is performed. It’s better to start conservatively with lags measured in hours, not days. If this really becomes a show-stopper for longer lag periods, there are alternatives: one can switch to the old “archive recovery” replication method (which was the only binary replication possibility before v9.0) and only pull in WAL when needed, or even maybe regularly re-syncing with a PITR backup system like pgBackRest which performs very snappily in “delta” mode.

Postgres-to-Postgres communication over “dblink”

This feature, or an extension actually, is probably more well-known than the previous feature - since something equivalent is provided pretty much by any other DBMS system. It basically allows you to access arbitrary remote Postgres nodes on-the-fly, in a fully dynamic way - i.e. we don’t need to set up anything on the target system or the executing system! The queries can be arbitrarily complex - given that we know what data types are being returned; the syntax requires that we declare aliases. So a “Hello world” for “dblink” in simplest form would look something like this:

By the way, if you’re planning to make more heavy use of this functionality, you can also use some more elaborate access patterns with connection caching or privilege escalation - see the documentation for that.

1 + 1 = 10

So what do we get when we cross-pollinate the two features previously explained? Yep, you guessed it - a dose of extra coolness in the form of “temporal tables”!

There are some small annoyances to be dealt with. As explained in the “dblink” section - we need to explicitly know our returned data types, and the SQL code is not going to look pretty, intermingled with some connect string info. Also, since it’s not really “native”, performance could potentially be sub-optimal, and would need verification in each individual use case.

Additionally, we cannot just select a new “timeline” within a query as we please, but rather we need to meddle with the replica configuration. We can only conveniently scroll the time forward - say from 2h lag to 1h lag. Going to a 3h lag would require a rebuild or PITR restore. But we can have many replicas running simultaneously with different lags, with a little extra hardware cost.

In the end, it works, so let’s cut to the code and set up a minimal useful example with 2 nodes, using the latest PostgreSQL version - v13.2.

Test setup

For test hardware, I used 2 DigitalOcean nodes with 4 vCPU, 8GB of RAM and 160GB of SSD disks in the same AMS3 AZ. The IP address in the sample code below means 10.110.0.4 is the primary and 10.110.0.5 is for the time-delayed replica node. As the test dataset, I used the good old default pgbench schema, with a scaling factor of 100, resulting in 1.3GB of data, so that everything would be cached and we’re not affected by disk slowness, since we don’t want to measure disk access here.

The only changes I made to the server configuration on both nodes were the following: “work_mem=512MB” and “shared_buffers=2GB” + “trust” access in pg_hba.conf between the two hosts’ private IPs.

Note that I’m skipping the build-up of initial streaming replication, as it would be 100% a standard setup. We’ll skip directly to more interesting stuff.

Let’s see if “dblink” works by trying to access the secondary node from the primary one, with the former being a completely normal up-to-date streaming replication HA replica which is ready for a theoretical failover.

Yay, it works! So now we can go and enable the “apply lag” on the replica side, and verify that the feature indeed functions as it should.

Applying the “apply lag”

On node 2, we can enable the “apply lag” dynamically, but changing the configuration file + restart also works.

Now we’re ready to generate some test transactions on the primary, and verify that the time-delayed replica does not yet reflect the change:

But on the replica:

We are still on zero, since we see the “lag” feature working!

Use case - discovering potentially harmful data changes

Let’s assume that we want to use a “pseudo temporal tables“ to see which rows have changed since 1 hour ago, which might be a common use case. We basically need to then calculate a diff - combining data both from the primary and the time-delayed replica. For the number of changed rows, let’s just take about 1% of total rows, i.e. 100k rows in our case. So on the primary, let’s execute the following:

And now, to the diffing part. The query itself in “meta-SQL” could, in its simplest form, look something like this:

Note that this is one of those rare cases where we definitely want to use a FULL OUTER JOIN as we also want to know about new / disappeared rows. If we were only interested in new and changed rows, we could just use a LEFT JOIN, and the query would probably be quite a bit faster.

How does it perform?

Now to the more important question - how does “dblink” perform with our test query? Can we actually use it in the real world to compare a significant amount of rows? Let’s find out...

Around 20 seconds - not bad, I believe! Given 10 million rows scanned on both sides, plus transmitting that 10 million over the wire once.

NB! Note that I actually executed the query a couple of times to get everything cached, since we usually want to remove disk access from the equation. I actually didn’t want to get the changed rows, so I used EXPLAIN ANALYZE with additional costs, and particularly with timings disabled, to minimize the “observer effect”.

A bit of relativity theory

Along the lines of query performance...I started to wonder: was the number we got actually any good? How would it perform if the data was totally local and persistent, i.e. like a normal table? So on the primary side, I pulled in the whole lagged dataset from the standby into a separate table!

After a couple of runs, we arrived at around 11s. As expected - considerably faster locally 🙂 But also not really tragic I would say, at least not on an order of magnitude. I could live with that, given that something like this is a rare occasion. Well, at least, it should be 🙂

Summary

As we saw, this somewhat obscure approach of “dblink” + “recovery delay” mixed together can definitely deliver good results within a reasonable amount of time for our simple test case - so the approach might be worth making a mental note of, if you haven’t seen it before. The performance side could get problematic for really large data amounts - but hey, we’re competing with much more expensive and complex database systems, and we still get the job done, without paying a dime!

Also, the test buildup could surely be improved - consider that we were changing quite a lot of data - 1% of some huge transactional table usually does not get changed within an hour. In a real-world scenario, we could potentially take advantage of some indexes, whereas here we only had full scans.

In the future, the coolest would be some level of support for something like that on the “postgres_fdw” side - I even actually tried it, but currently it’s not an intended use case, so recursion went on behind the scenes, with lots of connections being opened until some ‘out of connection’ errors popped up.

Thanks for reading!

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram