Version 13 has been out there for over a month by now and the most important stuff has already been well digested… but luckily (or not) there’s so much non-major stuff in every release! This becomes quite evident when looking at the release notes with its 150 items of change or peeking at the great “Why Upgrade” website or doing direct Git archeology – 2233 changes in total between 12.4 and 13.0!

So in this post I’m highlighting some changes that stood out to me on the 2nd reading of the release notes. Note that I’m intentionally not touching the most prominent features like the new B-tree index format/deduplication, trusted extensions, and parallel VACUUM that have gotten a lot of attention already.

DBA side favourites

Generate backup manifests for base backups and add a tool to verify them

So now when you pull a new basebackup (to build a standby or just to store for a while for PITR purposes) using the pg_basebackup tool, by default unless explicitly overridden by the --no-manifest parameter, a listing file with all fetched items together with their checksums will be stored in the target directory. And a new accompanying tool named pg_verifybackup, when pointed to a base backup, can then use this listing file to verify file contents on the receiver side, removing the need, for example, to start the instance and dump out the contents to “/dev/null”.

From the details side – one can also tune the behavior a bit (various checksum algorithms, etc), but in practice, the most useful flags seem to be the --no-parse-wal and --wal-directory parameters. The first will skip checking the WAL at all which is useful if we pull snapshots for PITR purposes and don’t include WAL at all in our base backups (a standard thing to do in such case) and the WAL-s are actually pushed onto some S3/cloud sink and not just onto some NFS mount. For the latter case, there’s the second parameter – --wal-directory.

All in all my favorite non-performance feature with this v13 release, helping to avoid some potential disasters over the years to come.

Allow pg_rewind to use the target cluster’s restore_command to retrieve needed WAL

The feature is enabled using the -c/--restore-target-wal option and it basically executes the target instance’s (the $DATADIR being synced) restore_command to fetch the recent WALs needed to detect and wipe any “lone changes” to the instance. A great feature that makes pg_rewind a lot more usable – as previously you had to be typically pretty fast to arrive at the party as checkpoints recycle/throw away WALs pretty fast with default settings, meaning it was something more for HA automation tools like Patroni for example. Well, one could copy stuff manually, but…

And by the way, don’t forget though that you need to prepare for rewinding before you get into trouble in the first place – by enabling data checksums at init time (or now also possible later with pg_checksums, in offline mode though) or setting the wal_log_hints parameter.

Add an option to pg_rewind to configure standbys

Basically exactly the same -R/--write-recovery-conf parameter available for pg_basebackup for years, allowing to switch the rewinded node back to the HA cluster even more conveniently/faster. Makes perfect sense actually as this is the main purpose of using pg_rewind anyways. From the technical side – this is achieved by writing the specified --source-server connect string also into postgresql.auto.conf primary_conninfo parameter.

Allow reindexdb to operate in parallel

This is one of the “parallelity” changes and a pretty nice addition for those instances where you have a dedicated downtime/maintenance slot and want to speed up bloat elimination without more complex scripts or techniques, like using 3rd party extensions like pg_repack or pg_squeeze. Parallel mode is enabled with the new --jobs option. By the way, I normally always recommend to run this tool also not in default mode but limited to biggest/most active tables or whole schemas, using the --table and --schema flags.

Use the directory of the pg_upgrade program as the default --new-bindir setting when running pg_upgrade

This is a very welcome update for me personally as we need to quite often connect to customer systems over some horribly slow and small web console emulators and the pg_upgrade command is pretty much the only one that did not want to fit on a single line. I know, I know – there’s also an option to declare environment variables like PGDATANEW but explicit is still better than implicit and I prefer having important stuff directly under my nose. So now something like that will be enough to do the final step of the upgrade:

/usr/lib/postgresql/13/bin/pg_upgrade --link -d pg12 -D pg13 -b /usr/lib/postgresql/12/bin/

Allow dropdb to disconnect sessions using the target database, allowing the drop to succeed

This is enabled with the -f option. This feature got some attention already as basically it’s quite nice and convenient… but as it’s also a footgun I’ll probably avoid it for production operations. Also for production DBs I usually recommend not dropping the DB directly but just renaming it at first…

Allow pgbench to partition its “pgbench_accounts” table

This time I’m really glad that my favorite satellite program coming with Postgres, the quick benchmarking tool pgbench has received numerous updates. The most useful to me is the fact that now you can partition the main data table by providing just some flags – previously you had to do it in many steps, altering the schema manually and then specifying some --init-steps again. Now 2 out of 3 built-in partitioning schemes are supported and testing out performance benefits of partitions (gaining parallel autovacuum most importantly) is easy as:

pgbench --initialize --scale=X --partition-method=hash --partitions=4

And another benefit when you’ve again forgotten about the exact syntax of declaring partitions and you quickly want to look it up without switching to a browser:

PGOPTIONS="-c log_statement=all -c client_min_messages=LOG" pgbench -i --partition-method=hash --partitions=2  -I dt

Allow pgbench to show script contents using option --show-script

Another minor but nice time-saver – previously it was pretty tedious to check what kind of SQLs the builtin test modes actually generate. One had to enable the query log or go to the documentation or just run a single transaction with the -r flag like pgbench -t1 -r -N. But now it’s a bit more slick and explicit:

pgbench --show-script=simple-update
-- simple-update: <builtin: simple update>
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;

Allow pgbench to generate its initial data server-side, rather than client-side

So this change exploits the very useful generate_series() function (that everyone using Postgres should know and master!) to generate the data on the fly, so basically only 1 INSERT statement is used instead of the old COPY + data streamed over. On the look of it “server-side” sounded faster to me, but when doing a small test with a subnet co-located server I actually saw that it was about 2x slower! I assume it’s about INSERT having a larger overhead than COPY and in short seems it makes sense to use this new flag only when you really want to reduce the network traffic or the target server is really far away from you – then you would probably see some speedup.

$# pgbench -i -s1000 -q -I dtG -h testkast -p5432
dropping old tables...
creating tables...
generating data (server-side)...
done in 363.64 s (drop tables 0.02 s, create tables 0.02 s, server-side generate 363.60 s).

$# pgbench -i -s1000 -q -I dtg -h testkast -p5432
dropping old tables...
creating tables...
generating data (client-side)...
100000000 of 100000000 tuples (100%) done (elapsed 194.81 s, remaining 0.00 s)
done in 197.27 s (drop tables 0.21 s, create tables 0.01 s, client-side generate 197.05 s).

Allow WAL storage for replication slots to be limited by max_slot_wal_keep_size

Replication slots exceeding this value are marked invalid and the extra WALs above this threshold will be still recycled/deleted after a CHECKPOINT when they’re not anymore needed for data consistency reasons. A great little addition to make running flaky replicas safer – a golden mean of sorts between the rigid wal_keep_segments and the unforgiving replication slots.

Developer/performance favourites

To finish off – also a couple of automatic/transparent changes where users don’t need to lift a finger to start benefiting. Well, except for upgrading in the first place of course 🙂

Improve retrieval of the leading bytes of TOAST’ed values

This might sound very obscure at first to newer Postgres developers… as I bet they’re pretty sure that they’re not using any TOAST features 🙂 And indeed it’s some automatic behind-the-scenes “magic” to optimize storage for large column values that you don’t need to care about mostly.

But with this change some very typical workloads like substring matching/filtering for large string values which Postgres typically decides to move to TOAST and additionally compress, are now easily 10x (and even more) faster! That’s way above the typical incremental speedups that we usually get for most features when upgrading.

Implement incremental sorting

Another huge win that needs an example probably. So this addition is basically taking partial ordering of first index columns into account so that you can still benefit from an index even if the 2nd or 3rd ORDER BY column of your query is not included in the index – so in short an “index scan” vs “sequential scan” situation, which depending on the table size can yield wins of orders of magnitude as the table size grows. My testing with a small 1 million rows tables that fitted fully into “shared buffers” showed a 120x speedup for example!

CREATE UNLOGGED TABLE test(a int, b int);
INSERT INTO test SELECT i, i FROM generate_series(1, 1e6) i;
CREATE INDEX ON test (a);
EXPLAIN ANALYZE SELECT * from test where a < 1000 order by a, b;

Allow control over how much memory is used by logical decoding before it is spilled to disk

This is controlled by the new logical_decoding_work_mem parameter, with the default value of 64MB and a minimal value of 64kB. This is a very nice addition to those (relatively few) that use Logical Replication but are having performance issues… which happens sadly quite often when not running a pure OLTP-style workload as the feature is not really optimized yet for bulk data operations. Previously it was hard-coded to spill the reorder buffer to disk after only 4096 row changes for example. By the way, the next major version is set to improve the bulk operations performance furthermore with streaming of large in-progress transactions – so definitely waiting for that.

Danger Zone

Add the backend type to CSVLOG output

Left this one to the end so that you’d hopefully remember it more likely… as this one is potentially a breaking change (in addition to the list of things at the top of the release notes) if you use the CSVLOG log destination and use some more sophisticated log analyzing techniques other than “grep” to make sense of the information.

The new column added to the list of previous ones is called backend_type and is of data type text. But yes, seems that it could have been highlighted more prominently in the release notes as if it caught the most popular Postgres log parsing tool called PgBadger off guard, it will probably hit also some other users. Lots of people actually load their database logs into some tables to power some dashboards or alert because it’s really easy to do so actually! Check the documentation on the “file_fdw” for example if you want to see how.

Thanks for reading and see you hopefully in a year for similar reasons 🙂