CYBERTEC PostgreSQL Logo

Tips and tricks to kick-start Postgres year 2020

01.2020 / Category: , / Tags: |

Quite often when visiting customers or doing trainings, I see that people are quite happy seeing and learning about some little time-saving PostgreSQL tricks, commands or SQL snippets on the screen and they usually ask to slow down and explain or just say “wow, didn’t know you could do that”. As these tips and tricks are not too much material on their own, I thought I’d just make note of such titbits and write a post on that later - and now it seems  that this “later” arrived 🙂 So below there is a list without any particular ordering on some Postgres features and tricks that I’ve gathered over the last year and that might be news for more casual Postgres users. Hope you’ll find something useful and a happy new Postgres year!

Dynamic SQL loops with psql’s “gexec”

This could easily be the most useful one, especially for DBA-s doing some maintenance operations. Since a couple of years the PostgreSQL’s native query tool can do kind of “dynamic programming” - meaning you generate some SQL-s based on a queries output (i.e. the resultset is dynamic) and then you execute all of those generated SQL-s in sequence. Typical uses might be: “vacuuming” or gathering statistics for all the biggest tables that haven’t gotten any automatic “care” recently, getting row counts for all tables, generating thousands of tables for test purposes, rebuilding most bloated indexes, etc - in short stuff that previously required leaving the psql environment and writing a small script.

Note that gexec is a psql command, not SQL!

Session variables

Officially session level user variables are not really advertised in the documentation and not as convenient as in some other database systems (this might change in v13 with a proposed new “schema variables” feature) but it works, and they can be freely used in your applications when needed. How does it work? A small sample below:

Sleeping more conveniently

The sleep functions are not overly useful and rather for demonstration purposes (creating some locking situations and then trying to find out who blocked whom) but after using the pg_sleep() function for years, that takes input in seconds, to my horror one day I discovered that there are more convenient functions that accept even human readable input!

Setting per function planner constants

This is not something that everyone can use, as the heyday of stored procedures is surely over, but they’re nevertheless very useful and performance friendly if applied in correct situations. So one transparent performance tuning aspect of stored procedures is that you can tune all the PostgreSQL configuration parameters available for your procedure of interest. It runs the fastest, without changing any application code or affecting any other queries / procedures. And this is actually a huge benefit and a serious argument to use stored procedures in a tiered / compartmentalized environment (think banks, etc). Code for such changes looks like:

Duplicating databases for testing purposes

There are a couple of different ways to achieve that (with pg_dump most commonly used) but actually the fastest way to duplicate databases is to use the CREATE DATABASE syntax with the TEMPLATE modifier. Something like:

What does this command do then? Well sadly on operational systems it will hang first...as the grave downside of the method is that it expects no active sessions on the copied DB - thus it’s not suitable for 24/7 live systems but for test systems or where some off-hours interruption is OK. But after terminating all sessions, it will binary copy all the existing files from the specified template DB into a new DB, thus no parsing, exporting, importing, building indexes. FYI - for snapshotting file systems like ZFS, Btrfs, EBS it can even be used during operation as the copy doesn’t cost initially anything there. So a more real life sample would look something like that:

Defining cluster initialization defaults (Debian)

If we are already talking about test instances, or cases where for some reason you need a lot of instances on a single compute node i.e. multi-tenant on instance level or when running some test farms - there is a couple of quite unknown file locations that enable you to override the default configs and instance initialization parameters. Server configs should be (and mostly are) somehow versioned and managed, so I’ve used it rather to set initialization settings, namely enabling PostgreSQL side data checksums for new clusters. This cluster defaults file can be found at “/etc/postgresql-common/createcluster.conf” and I usually change only one line:

FYI - besides the default server config and initialization settings there are also some more cluster based files to customize the behaviour of “pg_ctl” (e.g. timeouts, restart mode) and service autostart behaviour, but the defaults there are quite sane.

PS this titbit only applies to Debian based systems (and PGDG repos i.e. the “official” packages from the Postgres project) and assumes that you use the according wrappers like “pg_createcluster”.

Sample config files

This is what around 98.375% of people find surprising - every PostgreSQL installation also has a bunch of sample configuration files on board, both for the server and client utilities! For example for those cases where you maybe accidentally erased some of them or you’re just curious what were the original defaults (although this can also be investigated via the pg_settings system view). For Debian based systems these files are located in “/usr/share/postgresql/$PGVER/” and for Redhat based ones in “/usr/pgsql-$PGVER/share/”.

A fun fact on config files - postgresql.conf is not even absolutely required as all settings have a default value compiled in! Only at least one access defining row in pg_hba.conf is required.

Selecting random sample rows quickly

Back to SQL land.

* The TABLESAMPLE SQL command

Although fixed in SQL standard for a while (SQL standard 2003) and implemented in PostgreSQL years ago, the TABLESAMPLE clause still seems to be quite unknown. What does it do? Well, as the name implies, it helps us to quickly select a random test subset of a table’s data. Quite handy actually when data browsing, to get a feel for the schema at hand. There are 2 variations of the command in PostgreSQL where the BERNOULLI method could be substantially slower, so if “good randomness” is not important better prefer SYSTEM.

* Good old random()

Another possibility that always works also for some very old Postgres versions and could be handy also for smallish tables and small sample percentages where TABLESAMPLE, being somewhat probabilistic, sometimes fails to select any rows at all. Warning - it will be slow for bigger tables and causes a sequential scan + expensive sort operation.

* Crazy CTID stuff

For ultra-big data warehouse type of tables where good randomness is required on a regular basis for some weird reason, one could out-smart the slow TABLESAMPLE BERNOULLI with the following hack:

If you grokked the above, you can feel good about your Postgres knowledge and have a small celebration break with soda pop or Club Mate 😀

NOTNULL

To finish off - another very simple character saving tip when doing NULL comparisons:

2 responses to “Tips and tricks to kick-start Postgres year 2020”

  1. Thank you!!! I've been looking for a way to do session variables in Postgres for a while since coming from SQL Server. Combining that with multiple Postgres result sets from Azure Data Studio (the best PG IDE I've found from a long list) and I'm having a good day.

  2. Regarding the first query: I like you utilised the format function to construct the dynamic statement instead of concatenation operations. It would be even a bit more elegantly when using %I as the format specifier as that saves you from quote_identing the parameters: format('SELECT count(*) FROM %I.%I', table_schema, table_name)
    For literals that would normally be passed through quote_nullable there is also a %L specifier.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram