In my last post I described what to expect, performance-wise, from simple PL/pgSQL triggers that inspect or change incoming row data. The conclusion for the most common "audit fields" type of use case was that we should not worry about it too much and just create those triggers. But in which use cases would it make sense to start worrying a bit?

So to get more insights I conjured up some more complex trigger use cases and again measured transaction latencies on them for an extended period of time. Do read on for some extra info on the performed tests, or just jump to the concluding results table at the end of the article.

Default pgbench vs audit triggers for all updated tables

This was the initial test I ran for the original blog post – default pgbench transactions, with the schema slightly modified to include 2 auditing columns for all tables being updated, doing 3 updates, 1 select, 1 insert (see here for how the default transaction looks) vs PL/pgSQL audit triggers on all 3 tables getting updates. The triggers just set the last modification timestamp to the current time and the username to the current user, if not already specified in the incoming row.
Results: 1.173ms vs 1.178ms i.e. <1% penalty for the version with triggers.

Single row update use case

With multi-statement transactions a lot of time is actually spent on communication over the network. To get rid of that, the next test consisted of just a single update on the pgbench_accounts table (again with the 2 audit columns added to the schema), and then again the same with a PL/pgSQL auditing trigger enabled that sets the modification timestamp and username if left empty.
Results: 0.390ms vs 0.405ms ~ 4% penalty for the trigger version. Already somewhat visible, but still quite dismissible I believe.

/* script file used for pgbench */
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

Single row update with a trigger written in C

But what if the above 4% performance degradation is not acceptable, and it adds up because we are actually touching a dozen tables (ca 60% hit)? Can we somehow shave off some microseconds?
Well, one could try to write the triggers in Postgres' native language, "C"! As with optimizing normal functions, it should help with triggers too. But ugh, "C" you think…sounds daunting? Well…sure, it's not going to be all fun and play, but there are actually quite a lot of examples included in the Postgres source code to get going, see here for example.

So after some tinkering around (I'm more of a Python / Go guy) I arrived at these numbers: 0.405ms for the PL/pgSQL trigger vs 0.401ms for the "C" version, meaning only a ~1% speedup! So in short – absolutely not worth the time for such simple trigger functionality. But why so little speedup against an interpreted PL language, you might wonder? Yes, PL/pgSQL is kind of an interpreted language, but it has the nice property that execution plans and the resulting prepared statements stay cached within one session. So if we used pgbench in "re-connect" mode, I'm pretty sure we'd see some very different numbers.
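For reference, such a re-connect test could be launched roughly like below – pgbench's -C/--connect flag opens a new connection for every transaction, so the plan caching advantage is gone (assuming the single-row update script shown above is saved as update_only.sql; the file name is just for illustration):

pgbench -C -f update_only.sql -T 7200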

...
	// audit field #1 - last_modified_on: find the column's position in the tuple
	attnum = SPI_fnumber(tupdesc, "last_modified_on");

	if (attnum <= 0)
		ereport(ERROR,
				(errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
				 errmsg("relation \"%u\" has no attribute \"%s\"", rel->rd_id, "last_modified_on")));

	// only fill the audit column when the incoming row left it NULL
	valbuf = (char *) SPI_getvalue(rettuple, tupdesc, attnum);
	if (valbuf == NULL) {
		// wrap the timestamp into a Datum for heap_modify_tuple_by_cols (newnull stays false)
		newval = TimestampTzGetDatum(GetCurrentTimestamp());
		rettuple = heap_modify_tuple_by_cols(rettuple, tupdesc,
											 1, &attnum, &newval, &newnull);
	}
...

See here for my full “C” code.

Single row update with a trigger doing “logging insert”

Here things actually get a bit incomparable, as we're adding some new data that is not there in the "un-triggered" version. Basically I was doing from the trigger the same thing as the insert portion (into pgbench_history) of the default pgbench transaction. Important to note though – although we're seeing some slowdown…it's most probably still faster than doing that insert from the user transaction, as we spare a couple of network bytes plus the parsing (in our default pgbench case statements are always re-parsed from text, vs PL/pgSQL code which is parsed only once – think "prepared statements"). By the way, to test how pgbench works with prepared statements (used mostly to test max IO throughput), set the "protocol" parameter to "prepared".
Results – 0.390ms vs 0.436ms ~ 12%. Not too bad at all given we double the amount of data!
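For illustration, such a "logging insert" trigger would conceptually look something like the minimal sketch below (my actual test code is in the linked script; the teller id is not present in the accounts row, so a dummy 0 is logged there):

CREATE FUNCTION trg_log_account_change() RETURNS TRIGGER AS
$$
BEGIN
	-- mimic the pgbench_history insert of the default pgbench transaction
	INSERT INTO pgbench_history (tid, bid, aid, delta, mtime)
		VALUES (0, NEW.bid, NEW.aid, NEW.abalance - OLD.abalance, current_timestamp);
	RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER accounts_log
	AFTER UPDATE ON pgbench_accounts
	FOR EACH ROW EXECUTE PROCEDURE trg_log_account_change();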

Default pgbench vs 3 “logging insert” triggers

Here we basically double the amount of data written – all updated tables get a logging entry (including pgbench_accounts, which already gets an insert as part of the normal transaction). Results – 1.173 vs 1.285 ~ 10%. Very tolerable penalties again – we're almost doubling the dataset here and only paying a fraction of the price! This again shows that the communication latency and transaction mechanics, together with the costly but essential fsync during commit, have more influence than a bit of extra data itself (given we don't have tons of indexes on the data, of course). For reference – the full test script can be found here if you want to try it out yourself.

Summary table

Use Case | Latencies (ms) | Penalty per TX (%)
Pgbench default vs with audit triggers for all 3 updated tables | 1.173 vs 1.178 | 0.4%
Single table update (pgbench_accounts) vs with 1 audit trigger | 0.390 vs 0.405 | 3.9%
Single table update (pgbench_accounts) vs with 1 audit trigger written in “C” | 0.390 vs 0.401 | 2.8%
Single table update vs with 1 “insert logging” trigger | 0.390 vs 0.436 | 11.8%
Pgbench default vs with 3 “insert logging” triggers on updated tables | 1.173 vs 1.285 | 9.6%

Bonus track – trigger trivia!

* Did you know that in Postgres one can also write DDL (event) triggers, so that you can capture/reject/log structural changes for all kinds of database objects? The most prominent use case might be checking for full table rewrites during business hours (a small sketch of that follows after this list).
* There are also “statement level” triggers that are executed only once per SQL statement. They're actually the default if you don't specify the level. In Postgres 10 they were also extended with the “transition tables” feature, allowing you to inspect all rows changed by the statement, e.g. to do some summary aggregations or validation.
* When you have many triggers on a table, they execute in alphabetical order of trigger name! Additionally, in the case of BEFORE and INSTEAD OF triggers, the possibly-modified row returned by each trigger becomes the input to the next one.
* Row level BEFORE triggers are much “cheaper” than AFTER triggers when updating a lot of rows, as they fire immediately, whereas AFTER triggers fire at the end of the statement, so Postgres needs to temporarily store the row state information. The situation can usually be alleviated with some sane WHEN conditions in the trigger declarations though.
* And yes, it's possible to create for example an insert trigger that inserts again into the same table that caused the trigger to fire:) Won't there then be an infinite loop, eating up all your disk space? Yes, there would…if max_stack_depth didn't kick in 😉 But of course I'd advise you to always keep triggers as simple as possible.
* For writing triggers you're not actually tied to the most popular trigger language, PL/pgSQL, or the abovementioned “C” – at least PL/Python and PL/Perl also support triggers, and there might be some more.
* Postgres 11 will include support for triggers on partitioned tables, allowing you to declare them only once! Currently one has to define them separately for all sub-partitions.
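As promised, a minimal sketch of the “no full table rewrites during business hours” idea from the first bullet – the function/trigger names and the hour range are of course just made up for illustration:

CREATE OR REPLACE FUNCTION no_rewrites_during_business_hours() RETURNS event_trigger AS
$$
BEGIN
	IF extract(hour FROM now()) BETWEEN 8 AND 17 THEN
		RAISE EXCEPTION 'table % would be rewritten - please retry outside business hours',
			pg_event_trigger_table_rewrite_oid()::regclass;
	END IF;
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER guard_table_rewrites ON table_rewrite
	EXECUTE PROCEDURE no_rewrites_during_business_hours();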

First of course the big question – should we be using good old triggers at all? Well, I'm actually not going to recommend anything here, as it's an opinionated topic:) People well-versed in databases will probably see good use cases for them, whereas modern application developers would mostly say it's an awful practice – doing some “magic stuff”™ kind of secretly. Avoiding that holy war, let's say that we simply need to use triggers – boss's orders. Then comes the question – should we be afraid of using triggers in Postgres due to possible performance penalties? Should we plan to beef up the hardware or do some app optimizations beforehand? From my personal experience – no, mostly nothing will change in the big picture if triggers are “used moderately”. But let's try to generate some numbers, as that's where the truth lives…

A pgbench test scenario with triggers

All applications are generally unique, so the most critical part of a usable performance test boils down to setting up a more or less plausible use case. As Postgres comes bundled with the quick benchmarking tool pgbench, I usually tend to take its schema as a baseline and do some modifications based on the type of application the customer has. In the case of deploying triggers, the most usual use case is probably “auditing” – making sure on the database level that we store some data on the author/reason of changes for all rows. So to simulate such basic auditing I decided to just add two audit columns to all pgbench tables receiving updates in the default transaction mode (3 of them). So let's create last_modified_on (as timestamptz) and last_modified_by (as text) and then fill them with triggers – basically something like this:

...
ALTER TABLE pgbench_(accounts|branches|tellers)
	ADD COLUMN last_modified_on timestamptz,
	ADD COLUMN last_modified_by text;
...
CREATE FUNCTION trg_last_modified_audit() RETURNS TRIGGER AS
$$
BEGIN
	IF NEW.last_modified_by IS NULL THEN
		NEW.last_modified_by = session_user;
	END IF;
	IF NEW.last_modified_on IS NULL THEN
		NEW.last_modified_on = current_timestamp;
	END IF;
	RETURN NEW;
END;
$$ LANGUAGE plpgsql;
...
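The trigger declarations themselves are elided above – they would look roughly like this, one per table:

CREATE TRIGGER accounts_audit
	BEFORE INSERT OR UPDATE ON pgbench_accounts
	FOR EACH ROW EXECUTE PROCEDURE trg_last_modified_audit();
-- ... and the same for pgbench_branches and pgbench_tellers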

Hardware / Software

Next I booted up a moderately specced (2 CPU, 4 GB RAM, 48 GB SSD) test machine on Linode (Ubuntu 18.04) and installed the latest Postgres (v10.4) binaries from the official Postgres project-managed repository, leaving all postgresql.conf settings at their defaults except shared_buffers (which I set to 3GB). For the pgbench scaling factor I chose 100, giving us a ~1.3GB database (see here for more on how to choose scale numbers), so that everything is basically cached and we can factor out most IO jitter – checkpoints, the background writer and autovacuum are still somewhat random of course, but they're typically there on real-life systems too, so I'm not sure removing them would be a good idea.

NB! For carrying out the actual testing I then compiled the latest (11devel) pgbench to make use of repeatable test cases (the new --random-seed parameter!), initialized the schema and ran the simplest possible pgbench test for the triggered/untriggered cases for 2h, with 3 loops each. Basically something like seen below (for the full script see here).

pgbench -i -s 100
pgbench -T 7200 --random-seed=2018
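Put together, the test loop conceptually looked something like this (the output file name is just a placeholder, the real script is linked above):

for i in 1 2 3 ; do
	pgbench -T 7200 --random-seed=2018 2>&1 | grep latency >> latencies_no_triggers.txt
done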

Summary on simple row modification trigger performance

So the first test I did compared pgbench transactions “as is”, with just the 2 auditing columns added to the 3 tables getting updates, vs the “triggered” case where each of those tables also had a trigger installed setting the auditing timestamp/username. The average transaction latency results came back more or less as expected: 1.173 ms vs 1.178 ms, i.e. a 0.4% difference – meaning basically no difference in average transaction latency at all for transactions where 3 simple data checking/filling triggers are executed in the background!

To reiterate: if you have a typical OLTP transaction touching a couple of tables, PL/pgSQL triggers containing just simple business logic can be used without further performance considerations!

Hmm…but how many simple triggers would you then need to see a noticeable runtime difference in our use case? Probably at least a dozen! In a typical short-lived OLTP transaction context we're still mostly IO bound (largely disk fsync speed) and, especially for multi-statement transactions, also network (round-trip) bound…so worrying about some extra CPU cycles spent in triggers can be spared.

So what else could be tried to get an idea of the penalties resulting from triggers? First we could make the transactions thinner by getting rid of the network round-trip latency – a single UPDATE on pgbench_accounts would be good for that in our case. Then we could also let triggers insert some data into some other tables…but that's enough content for another blog post I believe. See you soonish!

With the transaction performance topic being as relevant as ever in spite of constant hardware improvements, we're quite often asked how to improve in that area. But the truth is that once the most important PostgreSQL configuration parameters are more or less tuned, it is usually really hard to magically squeeze that extra something out of a setup without also modifying the schema.

What I've noticed though is that a lot of people with performance issues still seem to be using “old” spinning disks. There are of course many different reasons for that (maybe some long-term server lease contracts, or losing support when changing hardware configs), but for cases where this is done because there's just way too much data and faster storage would get very expensive, there might be some remedy. Most people don't seem to realize that for OLTP systems there's a huge win already in moving just the indexes of the busiest tables to SSDs or some other low-latency media. So here's a quick overview of how to do that, together with some indicative numbers.

Moving indexes to fast storage

The process is quite simple.

1. Install/connect/mount the media. This is probably the hardest part.

2. Create a Postgres tablespace (superuser needed). It then also makes sense to adjust the random_page_cost parameter.

CREATE TABLESPACE fast_storage LOCATION '/some/mount'
  WITH (random_page_cost=1.25);

3. Move the most impactful (or all) existing indexes to that tablespace. NB! A plain ALTER INDEX … SET TABLESPACE fully locks the index for the duration of the move. Since 9.4 it's also possible to move all indexes at once with “ALTER INDEX ALL IN TABLESPACE”, but this would basically mean downtime, as everything is locked first and then the moving starts. One can also do it in a more controlled/manual way via “CREATE INDEX CONCURRENTLY … TABLESPACE …” + RENAME INDEX + DROP INDEX (a sketch of this follows after the code below), or maybe use the pg_squeeze/pg_repack extensions, which can basically do the same.

-- single index
ALTER INDEX pgbench_accounts_aid_idx SET TABLESPACE fast_storage;

-- or all indexes on a table with a small DO block (which could be improved to also account for schemas)
DO $$
DECLARE
  r record;
BEGIN
  FOR r IN SELECT ci.relname FROM pg_class c
                  JOIN pg_index i ON i.indrelid = c.oid
                  JOIN pg_class ci ON i.indexrelid = ci.oid
                  WHERE c.relname = 'pgbench_accounts'
  LOOP
    EXECUTE 'ALTER INDEX ' || r.relname || ' SET TABLESPACE fast_storage';
  END LOOP;
END
$$;

-- or by leveraging psql's newish \gexec (less locking, as each ALTER runs in its own transaction)
SELECT 'ALTER INDEX ' || ci.relname || ' SET TABLESPACE fast_storage' FROM pg_class c
                  JOIN pg_index i ON i.indrelid = c.oid
                  JOIN pg_class ci ON i.indexrelid = ci.oid
                  WHERE c.relname = 'pgbench_accounts' \gexec
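And the “CREATE INDEX CONCURRENTLY” route mentioned above, sketched for a single index (same index name as in the example above; note that this won't work as-is for constraint-backed indexes like primary keys):

CREATE INDEX CONCURRENTLY pgbench_accounts_aid_idx_new
    ON pgbench_accounts (aid) TABLESPACE fast_storage;
DROP INDEX CONCURRENTLY pgbench_accounts_aid_idx;
ALTER INDEX pgbench_accounts_aid_idx_new RENAME TO pgbench_accounts_aid_idx;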

4. Optionally it might also be a good idea to set this new tablespace as the default, if the schema is somewhat static.

ALTER SYSTEM SET default_tablespace TO fast_storage;
select pg_reload_conf();
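To verify afterwards which indexes ended up where, one can for example peek into the pg_indexes view (an empty tablespace column means the database default):

SELECT schemaname, indexname, tablespace
  FROM pg_indexes
 WHERE tablename = 'pgbench_accounts';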

Trying to compare HDD vs SSD

To “visualize” the possible performance benefits (there are some for sure) I performed a small and somewhat simplistic test comparing HDD and SSD transaction latencies with a lot of random IO. To really hit the disks I chose a very small amount of RAM (~5% of the dataset fits into shared_buffers/kernel cache), but increased max_wal_size a lot so that we wouldn't stall during the test, giving more predictable latencies. To generate random IO easily I decided to just create 9 extra indexes on the pgbench schema – and having 10 to 20 indexes on a central OLTP table is actually quite common. Also, to illustrate only the HDD vs SSD difference on multi-index update latencies, I removed other activities: WAL logging (by using unlogged tables), the background writer (disabled), and I changed the pgbench default transaction so that only the UPDATE on the single pgbench_accounts table would be executed.

HW info: Google Compute Engine (europe-west-1), 1 vCPU, 1 GB RAM, 100GB Standard persistent disk / SSD persistent disk
Postgres info: PG 10.3, max_wal_size=50GB, checkpoint_timeout=1d, shared_buffers=256MB, bgwriter_lru_maxpages=0
Test script:

SCALE=684 # estimated ~10Gi DB, see jsfiddle.net/kmoppel/6zrfwbas/ for the formula
TEST_DURATION_SEC=900
CLIENTS=1
NR_OF_AID_INDEXES=8
MAX_TPS=10

pgbench --unlogged-tables -i -s $SCALE

for i in $(seq 1 $NR_OF_AID_INDEXES) ; do psql -qXc "create index on pgbench_accounts (aid)" ; done
psql -qXc "create index on pgbench_accounts (abalance)"	# an extra index to effectively disables HOT-updates

psql -qXc "checkpoint"

cat << EOF > bench_upd_only.sql
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
EOF

pgbench -f bench_upd_only.sql -R $MAX_TPS -T $TEST_DURATION_SEC -P $TEST_DURATION_SEC -c $CLIENTS

The results

And the results were actually quite surprising – an almost 38x difference in average update latency on 10 indexes! I somehow thought it would be a bit less, maybe 5-10x…

Disk Type | Latency avg. | Latency stddev
HDD | 141 ms | 158 ms
SSD | 3.7 ms | 2.5 ms

NB! The test doesn't make any claims at absolute truths – I used separate Google Cloud machines (same non-disk specs though) which could have had different utilization levels, but to counteract that I limited the transaction rate to a very low 10 TPS, making it a latency test rather than a total throughput test, so in the end it should at least give some idea of the possible performance gains. We can also see that HDD latencies (at least on shared cloud environments) jump around quite a lot on random updates, with “Latency stddev” being bigger than “Latency avg”.

Pgbench is a very well known and handy built-in tool that Postgres DBAs can use for quick performance benchmarking. Main functionality/flow is super simple, but it also has some great optional features like running custom scripts and specifying different probabilities for them. One can also use bash commands to fill query variables for example.

But the thing that has always annoyed me a bit is the fact that one cannot specify the desired database or table size, but has to think in so-called “scaling factor” numbers. And we know from the documentation that a scaling factor of “1” means 100k rows in the main data table “pgbench_accounts”. But how does a scaling factor of 100 (i.e. 10 million bank accounts) translate to disk size? Which factor do I need when I want to quickly generate a database of, say, 50 GB to see how random updates perform when the dataset does not fit into RAM/shared buffers? Currently there's a bit of trial and error involved 🙁 So how to get rid of the guesswork and be a bit more deterministic? Well, I guess one needs a formula that translates a target DB size into a scaling factor!

Test data generation

One way to arrive at the magic formula is to generate a lot of sample data for various scaling factors, measure the resulting on-disk sizes and derive a formula out of it. That's exactly what I've done.

So I hatched up the below script and booted up a GCE instance, destined to churn the disks nonstop for a bit more than a whole day as it turned out – I'm sorry, little computer 🙂 The script is very simple – it runs the pgbench schema initialization with different scaling factors from 1 to 50k and stores the resulting DB/table/index sizes in a table, so that some statistics/maths can later be applied to infer a formula. NB! The Postgres version used was 10.1.

psql -c "create table if not exists init_results (scale int not null, table_size_mb int not null, index_size_mb int not null, db_size_mb int not null);"

SCALES="1 10 50 100 150 200 250 300 350 400 450 500 750 1000 1500 2000 2500 3000 3500 4000 4500 5000 6000 7000 8000 9000 10000 12500 15000 17500 19999 20000 25000 30000 35000 40000 45000 50000"

for x in $SCALES ; do

pgbench -i --unlogged-tables -s $x -n &>/tmp/pgbench.log

SQL=$(cat <<-EOF

insert into init_results select

${x},

(select round(pg_table_size('pgbench_accounts') / 1024^2)),

(select round(pg_relation_size('pgbench_accounts_pkey') / 1024^2)),

(select round(pg_database_size(current_database()) / 1024^2))

EOF

)

echo "$SQL" | psql

done

Turning test data into a formula

After the script finished (25h later…note that without the "--unlogged-tables" flag it would have taken a lot longer), I had a nice table showing how different scaling factors translate to database size on disk, measured in MB (MiB/mebibytes to be exact, i.e. 1MB=1048576 bytes), as this is what the "psql" client uses when reporting object sizes.

So how do we now turn this data around so that we get a formula to calculate the scaling factor from a target DB size? Well, I'm sure there are quite a few ways and countless statistics/data mining techniques could be applied, but as data science is not my strongest skill I thought I'd first take the simplest possible road of using the built-in statistics functions of a spreadsheet program and see how it fares. If the estimation didn't look good I could then look for something more sophisticated, scikit-learn or such. But luckily it worked out quite well!

Simple regression analysis with Libreoffice

Libreoffice (my office suite of choice) is generally nowhere near MS Excel, but it does have some quite good statistical regression functionality built in, and one has the option to calculate regression constants together with accuracy coefficients, or to calculate the formulas and draw them on charts as "trend lines". I went the visual way. It goes something like this – create a normal chart, activate it with a double-click, then right-click on some data point on the chart and select "Insert Trend Line" from the popup menu. On the next screen one can choose whether the "to be fitted" formula for the data should be a simple linear one or logarithmic/exponential/polynomial. For the pgbench data at hand though, after trying all of them, it appeared that there was no real extra accuracy from the more complex formulas, so I decided to apply KISS here, meaning a simple linear formula. If you look at the script you'll see that we also gathered size data for the "pgbench_accounts" table and its index, so for completeness I also generated the formulas for calculating scaling factors from target "pgbench_accounts" table/index sizes, but mostly they should not be of interest, as that one prominent table makes up 99% of the DB size.

The magic formula

So without further ado, below are the resulting formulas and a graph showing how pgbench scale changes according to DB size. I also created a small JSFiddle that you can bookmark here if you happen to work with pgbench a lot.

NB! Accuracy here is not 100% perfect of course as there are some non-linear components (pgbench_accounts.aid changes to int8 from scale 20k, index leaf density varies minimally) but it is really “good enough”, meaning <1% accuracy error.

Hope you find it useful – it would be nice if something similar ended up in the "pgbench" tool itself one day.

Target object | Scale formula
DB | 0.0669 * DB_Size_Target_MB - 0.5
Table (pgbench_accounts) | 0.0781 * Table_Size_Target_MB
Index (pgbench_accounts_pkey) | 0.4668 * Index_Size_Target_MB
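As a quick worked example for the 50 GB target mentioned earlier (treating 50 GB as 51200 MiB, since the sizes above are in MiB): 0.0669 * 51200 - 0.5 ≈ 3425, so the initialization would be roughly:

pgbench -i -s 3425    # should yield a ~50 GB database according to the DB formula above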


DB size to Scale graph

The beginning of the year luckily happens to be somewhat slow at customers, so over the last couple of weeks I've had the chance to work on our Open Source PostgreSQL monitoring tool called pgwatch2 and implement some stuff that was in the queue. The changes include only a couple of fixes and a lot of new features… so heck, I'm calling it "Feature Pack 3", as we've had some bigger additions in the past also. Read on for a more detailed overview of the most important new stuff.

Becoming “Enterprizy”

The focus word this time could be "Enterprise". Meaning – firstly, trying to make pgwatch2 easier to deploy for larger companies that may need to think about security or container orchestration, and secondly, adding robustness. Security even got some special attention – there's now a pgwatch2 version suitable for running on OpenShift, i.e. the Docker process runs under an unprivileged user. But there are also quite a few new dashboards (see the screenshots at the end), like the long-awaited "Top N queries", that should delight all "standard" users, plus some other smaller UI improvements.

And please do let us know on Github if you're still missing something in the tool or are having difficulties with something – I think I've lost the ability to look at the tool with beginner's eyes myself. Thus – any feedback would be highly appreciated!

Project Github link – here.

Version 1.3 full changelog – here.

Most important changes for v1.3.0

  • “Non-root” Docker support

Suitable for example for OpenShift deployments.

  • Explicit Docker volumes added to Postgres, Grafana, InfluxDB and pgwatch2 config

Using volumes was of course also possible before, but people didn't seem to think about it, so now it's more explicit. Volumes help a lot for long-term Docker deployments, as they make updating to new image versions much easier compared to dump/restore.

  • Dockerfiles for deploying “non-root” components separately

These can be used for some advanced orchestrated setups. See the "docker" folder for details. The latest images are also available on Docker Hub, but those are meant more for DIY still.

  • Better Grafana access control via env. variables

It's now possible to turn off anonymous access and set the admin user/password.

  • New “AWS CloudWatch overview” dashboard

One can now easily monitor/alert on a mix of on-prem and AWS RDS managed DBs.

  • New dashboard for PgBouncer statistics

Visualizes the PgBouncer "SHOW STATS" output. NB! Requires a config DB schema change for existing setups, as a new "datasource type" field was introduced.

  • New dashboard for “Top N” time consuming/frequent/slowest/IO-hungry queries added

Now one can see, based on pg_stat_statements info, exactly which queries are the costliest (4 criteria available) over a user-selected time period. Lots of people have asked for that, so please, enjoy!

  • New dashboard – “Biggest relations treemap”

Helps to visually highlight biggest tables/indexes. Uses a custom plugin for Grafana.

  • InfluxDB HA support added, i.e. writing metrics to 2 independent DBs

The metrics gathering daemon can now store all gathered metrics in 2 independent InfluxDBs… so kind of homebrewed clustering. For real clustering one needs the commercial Influx version, so be sure to check that out as well if you need extra robustness – it seems like a great product.

  • Gatherer daemon robustness – a ringbuffer of max 100k metrics datapoints introduced

Previously, if Influx was down, metrics were gathered and kept in the daemon's memory until things blew up. 100k data points translate into ~2GB of RAM.

  • InfluxDB metrics retention period now configurable via ENV

Variable is called PW2_IRETENTIONDAYS. Default is 90 days.

  • “Change detection” doesn’t expect superuser rights on the monitored DB anymore
  • Web UI security – respecting the “only SSL connections” ENV variable also for Postgres
  • Web UI security – now possible to disable exposing of component logs
  • README improvements – a new section on custom deployments
  • README improvements – a new docu on all available ENV variables, per subcomponent

Screenshots of new Dashboards

Stat Statements Top

PgBouncer Stats

Biggest relations

AWS RDS overview

Something from the security realm this time – normally this side of the "database world" doesn't get too much attention and "love", because it's difficult/tedious and requires some deep understanding of how the system works… but as we know, ignoring security can catch up with you badly.

Anyway, I'm not telling a real horror story here, but some weeks ago a friend who's a "casual DBA" (install, start/stop, backup/restore) asked for help with a situation where a column was "leaking" data, and I had to think a bit longer than usual to find an appropriate place to fix the issue. The solution I suggested in the end made use of the Rule System of PostgreSQL. I hadn't seen rules used in years (one should generally try to avoid having stuff happen behind the scenes) and had to refresh on them a bit myself, but I think this was the correct use case for applying them, so I thought I'd share it.

The security leak

So the application they had was a bought-in "black box" webapp that did some quite trivial CRUD on some data forms, and all was fine… until they noticed that on one form a column with address data was visible to all users of the system instead of a specific login role. As said, they couldn't immediately change the code and were looking for a quick fix on the DB side (where they could make changes) to hide the data in that column altogether for all users, until the app gets properly fixed. So what means/tricks are available for this kind of stuff? Not too many, and they mostly have implications… but let's look at them.

Available options on restricting column access

  • Column-level privileges

When data is accessed by different login roles, normally the best way to limit access to single columns would be to use the "column privileges" feature – just do a "REVOKE SELECT" followed by a "GRANT SELECT (col1, col2, …)", leaving the "secret" columns out of the list. With our app though it wouldn't work, as only one backend role was used, and all the SELECTs on that table would then start to error out also on the other forms showing the column – the SQL would need to be adjusted. A no-go.

  • Rename the column, create a new (nullable) column with the same name

This would be the easiest trick (very easily rolled back) for a temporary fix… but it would only work when no new data is being added – in this case though new customer registrations were still coming in from the web. A no-go.

  • A view instead of a table

Renaming the old table and creating a simple view with the same name, including all the wanted columns (and adding appropriate GRANTs, for example copying the existing ones with "pg_dump -s -t problem_table | grep ^GRANT"), is relatively easy – we could replace our "secret" column's value, for example, with an empty string and all would be fine… until we again get some INSERTs or UPDATEs. Then we would see an error like the one below, although simple DML is in itself allowed on views by default in Postgres.

ERROR:  cannot insert into column "leaky_data" of view "problem_table"
DETAIL:  View columns that are not columns of their base relation are not updatable.

So how to get around this INSERT/UPDATE problem?

  • A view instead of a table with Rules!

To overcome the errors on adding/changing data we need to re-route the data into the "base table" of our view. This can easily be done with rules, and it's a good fit for this use case. Read on for a sample.

The solution

So let’s say this is our leaking table:

CREATE TABLE  problem_table(id int, leaky_data text);

And now we want to just replace all values in the leaky_data column with empty strings. So we need to write something like this:

BEGIN;

ALTER TABLE problem_table RENAME TO problem_table_real;

CREATE VIEW problem_table AS SELECT id, ''::text AS leaky_data FROM problem_table_real;

CREATE RULE problem_table_ins AS ON INSERT TO problem_table
    DO INSTEAD
    INSERT INTO problem_table_real VALUES (
           NEW.id,
           NEW.leaky_data
    );

CREATE RULE problem_table_upd AS ON UPDATE TO problem_table
    DO INSTEAD
    UPDATE  problem_table_real
       SET id = NEW.id,
               leaky_data = NEW.leaky_data
     WHERE id = NEW.id;

COMMIT;

Now indeed it does what we want:


INSERT INTO problem_table(id, leaky_data) VALUES (1, 'secret');

SELECT * FROM problem_table;

 id │ leaky_data 
────┼────────────
  1 │ 
(1 row)

NB! When the application uses constructs like "UPDATE … RETURNING *", it's also possible to take care of that loophole in the rule definition, by adding a "RETURNING …, ''::text AS leaky_data" part. Hope it helps!
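For illustration, the update rule from above extended with such a masking RETURNING clause could look roughly like this:

CREATE OR REPLACE RULE problem_table_upd AS ON UPDATE TO problem_table
    DO INSTEAD
    UPDATE problem_table_real
       SET id = NEW.id,
           leaky_data = NEW.leaky_data
     WHERE id = NEW.id
    RETURNING id, ''::text AS leaky_data;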

A quick performance tip for all those wrestling with occasionally un-performant queries in PostgreSQL. There's one quite simple trick available that many don't know about, which can be applied at "low cost" when slow queries are caused by poor planner estimates for your filter conditions. Read on for details.

The problem

So the thing with Postgres statistics is that you basically lose them as soon as you transform the data column in some way in your query – the most prominent example would probably be something like "SELECT … WHERE date_trunc('day', some_indexed_timestamp_column) = '2017-01-01'" … and gone are the (imaginary) index and the column statistics. That query could of course easily be rewritten so that performance doesn't explode, but where a rewrite is not possible, or the query is executed often and is selective enough, a functional index would also make sense. For queries/filters that are used very rarely though, a full functional index might not be ideal due to its side effects (index maintenance slows down updates, deletes and inserts a bit)… so welcome our performance helper of the day – BRIN functional indexes!

Example use case

So now let's paint it red with a dummy example: let's say we want to select all IDs that are divisible by 10 out of 1 million sequentially growing IDs. Let's create the test data and look at the resulting query plan.

krl@postgres=# CREATE TABLE t_event AS SELECT generate_series(1, 1e6) id;
SELECT 1000000

krl@postgres=# CREATE INDEX ON t_event (id);
CREATE INDEX

krl@postgres=# ANALYZE t_event ;
ANALYZE

krl@postgres=# EXPLAIN SELECT * FROM t_event WHERE id % 10 = 0;
                          QUERY PLAN                          
──────────────────────────────────────────────────────────────
 Seq Scan on t_event  (cost=0.00..19425.00 rows=5000 width=6)
   Filter: ((id % '10'::numeric) = '0'::numeric)
(2 rows)

What do we see? Well, we see that Postgres expects to get only 5000 rows, but we actually know that 100k rows would be matched! Such a 20x misestimation could already end really badly for larger datasets. And where does the number 5000 come from? Now some "insider knowledge" is required – one just needs to know that when Postgres doesn't have exact statistics for a column filter (and we did transform the ID column that we have statistics on), it by default assumes that something between 0.33333 and 1% of the data will be returned, depending on the datatype and filter type (normal/join) – here 0.5%, and 0.5% of 1 million gives exactly the 5000 rows seen above. So a constant… which obviously doesn't work too well for our less selective filter. But OK, how to improve the situation?
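Just to double-check the real selectivity behind that estimate – every 10th of the 1 million sequential IDs matches, so the query below returns 100000:

SELECT count(*) FROM t_event WHERE id % 10 = 0;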

Using functional BRIN indexes for better row amount estimations

So now the trick – let's create our functional BRIN index and see how it changes the row estimate of our original query plan. BRIN, by the way, stands for Block Range (a.k.a. min/max) index – a lossy (table re-check needed) index type introduced in Postgres 9.5, generally meant for big amounts of data, and most effective when the indexed columns also appear in the table in a somewhat naturally ordered way. So let's create one, rebuild the statistics and do another EXPLAIN.

krl@postgres=# CREATE INDEX ON t_event USING brin ((id % 10));
CREATE INDEX

krl@postgres=# ANALYZE t_event ;
ANALYZE

krl@postgres=# EXPLAIN SELECT * FROM t_event WHERE id % 10 = 0;
                          QUERY PLAN                           
───────────────────────────────────────────────────────────────
 Seq Scan on t_event  (cost=0.00..19425.00 rows=98467 width=6)
   Filter: ((id % '10'::numeric) = '0'::numeric)
(2 rows)

And indeed, the estimation of 98467 matching rows looks now a lot better – quite exact actually!
Here the database of course still decided on a sequential scan… but at least it did so based on the right assumptions (row count and table location correlation, see below for the corresponding pg_stats entry), and after a quick check, forcing Postgres to use the index, I indeed saw that the scan was indeed faster. And the efficiency gain compared to a normal functional index (as the index wouldn't be used anyway, just its statistics) – 384x less space on disk, plus faster data modifications, as BRIN indexes only need maintenance when the min/max threshold is crossed for a range of blocks!
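Such a quick "force the index" check can be done in a test session for example like this (remembering to reset the setting afterwards):

SET enable_seqscan TO off;
EXPLAIN ANALYZE SELECT * FROM t_event WHERE id % 10 = 0;
RESET enable_seqscan;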

Also note that in our case the row estimation worked so well because we have fewer than default_statistics_target (100 by default) unique values in our derived column, so Postgres has more or less exact appearance frequencies available for the single values.

Hope this trick will serve you well 🙂

krl@postgres=# \di+ t_event_*
                             List of relations
 Schema │       Name        │ Type  │ Owner │  Table  │ Size  │ Description 
────────┼───────────────────┼───────┼───────┼─────────┼───────┼─────────────
 public │ t_event_expr_idx  │ index │ krl   │ t_event │ 21 MB │ 
 public │ t_event_expr_idx1 │ index │ krl   │ t_event │ 56 kB │ 
 public │ t_event_id_idx    │ index │ krl   │ t_event │ 21 MB │ 
(3 rows)

krl@postgres=# SELECT * FROM pg_stats WHERE tablename = 't_event_expr_idx1';
─[ RECORD 1 ]──────────┬─────────────────────────────────────────────────────────────────────────────────────
schemaname             │ public
tablename              │ t_event_expr_idx1
attname                │ expr
inherited              │ f
null_frac              │ 0
avg_width              │ 7
n_distinct             │ 10
most_common_vals       │ {2,3,7,6,0,1,4,5,8,9}
most_common_freqs      │ {0.1033,0.102233,0.1013,0.1012,0.1005,0.100033,0.0993,0.0978333,0.0978333,0.0964667}
histogram_bounds       │ ¤
correlation            │ 0.0950125
most_common_elems      │ ¤
most_common_elem_freqs │ ¤
elem_count_histogram   │ ¤