With the latest major version freshly released, it’s again time to gauge its performance. I’ve already been doing this for years, and I have my scripts ready, so it’s not too much work for me – but rather, for the machines. The v12 release, as always, adds quite a lot of nice stuff. Some highlights are a framework for changing storage engines, better partitioning, smaller multi-column, and repetitive value indexes, and concurrent re-indexing, to name a few. So basically, there should be quite a bit of performance improvements for OLAP / Warehouse use cases, but not so much for typical OLTP stuff. The smaller indexes should help, of course, if datasets get very big… but I guess the only way to get a real impression of it is to look at some test numbers, so feel free to jump to the summarizing table at the end of the post or get into the details by reading on.

By the way – from a code change statistics point of view, the numbers look quite similar to the v11 release, with the exception that there is an increase of about 10% in the number of commits. This is a good sign for the overall health of the project, I think 🙂

git diff --shortstat REL_11_5_STABLE..REL_12_0

3154 files changed, 317813 insertions(+), 295396 deletions(-)

git log --oneline REL_11_5..REL_12_0 | wc -l

2429

Test Queries

I decided to run 4 different test cases, each with a couple of different scale / client combinations. Scales were selected 100, 1000 and 5000 – with the intention that with 100, all data fits into shared buffers (which should provide the most accurate results), 1000 means everything fits into RAM (Linux buffer cache) for the used hardware and 5000 (4x RAM) to test if disk access was somehow improved. A tip – to quickly get the right “scaling factor” numbers for a target DB size, one could check out this post here.

The queries were all tied to or derived from the default schema generated by our old friend pgbench, and the 4 simulated test scenarios were as follows:

 

1) Default pgbench test mimicking an OLTP system (3x UPDATE, 1x SELECT, 1x INSERT)

\set aid random(1, 100000 * :scale)

\set bid random(1, 1 * :scale)

\set tid random(1, 10 * :scale)

\set delta random(-5000, 5000)

BEGIN;

UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;

SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;

UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;

INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);

END;

 

2) Default pgbench with a slightly modified schema – I decided to add a couple of extra indexes, since the release notes promised more effective storage of indexes with many duplicates and also for multi-column (compound) indexes, which surely has its cost. Extra index definitions were following:

CREATE INDEX ON pgbench_accounts (bid);

CREATE INDEX ON pgbench_history (tid);

CREATE INDEX ON pgbench_history (bid, aid);

CREATE INDEX ON pgbench_history (mtime);

 

3) pgbench read-only (–select-only flag)

\set aid random(1, 100000 * :scale)

SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

 

4) Some home-brewed analytical / aggregate queries that I also used for testing the previously released PostgreSQL version, v11. The idea here is not to work with single records, but rather to scan over all tables / indexes or a subset on an index.

/* 1st let’s create a copy of the accounts table to be able to test massive JOIN-s */

CREATE TABLE pgbench_accounts_copy AS SELECT * FROM pgbench_accounts;

CREATE UNIQUE INDEX ON pgbench_accounts_copy (aid);

/* Query 1 */

SELECT avg(abalance) FROM pgbench_accounts JOIN pgbench_branches USING (bid) WHERE bid % 2 = 0;

/* Query 2 */

SELECT COUNT(DISTINCT aid) FROM pgbench_accounts;

/* Query 3 */

SELECT count(*) FROM pgbench_accounts JOIN pgbench_accounts_copy USING (aid) WHERE aid % 2 = 0;

/* Query 4 */

SELECT sum(a.abalance) FROM pgbench_accounts a JOIN pgbench_accounts_copy USING (aid) WHERE a.bid % 10 = 0;

Test hardware / software

In order to get more reliable results this time, I used a dedicated old workstation machine that I have lying around as test hardware, instead of a cloud VM . . The machine is rather low-spec, but it at least has an SSD, so for comparing differences it should be totally fine. I actually did also try to use a cloud VM first (with a dedicated CPU), but the runtimes I got were so unpredictable on different runs of the same PG version that it was impossible to say anything conclusive.

CPU: 4x Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz

RAM: 16GB

DISK: 256GB SSD

OS: Clean server install of Ubuntu 18.04 with some typical kernel parameter tuning for databases:

hdparm -W0 /dev/sda # disable write caching

vm.swappiness=1 # minimal swapping

vm.overcommit_memory=2 # no overcommit

vm.overcommit_ratio=99

For most tests, the disk parameters should not matter here either, as most used queries are quite read-heavy and data fits into shared buffers or RAM for most tests. This is actually what you want for an algorithmical comparison.

For running Postgres, I used the official Postgres project maintained [repo](https://wiki.postgresql.org/wiki/Apt) to install both 11.5 and 12.0. Concerning server settings (postgresql.conf), I left everything to defaults, except a couple of changes (see below) on both clusters, for reasons described in the comments.

shared_buffers='2GB'  # to ensure buffers our dataset

work_mem='1GB'  # increasing work_mem is common and desired for analytical queries

maintenance_work_mem=’2GB’ # to speed up `pgbench –init` plus 

autovacuum_vacuum_cost_delay = ‘20ms’ # to grant equality as v12 has it on 2ms

random_page_cost = 2  # 1.5 to 2 is a good value for slower SSD-s

jit=on    # v11 didn’t have it enabled by default

shared_preload_libraries='pg_stat_statements' # for storing/analyzing test results

logging_collector=on  # helps slightly with performance when issuing lot of log messages

Results

After running my test script for a couple of days (2h for each test case / scale / client count / PG version variation) on both clusters, here are the numbers for you to evaluate.

The scripts I used, doing all the hard work for me, can be found here by the way. For generating the percentage differences, you could use this query.

NB! Results were extracted directly from the tested PostgreSQL instances via the “pg_stat_statements” extension, so that the communication latency included in the pgbench results is not taken into account. Also note that the measured query “mean_times” are an average over various “client” parallelism counts:1-4 depending on the test. With analytical queries where “parallel query” kicks in– with up to 2 extra workers per query– it does not make sense to use more than $CPU_COUNT / 3 clients as some CPU starvation will otherwise occur (at default parallelism settings).

 

TestcaseScalev11 avg. “mean time” (ms)v12 avg. “mean_time” (ms)Mean time diff changev11 “stddev”v12 “stddev”Stddev diff change
pgbench read-onlyBuffers0.00560.00583.6 %0.0020.0020 %
RAM0.06050.0594-2.2 %0.40.43147.9 %
Disk0.35830.3572-0.3 %0.66990.69553.8 %
pgbench default UPDATEBuffers0.01960.02086.1 %0.00630.00653.2 %
RAM0.10880.1090.2 %0.75730.81567.7 %
pgbench default INSERTBuffers0.0340.040820 %0.0080.0222177.5 %
RAM0.03540.03766.20.01090.0105-3.7
pgbench extra indexes UPDATEBuffers0.02030.0213.40.00670.00693
RAM0.12930.1283-0.81.42851.1586-18.9
pgbench extra indexes INSERTBuffers0.04470.050212.30.01110.0249124.3
RAM0.04590.04692.20.01360.015211.8
Analytical Query 1Buffers551.81556.070.839.7633.22-16.5
RAM2423.132374.30-254.9551.56-2
Analytical Query 2Buffers996.08984.09-1.27.744.62-40.2
RAM7521.17589.250.962.2973.6718.3
Analytical Query 3Buffers2452.232655.378.387.0382.16-5.6
RAM16568.916995.22.6140.10138.78-0.9
Analytical Query 4Buffers876.72906.133.441.4137.49-9.5
RAM4794.994763.72-0.7823.60844.122.5
Total avg.3.313.8

* pgbench default UPDATE means UPDATE on pgbench_accounts table here. pgbench_branches / tellers differences were discardable and the table would get too big

Conclusion

First – as with all performance tests, a healthy dose of skepticism is required, since measuring things correctly is a bit of a science on its own. Especially if some SELECT-s in our case were running on average under 0.1 milliseconds and standard deviations were much bigger – this means there’s quite some randomness when timings are tight. And not to forget that the pgbench test itself is artificial – initially there’s no bloat (which is quite unrealistic in real life) and the rows are also quite “narrow”. But anyway, it would be great to hear some feedback, if you have seen some vastly different numbers in your testing, or if my test setup and assumptions (see script links above) seem fine. Why am I asking? Because for the first time in many years of me doing such tests – the average performance actually decreased slightly! Well, it’s only a modest 3.3% for the mean times and 13.8% for the deviations, so nothing too tragic. And it can of course be totally normal and expected, since the storage architecture was overhauled to be more generic to accommodate different storage engine implementations for future releases.

Also as expected, the new more space-saving index handling took the biggest toll on v12 pgbench INSERT performance (+10%)…but in the long run, of course, it will pay for itself nicely with better cache ratios / faster queries, especially if there are a lot of duplicate column values.

Another thing we can learn from the resulting query “mean times” (and this is probably even more important to remember than the 3% difference!) is that there are huge differences – roughly an order of magnitude – whether data fits into shared buffers vs memory, or memory vs disk. Thus the end with an obvious conclusion – for the best performance, investing in ample memory makes much more sense than a version upgrade!