CYBERTEC PostgreSQL Logo

PostgreSQL v12 initial query performance impressions

11.2019 / Category: , / Tags: |

By Kaarel Moppel

 

With the latest major version freshly released, it’s again time to gauge its performance. I’ve already been doing this for years, and I have my scripts ready, so it’s not too much work for me - but rather, for the machines. The v12 release, as always, adds quite a lot of nice stuff. Some highlights are a framework for changing storage engines, better partitioning, smaller multi-column, and repetitive value indexes, and concurrent re-indexing, to name a few. So basically, there should be quite a bit of performance improvements for OLAP / Warehouse use cases, but not so much for typical OLTP stuff. The smaller indexes should help, of course, if datasets get very big... but I guess the only way to get a real impression of it is to look at some test numbers, so feel free to jump to the summarizing table at the end of the post or get into the details by reading on.

By the way - from a code change statistics point of view, the numbers look quite similar to the v11 release, with the exception that there is an increase of about 10% in the number of commits. This is a good sign for the overall health of the project, I think 🙂

Test Queries

I decided to run 4 different test cases, each with a couple of different scale / client combinations. Scales were selected 100, 1000 and 5000 – with the intention that with 100, all data fits into shared buffers (which should provide the most accurate results), 1000 means everything fits into RAM (Linux buffer cache) for the used hardware and 5000 (4x RAM) to test if disk access was somehow improved. A tip - to quickly get the right “scaling factor” numbers for a target DB size, one could check out this post here.

The queries were all tied to or derived from the default schema generated by our old friend pgbench, and the 4 simulated test scenarios were as follows:

 

1) Default pgbench test mimicking an OLTP system (3x UPDATE, 1x SELECT, 1x INSERT)

 

2) Default pgbench with a slightly modified schema - I decided to add a couple of extra indexes, since the release notes promised more effective storage of indexes with many duplicates and also for multi-column (compound) indexes, which surely has its cost. Extra index definitions were following:

 

3) pgbench read-only (--select-only flag)

 

4) Some home-brewed analytical / aggregate queries that I also used for testing the previously released PostgreSQL version, v11. The idea here is not to work with single records, but rather to scan over all tables / indexes or a subset on an index.

Test hardware / software

In order to get more reliable results this time, I used a dedicated old workstation machine that I have lying around as test hardware, instead of a cloud VM . . The machine is rather low-spec, but it at least has an SSD, so for comparing differences it should be totally fine. I actually did also try to use a cloud VM first (with a dedicated CPU), but the runtimes I got were so unpredictable on different runs of the same PG version that it was impossible to say anything conclusive.

CPU: 4x Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz

RAM: 16GB

DISK: 256GB SSD

OS: Clean server install of Ubuntu 18.04 with some typical kernel parameter tuning for databases:

hdparm -W0 /dev/sda # disable write caching

vm.swappiness=1 # minimal swapping

vm.overcommit_memory=2 # no overcommit

vm.overcommit_ratio=99

For most tests, the disk parameters should not matter here either, as most used queries are quite read-heavy and data fits into shared buffers or RAM for most tests. This is actually what you want for an algorithmical comparison.

For running Postgres, I used the official Postgres project maintained [repo](https://wiki.postgresql.org/wiki/Apt) to install both 11.5 and 12.0. Concerning server settings (postgresql.conf), I left everything to defaults, except a couple of changes (see below) on both clusters, for reasons described in the comments.

Results

After running my test script for a couple of days (2h for each test case / scale / client count / PG version variation) on both clusters, here are the numbers for you to evaluate.

The scripts I used, doing all the hard work for me, can be found here by the way. For generating the percentage differences, you could use this query.

NB! Results were extracted directly from the tested PostgreSQL instances via the “pg_stat_statements” extension, so that the communication latency included in the pgbench results is not taken into account. Also note that the measured query “mean_times” are an average over various “client” parallelism counts:1-4 depending on the test. With analytical queries where “parallel query” kicks in-- with up to 2 extra workers per query-- it does not make sense to use more than $CPU_COUNT / 3 clients as some CPU starvation will otherwise occur (at default parallelism settings).

 

Testcase Scale v11 avg. “mean time” (ms) v12 avg. “mean_time” (ms) Mean time diff change v11 “stddev” v12 “stddev” Stddev diff change
pgbench read-only Buffers 0.0056 0.0058 3.6 % 0.002 0.002 0 %
RAM 0.0605 0.0594 -2.2 % 0.4 0.4314 7.9 %
Disk 0.3583 0.3572 -0.3 % 0.6699 0.6955 3.8 %
pgbench default UPDATE Buffers 0.0196 0.0208 6.1 % 0.0063 0.0065 3.2 %
RAM 0.1088 0.109 0.2 % 0.7573 0.8156 7.7 %
pgbench default INSERT Buffers 0.034 0.0408 20 % 0.008 0.0222 177.5 %
RAM 0.0354 0.0376 6.2 0.0109 0.0105 -3.7
pgbench extra indexes UPDATE Buffers 0.0203 0.021 3.4 0.0067 0.0069 3
RAM 0.1293 0.1283 -0.8 1.4285 1.1586 -18.9
pgbench extra indexes INSERT Buffers 0.0447 0.0502 12.3 0.0111 0.0249 124.3
RAM 0.0459 0.0469 2.2 0.0136 0.0152 11.8
Analytical Query 1 Buffers 551.81 556.07 0.8 39.76 33.22 -16.5
RAM 2423.13 2374.30 -2 54.95 51.56 -2
Analytical Query 2 Buffers 996.08 984.09 -1.2 7.74 4.62 -40.2
RAM 7521.1 7589.25 0.9 62.29 73.67 18.3
Analytical Query 3 Buffers 2452.23 2655.37 8.3 87.03 82.16 -5.6
RAM 16568.9 16995.2 2.6 140.10 138.78 -0.9
Analytical Query 4 Buffers 876.72 906.13 3.4 41.41 37.49 -9.5
RAM 4794.99 4763.72 -0.7 823.60 844.12 2.5
Total avg. 3.3 13.8

* pgbench default UPDATE means UPDATE on pgbench_accounts table here. pgbench_branches / tellers differences were discardable and the table would get too big

Conclusion

First - as with all performance tests, a healthy dose of skepticism is required, since measuring things correctly is a bit of a science on its own. Especially if some SELECT-s in our case were running on average under 0.1 milliseconds and standard deviations were much bigger - this means there’s quite some randomness when timings are tight. And not to forget that the pgbench test itself is artificial - initially there’s no bloat (which is quite unrealistic in real life) and the rows are also quite “narrow”. But anyway, it would be great to hear some feedback, if you have seen some vastly different numbers in your testing, or if my test setup and assumptions (see script links above) seem fine. Why am I asking? Because for the first time in many years of me doing such tests - the average performance actually decreased slightly! Well, it’s only a modest 3.3% for the mean times and 13.8% for the deviations, so nothing too tragic. And it can of course be totally normal and expected, since the storage architecture was overhauled to be more generic to accommodate different storage engine implementations for future releases.

Also as expected, the new more space-saving index handling took the biggest toll on v12 pgbench INSERT performance (+10%)...but in the long run, of course, it will pay for itself nicely with better cache ratios / faster queries, especially if there are a lot of duplicate column values.

Another thing we can learn from the resulting query “mean times” (and this is probably even more important to remember than the 3% difference!) is that there are huge differences - roughly an order of magnitude - whether data fits into shared buffers vs memory, or memory vs disk. Thus the end with an obvious conclusion - for the best performance, investing in ample memory makes much more sense than a version upgrade!

0 0 votes
Article Rating
Subscribe
Notify of
guest
4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
disqus_gd6d0lw7cB
disqus_gd6d0lw7cB
4 years ago

How do you know if the ram is fully utilised and needs to "upgrade" your ram for improved performance ?

Kaarel
Kaarel
4 years ago

@disqus_gd6d0lw7cB:disqus A good question...not a straightforward thing to determine from Postgres side, maybe only if running with very high shared_buffers (almost all RAM) then the pg_stat_database blks_hit / blks_read can tell that. The 3rd party pg_stat_kcache extension can also help...but otherwise it comes down to using some monitoring tools on the OS side. PS our monitoring tool pgwatch2 has also support for that via Python psutil - see a sample screenshot here - https://raw.githubusercontent.com/cybertec-postgresql/pgwatch2/master/screenshots/system_stats_psutil.png

Peter Geoghegan
Peter Geoghegan
4 years ago

I suggest running the benchmark with "-M prepared" to get a less noisy figure. The overhead of parsing and planning is quite noticeable with pgbench these days.

Kaarel
Kaarel
4 years ago

Thanks for the tip! I've used this -M flag when stress testing disks as IO. Hmm, would be interesting to know if real-world application use on average more prepared statements or plain protocol. In Java world probably more prepared statements.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram