Transaction performance topic being as actual as ever in spite of constant hardware improvements, we’re quite often asked how to improve in that area. But the truth is that when the most important PostgreSQL configuration paramaters are already more or less tuned, it is mostly really hard to magically squeeze that extra something out of a setup without modifying also the schema.

What I’ve noticed though is that lot of people with performance issues seem to be still using “old” spinning disks. There are of course a lot of different reasons for that (maybe some long-term server lease contracts or losing support if changing hardware configs) but for cases where this is done because there’s just way too much data and it would get very expensive, there might be some remedy. Most people don’t seem to realize that for OLTP systems there’s a huge win if one can already move the indexes of the most busy tables to SSDs or some other low latency media. So here a quick overview how to do that here together with some indicative numbers.

Moving indexes to fast storage

The process is quite simple.

1. Install/connect/mount the media. This is the hardest part probably.

2. Create a Postgres tablespace (superuser needed). Would then make sense to also adjust the random_page_cost parameter.

CREATE TABLESPACE fast_storage LOCATION '/some/mount'
  WITH (random_page_cost=1.25);

3. Move impactful (lots of columns) or existing indexes to that tablespace. NB! This will result in full locking. Also since 9.4 it’s actually possible to move all indexes to some tablespace with “ALTER INDEX ALL IN TABLESPACE”, but this would basically mean a downtime as everything is locked and then the moving starts. One can do it also in a more controlled/manual way, via “CREATE INDEX CONCURRENTLY … TABLESPACE … + RENAME INDEX+ DROP INDEX” or maybe use pg_squeeze/pg_repack extensions that can basically do the same.

# single index
ALTER INDEX pgbench_accounts_aid_idx SET TABLESPACE fast_storage;
# or all indexes on a table with a small DO block (which could be improved to with schemas)
DO $$
DECLARE
r record;
BEGIN
  FOR r IN SELECT ci.relname FROM pg_class c
                  JOIN pg_index i ON i.indrelid = c.oid JOIN pg_class ci on i.indexrelid = ci.oid
                  AND c.relname = 'pgbench_accounts'
  LOOP
    EXECUTE 'ALTER INDEX ' || r.relname || ' SET TABLESPACE fast_storage' ;
  END LOOP;
END
$$;
# or by levereging psql-s newish "gexec" (less locking so due to independent transaction)
SELECT  'ALTER INDEX ' || ci.relname || ' SET TABLESPACE fast_media' FROM pg_class c
                  JOIN pg_index i ON i.indrelid = c.oid JOIN pg_class ci on i.indexrelid = ci.oid
                  AND c.relname = 'pgbench_accounts' \gexec

4. Optionally it also might be a good idea to set this new tablespace as default is schema is somewhat static.

ALTER SYSTEM SET default_tablespace TO fast_storage;
select pg_reload_conf();

Trying to compare HDD vs SDD

To “visualize” the possible performace benefits (there are some for sure) I performed a small and somewhat simplistic test, comparing HDD and SDD transaction latencies with a lot of random IO – to really hit the disk a lot I chose a very small amount of RAM (~5% of dataset fits in shared_buffers/kernel cache), but increased max_wal_size a lot so that we wouldn’t stall during the test, giving more predictable latencies. To generate random IO easily I decided to just create 9 extra indexes on the pgbench schema – and having 10 to 20 indexes on a central OLTP table is also actually quite common. Also to only illustrate HDD vs SDD difference on multi-index update latencies I removed other activities like WAL logging by using unlogged tables, disabling the background writer, and also changed the pgbench default transaction so that only the UPDATE part on the single pgbench_accounts table would be executed.

HW info: Google Compute Engine (europe-west-1), 1 vCPU, 1 GB RAM, 100GB Standard persistent disk / SSD persistent disk
Postgres info: PG 10.3, max_wal_size=50GB, checkpoint_timeout=1d, shared_buffers=256MB, bgwriter_lru_maxpages=0
Test script:

SCALE=684 # estimated ~10Gi DB, see jsfiddle.net/kmoppel/6zrfwbas/ for the formula
TEST_DURATION_SEC=900
CLIENTS=1
NR_OF_AID_INDEXES=8
MAX_TPS=10

pgbench --unlogged-tables -i -s $SCALE

for i in $(seq 1 $NR_OF_AID_INDEXES) ; do psql -qXc "create index on pgbench_accounts (aid)" ; done
psql -qXc "create index on pgbench_accounts (abalance)"	# an extra index to effectively disables HOT-updates

psql -qXc "checkpoint"

cat << EOF > bench_upd_only.sql
\set aid random(1, 100000 * :scale)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
EOF

pgbench -f bench_upd_only.sql -R $MAX_TPS -T $TEST_DURATION_SEC -P $TEST_DURATION_SEC -c $CLIENTS

The results

And the results were quite suprising actually – almost 38x difference in average update latency on 10 indexes! I somehow thought it will be a bit less, maybe 5-10x…

Disk TypeLatency avg.Latency Stddev.
HDD141 ms158 ms
SSD3.7 ms2.5 ms

NB! The test doesn’t make any claims at absolute thruths – I used separate Google cloud machines (same non-disk specs though) which could have different utilization levels, but to counteract I limited the transaction rate to a very low 10 TPS not to make it a total throughput test but rather latency test, so in the end it should at least give some idea on possible performance gains. Also we can see that HDD latencies (at least on shared cloud envs) jump quite a lot on random updates, with “Latency Stddev” being bigger than “Latency avg”.