The topic of transaction performance is being as actual as relevant as ever, in spite of constant hardware improvements, we’re quite often asked how to improve in that area. But the truth is that when the most important PostgreSQL configuration parameters are already more or less tuned, it is usually really hard to magically squeeze that extra something out of a setup, without modifying also the schema.
What I’ve noticed is that a lot of people with performance issues seem to be still using “old” spinning disks. Of course there are a lot of different reasons for that (maybe some long-term server lease contracts or losing support if changing hardware configs). But for cases, in which this is done because there’s just way too much data and it would get very expensive, there might be some remedy. Most people don’t seem to realize that for OLTP systems there’s a huge win, if one can already move the indexes of the busiest tables to SSDs or some other low latency media. So here a quick overview on how to do that with some indicative numbers.
Moving indexes to fast storage
The process is quite simple.
1. Install/connect/mount the media. This is probably the hardest part.
2. Create a Postgres tablespace (superuser needed). It would then make sense to also adjust the random_page_cost parameter.
CREATE TABLESPACE fast_storage LOCATION '/some/mount' WITH (random_page_cost=1.25);
3. Move impactful (lots of columns) or existing indexes to that tablespace. NB! This will result in full locking. Also since 9.4 it’s actually possible, to move all indexes to some tablespace with “ALTER INDEX ALL IN TABLESPACE”, but this would basically mean a downtime, as everything is locked and then the moving starts. One can do it also in a more controlled/manual way, via “CREATE INDEX CONCURRENTLY … TABLESPACE … + RENAME INDEX+ DROP INDEX” or maybe use pg_squeeze/pg_repack extensions that can basically do the same.
# single index ALTER INDEX pgbench_accounts_aid_idx SET TABLESPACE fast_storage; # or all indexes on a table with a small DO block (which could be improved to with schemas) DO $$ DECLARE r record; BEGIN FOR r IN SELECT ci.relname FROM pg_class c JOIN pg_index i ON i.indrelid = c.oid JOIN pg_class ci on i.indexrelid = ci.oid AND c.relname = 'pgbench_accounts' LOOP EXECUTE 'ALTER INDEX ' || r.relname || ' SET TABLESPACE fast_storage' ; END LOOP; END $$; # or by levereging psql-s newish "gexec" (less locking so due to independent transaction) SELECT 'ALTER INDEX ' || ci.relname || ' SET TABLESPACE fast_media' FROM pg_class c JOIN pg_index i ON i.indrelid = c.oid JOIN pg_class ci on i.indexrelid = ci.oid AND c.relname = 'pgbench_accounts' \gexec
4. Optionally it might be a good idea to set this new tablespace as default schema is somewhat static.
ALTER SYSTEM SET default_tablespace TO fast_storage; select pg_reload_conf();
Trying to compare HDD vs SDD
To “visualize” the possible performance benefits (there are some for sure), I performed a small and simplistic test, comparing HDD and SDD transaction latencies with a lot of random IO – to really hit the disk a lot, I chose a very small amount of RAM (~5% of dataset fits in shared_buffers/kernel cache), but increased max_wal_size a lot so that we wouldn’t stall during the test, giving more predictable latencies. To generate random IO easily, I decided to just create 9 extra indexes on the pgbench schema – and having 10 to 20 indexes on a central OLTP table is also actually quite common. Also to only illustrate HDD vs SDD difference on multi-index update latencies, I removed other activities like WAL logging by using unlogged tables, disabling the background writer and changed the pgbench default transaction so that only the UPDATE part on the single pgbench_accounts table would be executed.
HW info: Google Compute Engine (europe-west-1), 1 vCPU, 1 GB RAM, 100GB Standard persistent disk / SSD persistent disk
Postgres info: PG 10.3, max_wal_size=50GB, checkpoint_timeout=1d, shared_buffers=256MB, bgwriter_lru_maxpages=0
SCALE=684 # estimated ~10Gi DB, see jsfiddle.net/kmoppel/6zrfwbas/ for the formula TEST_DURATION_SEC=900 CLIENTS=1 NR_OF_AID_INDEXES=8 MAX_TPS=10 pgbench --unlogged-tables -i -s $SCALE for i in $(seq 1 $NR_OF_AID_INDEXES) ; do psql -qXc "create index on pgbench_accounts (aid)" ; done psql -qXc "create index on pgbench_accounts (abalance)" # an extra index to effectively disables HOT-updates psql -qXc "checkpoint" cat << EOF > bench_upd_only.sql \set aid random(1, 100000 * :scale) \set delta random(-5000, 5000) UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid; EOF pgbench -f bench_upd_only.sql -R $MAX_TPS -T $TEST_DURATION_SEC -P $TEST_DURATION_SEC -c $CLIENTS
And the results were quite surprising actually – almost 38x difference in average update latency on 10 indexes! I somehow thought it will be a bit less, maybe 5-10x…
|Disk Type||Latency avg.||Latency Stddev.|
|HDD||141 ms||158 ms|
|SSD||3.7 ms||2.5 ms|
NB! The test doesn’t make any claims at absolute truths – I used separate Google cloud machines (same non-disk specs though), which could have different utilization levels, but to counteract, I limited the transaction rate to a very low 10 TPS not to make it a total throughput test but rather latency test, so in the end it should at least give some idea on possible performance gains. Also we can see that HDD latencies (at least on shared cloud envs) jump quite a lot on random updates, with “Latency Stddev” being bigger than “Latency avg”.