Database performance is truly important. However, when looking at performance in general people only consider the speed of SQL statements and forget the big picture. The questions now are: What is this big picture I am talking about? What is it that can make a real difference? What if not the SQL statements? More often than not the SQL part is taken care of. What people forget is latency. That is right: Network latency
Measuring the performance impact caused by latency
“tc” is a command to control network settings in the Linux kernel. It allows you to do all kinds of trickery such as adding latency, bandwidth limitations and so on. tc helps to configure the Linux kernel control groups.
To simplify configuration I decided to us a simple Python wrapper called tcconfig, which can easily be deployed using pip3:
[[email protected] ~]# pip3 install tcconfig
After downloading some Python libraries the tool is ready to use.
In the next step I want to compare the performance difference between a normal local connection and a connection which has some artificial network latency.
PostgreSQL includes a tool called pgbench which is able to provide us with a simple benchmark. In my case I am using a simple benchmark database containing just 100.000 rows:
[email protected] ~]$ createdb test [[email protected] ~]$ pgbench -i test dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping NOTICE: table "pgbench_branches" does not exist, skipping NOTICE: table "pgbench_history" does not exist, skipping NOTICE: table "pgbench_tellers" does not exist, skipping creating tables... generating data... 100000 of 100000 tuples (100%) done (elapsed 0.11 s, remaining 0.00 s) vacuuming... creating primary keys... done.
Running at full speed
The following line fires up a benchmark simulating 10 concurrent connection for 20 seconds (read-only). This is totally sufficient to proof the point her:
[[email protected] ~]$ pgbench -S -c 10 -h localhost -T 20 test starting vacuum...end. transaction type: scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 20 s number of transactions actually processed: 176300 latency average = 1.135 ms tps = 8813.741733 (including connections establishing) tps = 8815.142681 (excluding connections establishing)
As you can see my tiny virtual machine has managed to run 8813 transactions per second (TPS).
Let us see what happens when latency is added
Database performance vs. network latency
In this example we assign 10 ms of delay to the loopback device. Here is how it works:
[[email protected] ~]# tcset --device lo --delay=10
10 milliseconds does not feel like much. After all even the Google DNS server is 50ms “away” from my desktop computer:
iMac:~ hs$ ping 18.104.22.168 PING 22.214.171.124 (126.96.36.199): 56 data bytes 64 bytes from 188.8.131.52: icmp_seq=0 ttl=54 time=51.358 ms 64 bytes from 184.108.40.206: icmp_seq=1 ttl=54 time=52.628 ms 64 bytes from 220.127.116.11: icmp_seq=2 ttl=54 time=52.819 ms
If your database is running in the cloud and 10ms of network latency are added. What can go wrong? Let us take a look and see:
[[email protected] ~]$ pgbench -S -c 10 -h 10.0.1.173 -T 20 test starting vacuum...end. transaction type: scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 20 s number of transactions actually processed: 9239 latency average = 21.660 ms tps = 461.687712 (including connections establishing) tps = 462.710833 (excluding connections establishing)
Throughput has dropped 20 times. Instead of 8813 TPS we are now at 461 TPS. This is a major difference – not just a minor incident. Latency is especially painful if you want to run an OLTP application. In a data warehousing context the situation is usually not so severe because queries tend to run longer.
Let me increase latency even more:
[[email protected] ~]# tcset --device lo --delay=50 --overwrite
As you can see performance is again dropping like a stone:
[[email protected] ~]$ pgbench -S -c 10 -h 10.0.1.173 -T 20 test starting vacuum...end. transaction type: scaling factor: 1 query mode: simple number of clients: 10 number of threads: 1 duration: 20 s number of transactions actually processed: 1780 latency average = 112.641 ms tps = 88.777612 (including connections establishing) tps = 89.689824 (excluding connections establishing)
Performance has dropped 100 times. Even if we tune our database the situation is not going to change because time is not lost in the database server itself – it is lost while waiting on the database. In short: We have to fix “waiting”.
In real life latency is a real issue that is often underestimated. The same is true for indexing in general. If you want to learn more about indexing consider reading Laurenz Albe’s post on new features in PostgreSQL 12.