Post UPDATED June 2023: PostgreSQL is a highly sophisticated relational database system capable of performing wonderful operations. But, sophistication also means that there is a bit of complexity under the surface, which is not always well understood by users. One thing people usually don't know about are hint bits.
Table of Contents
Actually, they are an internal optimization which makes sure that visibility checks are sped up considerably. As you might know, PostgreSQL traditionally had to decide for each row whether it was visible to a user or not. The best way to speed up the process is to avoid that - and that is why hint bits were invented.
When PostgreSQL finds a row that is visible to all transactions, it sets a bit in the row header indicating this fact. Next time someone finds this row, they can avoid comparing the row visibility information to their snapshot, because the hint bit already tells them that it must be visible.
To demonstrate how things work, let's create some demo data:
1 2 |
test=# CREATE TABLE t_test (a int, b int); CREATE TABLE |
A simple Perl script can create the data nicely. In this example, half a billion rows are used (big enough to demonstrate the point and yet small enough to do the test in reasonable time):
1 2 3 4 5 6 7 |
[hs@jacqueline blog]$ cat numbers.pl #!/usr/bin/perl for ($count = 1; $count <= 500000000; $count++) { print '$count $countn'; } |
1 2 |
test=# ALTER TABLE t_test SET (autovacuum_enabled = off); ALTER TABLE |
Then data is loaded into PostgreSQL. The easiest way to do that is to use “COPY … FROM PROGRAM” and to simply read things in through a pipe. It works like this:
1 2 3 4 5 6 |
test=# \timing Timing is on. test=# COPY t_test FROM PROGRAM '/path/numbers.pl'; COPY 500000000 Time: 612716.438 ms |
(some Intel i5 and a 750 GB SATA disk) it takes around 10 minutes to load the data. Note that default PostgreSQL parameters were used (no checkpoint optimizations). “top” leaves us with output as shown in the next listing:
1 2 3 4 |
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27570 hs 20 0 257m 24m 23m R 48.6 0.2 0:10.33 postgres 27574 hs 20 0 119m 1584 1252 R 31.6 0.0 0:06.66 numbers.pl 27356 hs 20 0 256m 16m 16m S 1.3 0.1 0:00.18 postgres |
You can clearly see that generating numbers actually causes a decent share of the CPU load. Actually, this is good - because it means that PostgreSQL is cheap, relative to the data creation part.
Up until now, nothing special could be observed. Let's try to run a SELECT:
1 2 3 4 5 6 7 8 9 |
test=# SELECT count(*) FROM t_test; count ----------- 500000000 (1 row) Time: 359669.393 ms |
The interesting part here is not the result of the query. The really interesting part is hidden nicely below the surface.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
[hs@jacqueline blog]$ vmstat 2 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 1 0 164 4968092 46412 10534028 0 0 3486 1903 3 13 1 0 91 7 0 0 0 164 4967960 46412 10534144 0 0 0 0 78 225 0 0 100 0 0 0 0 164 4967960 46420 10534144 0 0 0 8 72 203 0 0 99 1 0 0 0 164 4967960 46420 10534144 0 0 0 0 103 301 0 0 100 0 0 0 1 164 4946260 46420 10555520 0 0 10586 20 246 437 1 0 97 2 0 1 1 164 4732608 46420 10763488 0 0 104000 0 1403 2046 10 2 76 12 0 0 1 164 4517592 46420 10972988 0 0 104704 0 1415 2042 10 3 76 11 0 1 0 164 4301832 46420 11183036 0 0 105024 0 1410 2038 10 3 75 12 0 1 0 164 4082228 46420 11396740 0 0 106880 4 1449 2048 11 3 75 11 0 0 1 164 3868576 46420 11604996 0 0 104064 0 1414 2050 10 3 75 12 0 1 0 164 3650460 46420 11817164 0 0 106176 0 1415 2058 11 2 75 12 0 0 1 164 3433460 46420 12028640 0 0 105664 0 1419 2078 10 2 75 12 0 1 1 164 3262588 46420 12194508 0 0 82944 45356 1199 2209 8 3 66 23 0 1 1 164 3143176 46420 12310644 0 0 58112 43818 949 1738 6 2 75 18 0 1 1 164 3018680 46420 12432100 0 0 60736 36900 940 1760 6 2 75 17 0 1 2 164 2897284 46420 12550404 0 0 59136 29726 917 1727 6 2 75 16 0 0 2 164 2775020 46420 12669348 0 0 59392 36642 912 1746 6 2 76 17 0 0 2 164 2656848 46420 12784480 0 0 57600 36388 883 1687 6 2 75 17 0 1 1 164 2537560 46420 12900636 0 0 58048 36644 937 1755 5 2 73 19 0 |
“bi” (= blocks in) and “bo” (= blocks out) are the two columns of interest here.
We start to read at roughly 100 MB / second, a reasonable value for a desktop PC featuring a single SATA drive. But then, things start to go south. At some point, PostgreSQL starts to write data out to disk. Why that? Well, when a row is loaded, PostgreSQL has no idea whether it is visible to others without checking visibility all over again. Therefore PostgreSQL will mark a row as “visible to everyone” when it's read for the first time. This is exactly where the I/O comes from - when rows are marked as “visible to everybody”. The advantage here is that visibility checks are sped up tremendously.
1 2 3 4 5 6 7 8 9 10 |
[hs@jacqueline blog]$ vmstat 2 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 0 1 164 191076 1688 15325392 0 0 104256 0 1401 2077 8 2 75 15 0 0 1 164 183140 1688 15334056 0 0 105792 0 1394 2113 8 2 75 15 0 0 1 164 187728 1688 15329596 0 0 106880 0 1429 2121 9 2 75 15 0 0 1 164 190828 1688 15327804 0 0 105472 0 1413 2124 8 2 75 15 0 2 0 164 184008 1688 15335792 0 0 104640 0 1389 2108 8 2 75 15 0 1 0 164 188348 1688 15336880 0 0 106240 0 1466 2107 9 2 75 14 0 1 0 164 191944 1692 15337824 0 0 100226 4 1372 2060 8 2 75 15 0 |
As you can see, there is no more writing going on - just a clean 100+ MB / second. Exactly what we want to see.
1 2 3 4 5 6 7 |
test=# SELECT count(*) FROM t_test; count ----------- 500000000 (1 row) Time: 110263.714 ms |
Wow, the time needed to perform this one has gone down dramatically! And more importantly, the execution time is pretty stable:
1 |
Time: 110382.362 ms |
The theory behind it is that reads happen more frequently than writes, and therefore investing in hint bits makes an awful lot of sense, because it seriously improves run-times.
But there is a bit more. What if we VACUUM the table?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu----- r b swpd free buff cache si so bi bo in cs us sy id wa st 1 1 164 161936 616 15427528 0 0 50718 356 845 1262 4 2 79 15 0 0 1 164 168136 616 15421384 0 0 87680 636 1267 2517 6 4 75 15 0 0 1 164 169624 616 15419204 0 0 88704 624 1254 2529 6 3 75 15 0 0 1 164 179296 616 15409532 0 0 90304 560 1279 2435 6 4 75 15 0 0 1 164 185868 616 15402548 0 0 89216 628 1319 1898 7 4 74 15 0 1 1 164 178924 616 15409580 0 0 88832 628 1295 1960 7 3 75 15 0 0 1 164 183636 616 15404972 0 0 80166 624 1196 1802 6 3 75 16 0 0 1 164 186240 616 15402652 0 0 91648 568 1354 1951 7 4 75 15 0 0 2 164 180660 624 15408296 0 0 89472 670 1307 1982 6 3 75 16 0 1 1 164 170120 624 15418500 0 0 72260 49908 1080 2035 5 3 70 22 0 0 2 164 164788 624 15423656 0 0 67072 22782 995 1955 4 2 70 23 0 0 2 164 168012 624 15420556 0 0 61888 22492 984 1928 4 2 73 20 0 0 2 164 164044 624 15424496 0 0 60672 29582 959 1887 4 2 73 20 0 0 2 164 172352 624 15416452 0 0 67712 22524 1053 2097 5 3 72 20 0 |
Again, some writing is going on. We are seriously bound by the disk here, as the following couple of lines taken from “top” indicate:
1 2 3 4 5 |
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 27570 hs 20 0 257m 30m 29m R 26.3 0.2 9:34.80 postgres 59 root 20 0 0 0 0 S 1.7 0.0 18:00.13 kswapd0 36 root 20 0 0 0 0 S 1.0 0.0 10:42.29 kblockd/2 27245 root 20 0 0 0 0 D 0.3 0.0 0:03.53 flush-8:16 30861 root 20 0 15036 1196 836 R 0.3 0.0 0:03.75 top |
1 2 3 |
test=# VACUUM t_test; VACUUM Time: 363773.532 ms |
The beauty now is that we not only know whether a single row is visible or not - VACUUM can also figure out whether an entire block is visible or not.
Again, we see an improvement in speed - but this time, it's not very big (just two seconds):
1 2 3 4 5 6 7 |
test=# SELECT count(*) FROM t_test; count ----------- 500000000 (1 row) Time: 108715.332 ms |
The main benefit really comes from those bits attached to a single line - and this benefit is massive.
If you liked this article, check out the next article in this series: Speed up PostgreSQL data loading with COPY (FREEZE)
For further information on how to speed up performance in PostgreSQL, see this post about CLUSTER and HOT updates, or see Laurenz Albe's first article on HOT updates.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Thanks for information about postgresql internals. I see obvious benefit when using with datawarehouse or loading large amount of data into postgresql, but in which scenario(s) the bits hints could be beneficial for OLTP database load?
Thanks. Very interesting.