CYBERTEC PostgreSQL Logo

Speeding up things with hint bits

08.2014 / Category: / Tags: |

Post UPDATED June 2023: PostgreSQL is a highly sophisticated relational database system capable of performing wonderful operations. But, sophistication also means that there is a bit of complexity under the surface, which is not always well understood by users. One thing people usually don't know about are hint bits.

What are hint bits?

Actually, they are an internal optimization which makes sure that visibility checks are sped up considerably. As you might know, PostgreSQL traditionally had to decide for each row whether it was visible to a user or not. The best way to speed up the process is to avoid that - and that is why hint bits were invented.

When PostgreSQL finds a row that is visible to all transactions, it sets a bit in the row header indicating this fact. Next time someone finds this row, they can avoid comparing the row visibility information to their snapshot, because the hint bit already tells them that it must be visible.

Creating some demo data

To demonstrate how things work, let's create some demo data:

A simple Perl script can create the data nicely. In this example, half a billion rows are used (big enough to demonstrate the point and yet small enough to do the test in reasonable time):

To ensure that autovacuum does not spoil our day, it is turned off for this test:

Then data is loaded into PostgreSQL. The easiest way to do that is to use “COPY … FROM PROGRAM” and to simply read things in through a pipe. It works like this:

On our system...

(some Intel i5 and a 750 GB SATA disk) it takes around 10 minutes to load the data. Note that default PostgreSQL parameters were used (no checkpoint optimizations). “top” leaves us with output as shown in the next listing:

You can clearly see that generating numbers actually causes a decent share of the CPU load. Actually, this is good - because it means that PostgreSQL is cheap, relative to the data creation part.

Hint bits: first contact …

Up until now, nothing special could be observed. Let's try to run a SELECT:

The interesting part here is not the result of the query. The really interesting part is hidden nicely below the surface.

Here is vmstat:

“bi” (= blocks in) and “bo” (= blocks out) are the two columns of interest here.

At first, things look pretty normal.

We start to read at roughly 100 MB / second, a reasonable value for a desktop PC featuring a single SATA drive. But then, things start to go south. At some point, PostgreSQL starts to write data out to disk. Why that? Well, when a row is loaded, PostgreSQL has no idea whether it is visible to others without checking visibility all over again. Therefore PostgreSQL will mark a row as “visible to everyone” when it's read for the first time. This is exactly where the I/O comes from - when rows are marked as “visible to everybody”. The advantage here is that visibility checks are sped up tremendously.

When the query is executed again, the I/O pattern is totally different:

As you can see, there is no more writing going on - just a clean 100+ MB / second. Exactly what we want to see.

The read is a lot faster now:

Wow, the time needed to perform this one has gone down dramatically! And more importantly, the execution time is pretty stable:

The theory behind it is that reads happen more frequently than writes, and therefore investing in hint bits makes an awful lot of sense, because it seriously improves run-times.

Turning to VACUUM

But there is a bit more. What if we VACUUM the table?

Here is vmstat again:

Again, some writing is going on. We are seriously bound by the disk here, as the following couple of lines taken from “top” indicate:

VACUUM takes quite a while as well:

The beauty now is that we not only know whether a single row is visible or not - VACUUM can also figure out whether an entire block is visible or not.

Again, we see an improvement in speed - but this time, it's not very big (just two seconds):

The main benefit really comes from those bits attached to a single line - and this benefit is massive.

Finally...

If you liked this article, check out the next article in this series: Speed up PostgreSQL data loading with COPY (FREEZE)

For further information on how to speed up performance in PostgreSQL, see this post about CLUSTER and HOT updates, or see Laurenz Albe's first article on HOT updates.


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

2 responses to “Speeding up things with hint bits”

  1. Thanks for information about postgresql internals. I see obvious benefit when using with datawarehouse or loading large amount of data into postgresql, but in which scenario(s) the bits hints could be beneficial for OLTP database load?

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram