CYBERTEC Logo

BRIN indexes: Correlation, correlation, correlation

06.2016 / Category: / Tags: | |

Since BRIN indexes have been introduced in PostgreSQL 9.5, many people have gladly adopted this new index type. A lot has been written about this new feature and a lot of positive feedback has been reported. While BRIN indexes are clearly a success and definitely a win, some people tend to exaggerate and use them far too frequently.

Correlation, it is about correlation

BRIN indexes are cheap, but this does not mean that they are always beneficial. In case the correlation of a column is low, BRIN indexes can be no gain or even a small loss.

Here is an example showing what can happen:

We generate a PostgreSQL table containing 1 million lines.

Then we select a random row:

The sequential scan takes around 44 ms and returns exactly one row.

Now let us try the same thing using a BRIN index:

In this case, the scan is a lot faster and completes within around 3 ms. That's pretty nice.

Here is the execution plan:

As you can see, PostgreSQL does a bitmap scan to fetch the data. The number of blocks read is 128 (exactly the desired number of blocks).

When correlation goes down ...

However, the situation is quite different in case correlation goes down. Remember: A normal BRIN index calculates the minimum and the maximum value in a range of 128 blocks. In case data is sorted the index performs nicely because many 128 x 8k areas can be excluded from the scan.

The situation changes dramatically if the data is shuffled (= correlation is low). In this case, a chunk of 128 blocks (= 1 MB) will most likely contain a value close to the absolute minimum and the absolute maximum of the column.

That means the scan cannot exclude a sufficient number of chunks:

As we can see in the next listing, this query needs many more blocks than the previous one:

In this example, the query runtime skyrockets. So does the number of blocks needed.

Conclusion

BRIN indexes are only effective if a column is somewhat “sorted”. In many cases, this happens naturally. However, it is certainly not always the case.

In case you need any assistance, please feel free to contact us.
 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Christopher Browne
Christopher Browne
7 years ago

Yeah, I'm looking at adding some BRIN indexes to the Slony replication schema. In particular... https://github.com/cbbrowne/slony1-engine/commit/7a02c20cde8c8f3a54914ca6889d392074d51503

I put them onto tables that largely get only INSERTs, on columns whose values generally correlate with activity. e.g. - sequence values that are continuously being incremented, transaction IDs that are continuously being incremented. These are likely to be good cases for BRIN, but thus far, that's just a guess.

BRIN ought to be great for tables that are the result of ETL processes, where data is added in an organized order, and where old data doesn't go with great frequency.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    1
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram