CYBERTEC Logo

Composite/ combined indexes vs. separate indexes in PostgreSQL

07.2019 / Category: / Tags: |

A “composite index”, also known as “concatenated index” or a combined index, is an index on multiple columns in a table. Many people wonder what is more beneficial: Using separate or using composite indexes? Whenever we do training, consulting or support this question is high up on the agenda and many people keep asking this question. Therefore, I decided to shed some light on this question.

Which indexes should I create? Combined indexes or separate indexes?

To discuss the topic on a more practical level, I created a table consisting of three columns. Then I loaded 1 million rows and added a composite index covering all three columns:

The layout of the table is therefore as follows:

Now let's run ANALYZE to ensure that optimizer statistics are there. Usually, autovacuum will kick in and create statistics for your table -- but to make absolutely sure, running ANALYZE does not hurt in this case.

PostgreSQL will rearrange filters for you

The first important thing to observe is that PostgreSQL will try to arrange the filters in your query for you. The following query will filter on all indexed columns:

As you can see, we filtered for c, b, & a, but the optimizer changed the order of those conditions and turned it into a, b, c to make sure that the index we created suits the query. There are some important things to learn here:

  • The order of the conditions in your WHERE clause makes no difference
  • PostgreSQL will find the right indexes automatically

Also, keep in mind that PostgreSQL knows that equality is transitive and can infer conditions from that:

What you can see here is that PostgreSQL figured out automatically that a, b and c are actually the same.

Using parts of an index

However, if you have a composite index, it isn't necessary to filter on all columns. It is also ok to use just the first or both the first and second field to filter on. Here is an example:

As you can see, PostgreSQL can still use the same index. An index is a sorted list which happens to be ordered by three fields. In multi-column indexes, this ordering is a so-called &ldauo;lexicographical ordering”: the rows are intially sorted by the first index column. Rows with the same first column are sorted by the second column, and so on. It is perfectly fine to just make use of the first columns. Talking about sorted lists:

An index can also provide you with sorted data. In this case, we filter on "a" and order by the remaining two columns, "b" and "c".

When composite indexes don’t work

When DOESN'T a composite index help to speed up a query? Here is an example:

In this case, we filter on the second column. Remember: A b-tree index is nothing more than a sorted list. In our case, it's sorted by a, b, c. So naturally, we cannot usefully filter on this field. However, in some rare cases it may be that you will still see an index scan. Some people see that as proof “that it does indeed work”. But you will see an index scan for the wrong reason: PostgreSQL is able to read an index completely – not to filter but to reduce the amount of I/O it takes to read a table. If a table has many columns, it might be faster to read the index than to digest the table.

Let's simulate this type of index scan:

By telling PostgreSQL that sequential I/O is more expensive than it really is, you will see that the optimizer decides on an index scan:

However, the execution time is 63 milliseconds, which is A LOT more than if we had done this for the first column in the index.

Note that “Index Cond: (b = 10)” means something different here than in the previous examples: while before we had index scan conditions, here we have an index filter condition. It's not ideal: the two look the same in the EXPLAIN output.

Understanding bitmap index scans in PostgreSQL

PostgreSQL is able to use more than one index at the same time. This is especially important if you are using OR as shown in the next example:

What PostgreSQL will do is to decide on a bitmap scan. The idea is to first consult all the indexes to compile a list of rows / blocks, which then have to be fetched from the table (= heap). This is usually a lot better than a sequential scan. In my example, the same index is even used twice. However, in real life you might be more likely to see bitmap scans involving a variety of indexes.

Using a subset of indexes in a single SQL statement

In many cases, using too many indexes can even be counterproductive. The planner will make sure that enough indexes are chosen, but it won’t take too many. Let's take a look at the following example:

The following query filters on all columns again. This time, I didn't use a single combined index. Instead, I decided to use separate indexes, which is usually more inefficient. Here is what happens:

Let's take a closer look at the plan. The query filters on 3 columns BUT PostgreSQL only decided on two indexes. Why is that the case? We have imported 1 million rows. Each column contains 100.000 distinct values, which means that every value occurs 10 times. What is the point in fetching a couple of rows from every index even if two indexes already narrow down the result sufficiently enough? This is exactly what happens.

Optimizing min / max in SQL queries

Indexes are not only about filtering. It will also help you to mind the lowest and the highest value in a column as shown by the following SQL statement:

As you can see, PostgreSQL is pretty sophisticated. If you are looking for good performance. it certainly makes sense to see how PostgreSQL handles indexes and review your code in order to speed thing up. If you want to learn more about performance, consider checking out Laurenz Albe’s post about speeding up count(*). Also, if you are not sure, why your database is slow, check out my post on PostgreSQL database performance, which explains, how to find slow queries.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Subscribe
Notify of
guest
5 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Michael Vodep
Michael Vodep
2 years ago

Thanks a lot!

Matt Bertino
Matt Bertino
3 years ago

Hey there!

I've got a question about how a “composite index” works when trying to query with a subset of the columns in order. For example

CREATE INDEX idx_data ON t_data(a, b, c);

Then a query like...


SELECT *
FROM t_data
WHERE a = 10
AND c = 10;

Would that still benefit from the index created? (excuse me if this was already covered and I misunderstood.)

laurenz
laurenz
3 years ago
Reply to  Matt Bertino

The query could use the index for the condition a = 10, but not for c = 10, because there is no condition on b.

Imagine a phone book sorted by (last name, first name): you cannot use it efficiently to search for all people with first name "Matt".

Thomas Boussekey
4 years ago

Hello,

Thank you for this interesting article!

I saw a small typo into paragraph: Using a subset of indexes in a single SQL statement

test=# CREATE INDEX idx_a ON t_data (b);
CREATE INDEX
test=# CREATE INDEX idx_a ON t_data (c);
CREATE INDEX

instead of

test=# CREATE INDEX idx_b ON t_data (b);
CREATE INDEX
test=# CREATE INDEX idx_c ON t_data (c);
CREATE INDEX

bobby_r
bobby_r
4 years ago

Thank you so much, your posts are so clear.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    5
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram