Adding an index can decrease SELECT performance

06.2018 | Category: | Tags: index , performance

06.2018

Category:

Tags: index , performance

A bad query plan ... PostgreSQL Index Performance — © Laurenz Albe 2018

Table of Contents

We all know that you have to pay a price for a new index you create — data modifying operations will become slower, and indexes use disk space. That's why you try to have no more indexes than you actually need.

But most people think that SELECT performance will never suffer from a new index. The worst that can happen is that the new index is not used.

However, this is not always true, as I have seen more than once in the field. I'll show you such a case and tell you what you can do about it.

An example

We will experiment with this table:

CREATE TABLE skewed (
   sort        integer NOT NULL,
   category    integer NOT NULL,
   interesting boolean NOT NULL
);

INSERT INTO skewed
   SELECT i, i%1000, i>50000
   FROM generate_series(1, 1000000) i;

CREATE INDEX skewed_category_idx ON skewed (category);

VACUUM (ANALYZE) skewed;

1

2

3

4

5

6

7

8

9

10

11

12

13

CREATE TABLE skewed (

sort integer NOT NULL,

category integer NOT NULL,

interesting boolean NOT NULL

);

INSERT INTO skewed

SELECT i, i%1000, i>50000

FROM generate_series(1, 1000000) i;

CREATE INDEX skewed_category_idx ON skewed (category);

VACUUM (ANALYZE) skewed;

We want to find the first twenty interesting rows in category 42:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM skewed
WHERE interesting AND category = 42
ORDER BY sort
LIMIT 20;

1

2

3

4

5

EXPLAIN (ANALYZE, BUFFERS)

SELECT * FROM skewed

WHERE interesting AND category = 42

ORDER BY sort

LIMIT 20;

This performs fine:

                             QUERY PLAN
--------------------------------------------------------------------
 Limit  (cost=2528.75..2528.80 rows=20 width=9)
        (actual time=4.548..4.558 rows=20 loops=1)
   Buffers: shared hit=1000 read=6
   ->  Sort  (cost=2528.75..2531.05 rows=919 width=9)
             (actual time=4.545..4.549 rows=20 loops=1)
         Sort Key: sort
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=1000 read=6
         ->  Bitmap Heap Scan on skewed
                        (cost=19.91..2504.30 rows=919 width=9)
                        (actual time=0.685..4.108 rows=950 loops=1)
               Recheck Cond: (category = 42)
               Filter: interesting
               Rows Removed by Filter: 50
               Heap Blocks: exact=1000
               Buffers: shared hit=1000 read=6
               ->  Bitmap Index Scan on skewed_category_idx
                        (cost=0.00..19.68 rows=967 width=0)
                        (actual time=0.368..0.368 rows=1000 loops=1)
                     Index Cond: (category = 42)
                     Buffers: shared read=6
 Planning time: 0.371 ms
 Execution time: 4.625 ms

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

QUERY PLAN

--------------------------------------------------------------------

Limit (cost=2528.75..2528.80 rows=20 width=9)

(actual time=4.548..4.558 rows=20 loops=1)

Buffers: shared hit=1000 read=6

-> Sort (cost=2528.75..2531.05 rows=919 width=9)

(actual time=4.545..4.549 rows=20 loops=1)

Sort Key: sort

Sort Method: top-N heapsort Memory: 25kB

Buffers: shared hit=1000 read=6

-> Bitmap Heap Scan on skewed

(cost=19.91..2504.30 rows=919 width=9)

(actual time=0.685..4.108 rows=950 loops=1)

Recheck Cond: (category = 42)

Filter: interesting

Rows Removed by Filter: 50

Heap Blocks: exact=1000

Buffers: shared hit=1000 read=6

-> Bitmap Index Scan on skewed_category_idx

(cost=0.00..19.68 rows=967 width=0)

(actual time=0.368..0.368 rows=1000 loops=1)

Index Cond: (category = 42)

Buffers: shared read=6

Planning time: 0.371 ms

Execution time: 4.625 ms

PostgreSQL uses the index to find the 1000 rows with category 42, filters out the ones that are not interesting, sorts them and returns the top 20. 5 milliseconds is fine.

A new index makes things go sour

Now we add an index that can help us with sorting. That is definitely interesting if we often have to find the top 20 results:

CREATE INDEX skewed_sort_idx ON skewed (sort);

1	CREATE INDEX skewed_sort_idx ON skewed (sort);

And suddenly, things are looking worse:

                          QUERY PLAN
-------------------------------------------------------------
 Limit  (cost=0.42..736.34 rows=20 width=9)
        (actual time=21.658..28.568 rows=20 loops=1)
   Buffers: shared hit=374 read=191
   ->  Index Scan using skewed_sort_idx on skewed
                (cost=0.42..33889.43 rows=921 width=9)
                (actual time=21.655..28.555 rows=20 loops=1)
         Filter: (interesting AND (category = 42))
         Rows Removed by Filter: 69022
         Buffers: shared hit=374 read=191
 Planning time: 0.507 ms
 Execution time: 28.632 ms

1

2

3

4

5

6

7

8

9

10

11

12

13

QUERY PLAN

-------------------------------------------------------------

Limit (cost=0.42..736.34 rows=20 width=9)

(actual time=21.658..28.568 rows=20 loops=1)

Buffers: shared hit=374 read=191

-> Index Scan using skewed_sort_idx on skewed

(cost=0.42..33889.43 rows=921 width=9)

(actual time=21.655..28.555 rows=20 loops=1)

Filter: (interesting AND (category = 42))

Rows Removed by Filter: 69022

Buffers: shared hit=374 read=191

Planning time: 0.507 ms

Execution time: 28.632 ms

What happened?

PostgreSQL thinks that it will be faster if it examines the rows in sort order using the index until it has found 20 matches. But it doesn't know how the matching rows are distributed with respect to the sort order, so it is not aware that it will have to scan 69042 rows until it has found its 20 matches (see Rows Removed by Filter: 69022 in the above execution plan).

What can we do to get the better plan?

PostgreSQL v10 has added extended statistics to track how the values in different columns are correlated, but that does not track the distributions of the values, so it will not help us here.

There are two workarounds:

Drop the index that misleads PostgreSQL. If that is possible, it is a simple solution. But usually one cannot do that, because the index is either used to enforce a unique constraint, or it is needed by other queries that benefit from it.
Rewrite the query so that PostgreSQL cannot use the offending index. Of the many possible solutions for this, I want to present two:
- A subquery with OFFSET 0:
  
  PgSQL
  
  SELECT * FROM (SELECT * FROM skewed WHERE interesting AND category = 42 OFFSET 0) q ORDER BY sort LIMIT 20;
  
  1
  2
  3
  4
  5
  6
  
  SELECT *
  FROM (SELECT * FROM skewed
  WHERE interesting AND category = 42
  OFFSET 0) q
  ORDER BY sort
  LIMIT 20;
  
  This makes use of the fact that OFFSET and LIMIT prevent a subquery from being “flattened”, even if they have no effect on the query result.
- Using an expression as sort key:
  
  PgSQL
  
  SELECT * FROM skewed WHERE interesting AND category = 42 ORDER BY sort + 0 LIMIT 20;
  
  1
  2
  3
  4
  
  SELECT * FROM skewed
  WHERE interesting AND category = 42
  ORDER BY sort + 0
  LIMIT 20;
  
  This makes use of the fact that PostgreSQL cannot deduce that sort + 0 is the same as sort. Remember that PostgreSQL is extensible, and you can define your own + operator!

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

4 responses to “Adding an index can decrease SELECT performance”

Rob Johnson says:

June 11, 2018 at 3:34 pm

Alternatively, you could create the second index as a conditional index, and leave the query as-is:

CREATE INDEX skewed_sort_idx ON skewed (sort) where interesting;

With the conditional index, the original query finished in 8 ms on my machine. My timings in the other two cases matched yours, so this conditional index still degrades performance, but not as much.

Reply
- laurenz says:
  
  June 12, 2018 at 7:58 am
  
  You are right, that is also an alternative.
  
  Reply
Luca Veronese says:

June 13, 2018 at 1:23 pm

The best index here would be
CREATE INDEX skewed_sort_idx ON skewed (category, sort) where interesting;

Reply
Joshua Otwell says:

June 20, 2018 at 10:38 am

Your blog posts are great. The drawings that kick most of them off really are a nice addition and set them apart. Thanks for sharing.

Reply

Leave a Reply Cancel reply

Stay tuned with our

NEWSLETTER

CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0

office@cybertec-postgresql.com

ISO_27001_Badge

Customer Support

Support Platform

Services

Support CYBERTEC Partner PostgreSQL Books

Company

STAY TUNED WITH OUR NEWSLETTER

Get the newest PostgreSQL Info & Tools

Data Protection Policy Terms and conditions Terms of Service Imprint

©

2025

CYBERTEC PostgreSQL International GmbH