Killing proper indexing: A neat idea

10.2014 | Category: Performance | Tags: index, performance tuning

10.2014

After being on the road to do PostgreSQL consulting for CYBERTEC for over a decade I noticed that there are a couple of ways to kill indexing entirely. One of the most favored ways is to apply functions or expressions on the column people want to filter on. It is a sure way to kill indexing entirely.

Table of Contents

To help people facing this problem I decided to compile a small blog post addressing this issue. Maybe it is helpful to some of you out there.

Generating some sample data

To prove my point here is some sample data:

test=# CREATE TABLE t_time AS
            SELECT         *
            FROM            generate_series('2014-01-01', '2014-05-01', '1 second'::interval) 
AS t;
SELECT 10364401

1

2

3

4

5

test=# CREATE TABLE t_time AS

SELECT *

FROM generate_series('2014-01-01', '2014-05-01', '1 second'::interval)

AS t;

SELECT 10364401

The script generates a list of around 10 million rows. A list lasting from January 2014 to May 1st, 2014 is generated (one entry per second). Note that generate_series will create a list of timestamps here. To find data quickly an index will be useful:

test=# CREATE INDEX idx_t ON t_time (t);
CREATE INDEX

test=# ANALYZE;
ANALYZE

1

2

3

4

5

test=# CREATE INDEX idx_t ON t_time (t);

CREATE INDEX

test=# ANALYZE;

ANALYZE

Checking indexing

Under normal conditions the index shows its value as soon as a proper WHERE clause is included in our statement:

test=# explain SELECT *
            FROM       t_time
            WHERE      t >= '2014-03-03'
                       AND t < '2014-03-04';
                                          QUERY PLAN                                                              
-----------------------------------------------------------------------------------
 Index Only Scan using idx_t on t_time  (cost=0.43..3074.74 rows=86765 width=8)
   Index Cond: ((t >= '2014-03-03 00:00:00+01'::timestamp with time zone)
            AND (t < '2014-03-04 00:00:00+01'::timestamp with time zone))
 Planning time: 0.083 ms

(3 rows)

1

2

3

4

5

6

7

8

9

10

11

12

test=# explain SELECT *

FROM t_time

WHERE t >= '2014-03-03'

AND t < '2014-03-04';

QUERY PLAN

-----------------------------------------------------------------------------------

Index Only Scan using idx_t on t_time (cost=0.43..3074.74 rows=86765 width=8)

Index Cond: ((t >= '2014-03-03 00:00:00+01'::timestamp with time zone)

AND (t < '2014-03-04 00:00:00+01'::timestamp with time zone))

Planning time: 0.083 ms

(3 rows)

However, in many cases people tend to transform the left side of the comparison when searching for data. In case of large data set this has disastrous consequences and performance will suffer badly. Here is what happens:

test=# explain SELECT * FROM t_time
            WHERE          t::date = '2013-03-03';

                          QUERY PLAN                           
---------------------------------------------------------------
 Seq Scan on t_time  (cost=0.00..201326.05 rows=51822 width=8)
   Filter: ((t)::date = '2013-03-03'::date)

 Planning time: 0.090 ms

(3 rows)

1

2

3

4

5

6

7

8

9

10

11

test=# explain SELECT * FROM t_time

WHERE t::date = '2013-03-03';

QUERY PLAN

---------------------------------------------------------------

Seq Scan on t_time (cost=0.00..201326.05 rows=51822 width=8)

Filter: ((t)::date = '2013-03-03'::date)

Planning time: 0.090 ms

(3 rows)

PostgreSQL does not transform the WHERE-clause into a range and therefore the index is not considered to be usable. To many PostgreSQL based applications this is more or less a death sentence when it comes to performance. A handful of inefficient sequential scans can ruin the performance of the entire application easily.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

2 responses to “Killing proper indexing: A neat idea”

Alphatier Wolf says:

October 16, 2014 at 2:45 pm

Hello,
Nice Case description
The Problem is casting from timestamp to date type. so index is not used.

Thank you

Reply
- Luan Huynh says:
  
  December 12, 2016 at 9:26 am
  
  I agreed with you.
  t = '2013-03-03' : Index Only Scan using idx_t on t_time
  t::date = '2013-03-03': Seq Scan on t_time
  
  Reply

Killing proper indexing: A neat idea

Generating some sample data

Checking indexing

2 responses to “Killing proper indexing: A neat idea”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts