After being on the road to do PostgreSQL consulting for Cybertec for over a decade I noticed that there are a couple of ways to kill indexing entirely. One of the most favored ways is to apply functions or expressions on the column people want to filter on. It is a sure way to kill indexing entirely.

To help people facing this problem I decided to compile a small blog post addressing this issue. Maybe it is helpful to some of you out there.

Generating some sample data

To prove my point here is some sample data:

test=# CREATE TABLE t_time AS

            SELECT         *

            FROM            generate_series('2014-01-01', '2014-05-01', '1 second'::interval) AS t;

SELECT 10364401

The script generates a list of around 10 million rows. A list lasting from January 2014 to May 1st 2014 is generated (one entry per second). Note that generate_series will create a list of timestamps here. To find data quickly an index will be useful:

test=# CREATE INDEX idx_t ON t_time (t);

CREATE INDEX

test=# ANALYZE;

ANALYZE

Checking indexing

Under normal conditions the index shows its value as soon as a proper WHERE clause is included in our statement:

test=# explain SELECT *

            FROM            t_time

            WHERE          t >= '2014-03-03'

                                   AND t < '2014-03-04';

                                          QUERY PLAN                                                              

------------------------------------------------------------------------------------

 Index Only Scan using idx_t on t_time  (cost=0.43..3074.74 rows=86765 width=8)

   Index Cond: ((t >= '2014-03-03 00:00:00+01'::timestamp with time zone)

            AND (t < '2014-03-04 00:00:00+01'::timestamp with time zone))

 Planning time: 0.083 ms

(3 rows)

However, in many cases people tend to transform the left side of the comparison when searching for data. In case of large data set this has disasterous consequences and performance will suffer badly. Here is what happens:

test=# explain SELECT * FROM t_time

            WHERE          t::date = '2013-03-03';

                          QUERY PLAN                           

---------------------------------------------------------------

 Seq Scan on t_time  (cost=0.00..201326.05 rows=51822 width=8)

   Filter: ((t)::date = '2013-03-03'::date)

 Planning time: 0.090 ms

(3 rows)

PostgreSQL does not transform the WHERE-clause into a range and therefore the index is not considered to be usable. To many PostgreSQL based applications this is more or less a death sentence when it comes to performance. A handful of inefficient sequential scans can ruin the performance of the entire application easily.