July 2014

UPDATE Sept. 2023: For more recent information about planner statistics, see the PostgreSQL documentation about extended statistics, or this blog about CREATE STATISTICS.

You can also find out more recent info in this blog about Improving Group By with Create Statistics.

The blog post below about cross correlation has been preserved for reference.

Planner estimates have already been discussed on this blog in my previous post, and also in some posts before that. A couple of years ago, I stumbled across an interesting issue, which is commonly known as “cross correlation”.

Let us consider the following example:

test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE

test=# INSERT INTO t_test SELECT 0, 1
       FROM generate_series(1, 100000);

INSERT 0 100000

test=# INSERT INTO t_test SELECT 1, 0
       FROM generate_series(1, 100000);
INSERT 0 100000

test=# ANALYZE t_test;
ANALYZE

test=# CREATE TABLE t_test (a int, b int);

CREATE TABLE

test=# INSERT INTO t_test SELECT 0, 1

FROM generate_series(1, 100000);

INSERT 0 100000

test=# INSERT INTO t_test SELECT 1, 0

FROM generate_series(1, 100000);

INSERT 0 100000

test=# ANALYZE t_test;

ANALYZE

We add 100.000 rows containing 0 and 1 and then add 100.000 rows containing 1 and 0. Then optimizer statistics are built. So, all together we have 200.000 files in the table:

test=# explain SELECT count(*) FROM t_test;
                             QUERY PLAN                            
--------------------------------------------------------------------
 Aggregate  (cost=3385.00..3385.01 rows=1 width=0)
   ->  Seq Scan on t_test  (cost=0.00..2885.00 rows=200000 width=0)
 Planning time: 0.095 ms

(3 rows)

test=# explain SELECT count(*) FROM t_test;

QUERY PLAN

--------------------------------------------------------------------

Aggregate (cost=3385.00..3385.01 rows=1 width=0)

-> Seq Scan on t_test (cost=0.00..2885.00 rows=200000 width=0)

Planning time: 0.095 ms

(3 rows)

So far everything looks just fine. The planner has guessed the number of rows in the table precisely. The same applies to the following query:

test=# explain SELECT count(*) FROM t_test WHERE a = 0;
                            QUERY PLAN                            
-------------------------------------------------------------------
 Aggregate  (cost=3634.28..3634.29 rows=1 width=0)
   ->  Seq Scan on t_test  (cost=0.00..3385.00 rows=99713 width=0)
         Filter: (a = 0)
 Planning time: 0.052 ms

(4 rows)

test=# explain SELECT count(*) FROM t_test WHERE a = 0;

QUERY PLAN

-------------------------------------------------------------------

Aggregate (cost=3634.28..3634.29 rows=1 width=0)

-> Seq Scan on t_test (cost=0.00..3385.00 rows=99713 width=0)

Filter: (a = 0)

Planning time: 0.052 ms

(4 rows)

99.713 is actually a pretty good estimate and it is totally sufficient to come up with a reasonable plan.

cross correlation at work

Let us try something else:

test=# explain SELECT count(*) FROM t_test WHERE a = 0 AND b = 0;
                            QUERY PLAN                            
-------------------------------------------------------------------
 Aggregate  (cost=4010.00..4010.01 rows=1 width=0)
   ->  Seq Scan on t_test  (cost=0.00..3885.00 rows=50000 width=0)
         Filter: ((a = 0) AND (b = 0))

 Planning time: 0.068 ms

(4 rows)

test=# explain SELECT count(*) FROM t_test WHERE a = 0 AND b = 0;

QUERY PLAN

-------------------------------------------------------------------

Aggregate (cost=4010.00..4010.01 rows=1 width=0)

-> Seq Scan on t_test (cost=0.00..3885.00 rows=50000 width=0)

Filter: ((a = 0) AND (b = 0))

Planning time: 0.068 ms

(4 rows)

Oops? What happened? The planner will estimate that 50.000 rows match this condition. The reality is somewhat different:

test=# SELECT count(*) FROM t_test WHERE a = 0 AND b = 0;
 count
-------
     0
(1 row)

test=# SELECT count(*) FROM t_test WHERE a = 0 AND b = 0;

count

-------

(1 row)

How did we end up with this terrible under-estimation of the problem?

Well, the reason is how PostgreSQL handles statistics. Internally PostgreSQL will store statistics for every column. So, we know that a=0 represents 50% of the table and b=0 will also represent 50% of the table. From a mathematical point of view it might be safe to just multiply those likelihoods:

0.5 * 0.5 = 0.25

This is exactly what is going on here. Therefore the estimate is 25% of 200.000 rows = 50.000 rows. In our example this is dead wrong. The problem is that PostgreSQL does not have statistics about the combination of those columns. It does not know that when a=1 than b will be 0.

As you have seen, this can lead to over estimation - but also to under estimations:

test=# explain SELECT count(*) FROM t_test WHERE a = 0 AND b = 1;
                            QUERY PLAN                             
-------------------------------------------------------------------
 Aggregate  (cost=4009.28..4009.30 rows=1 width=0)
   ->  Seq Scan on t_test  (cost=0.00..3885.00 rows=49714 width=0)
         Filter: ((a = 0) AND (b = 1))
 Planning time: 0.035 ms

(4 rows)

test=# explain SELECT count(*) FROM t_test WHERE a = 0 AND b = 1;

QUERY PLAN

-------------------------------------------------------------------

Aggregate (cost=4009.28..4009.30 rows=1 width=0)

-> Seq Scan on t_test (cost=0.00..3885.00 rows=49714 width=0)

Filter: ((a = 0) AND (b = 1))

Planning time: 0.035 ms

(4 rows)

In this case the real number of rows returned by the system is 100.000 and not just 49.714.

Keep in mind that the examples I have created are really pretty artificial. But, things like that can happen in real life and cause issues along the way. And as always: If you get bad plans - expect bad performance.

In 99% of all cases the PostgreSQL planner does a perfect job to optimize your queries and to make sure that you enjoy high performance and low response times. The infrastructure ensuring this is both sophisticated as well as robust. However, there are some corner cases which can turn out to be quite nasty. In some cases, the planner has simply no real chance and has to rely on guesswork to come up with a reasonable plan.

The question now is: How can a user figure out where things go wrong? Here is an example …

Dissecting a query …

The goal here is to see, how a simple query can be dissected and analyzed. Here is the query we want to optimize:

            SELECT * FROM pg_stats;

1	SELECT * FROM pg_stats;

Looks like a simple thing, doesn't it? Well, not quite … Let us take a look at the definition of this system view:

SELECT         ...
               ...
   FROM   pg_statistic s
     JOIN pg_class c ON c.oid = s.starelid
     JOIN pg_attribute a ON c.oid = a.attrelid
          AND a.attnum = s.staattnum
     LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
  WHERE NOT a.attisdropped
     AND has_column_privilege(c.oid, a.attnum, 'select'::text);

SELECT ...

...

FROM pg_statistic s

JOIN pg_class c ON c.oid = s.starelid

JOIN pg_attribute a ON c.oid = a.attrelid

AND a.attnum = s.staattnum

LEFT JOIN pg_namespace n ON n.oid = c.relnamespace

WHERE NOT a.attisdropped

AND has_column_privilege(c.oid, a.attnum, 'select'::text);

Still, this looks like a simple query. Let us take a look at the execution plan:

 test=# explain SELECT * FROM pg_stats;
                                             QUERY PLAN                                            
----------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=47.28..135.42 rows=6 width=535)
   ->  Hash Join  (cost=47.14..133.69 rows=6 width=475)
         Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))
         Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
         ->  Seq Scan on pg_attribute a  (cost=0.00..68.19 rows=2419 width=70)
               Filter: (NOT attisdropped)
         ->  Hash  (cost=40.75..40.75 rows=426 width=421)
               ->  Hash Join  (cost=14.64..40.75 rows=426 width=421)
                     Hash Cond: (s.starelid = c.oid)
                     ->  Seq Scan on pg_statistic s  (cost=0.00..20.26 rows=426 width=349)
                     ->  Hash  (cost=10.95..10.95 rows=295 width=72)
                         ->  Seq Scan on pg_class c  (cost=0.00..10.95 rows=295 width=72)
   ->  Index Scan using pg_namespace_oid_index on pg_namespace n 
            (cost=0.13..0.19 rows=1 width=68)
            Index Cond: (oid = c.relnamespace)

 Planning time: 1.746 ms

(15 rows)

test=# explain SELECT * FROM pg_stats;

QUERY PLAN

----------------------------------------------------------------------------------------------------

Nested Loop Left Join (cost=47.28..135.42 rows=6 width=535)

-> Hash Join (cost=47.14..133.69 rows=6 width=475)

Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))

Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)

-> Seq Scan on pg_attribute a (cost=0.00..68.19 rows=2419 width=70)

Filter: (NOT attisdropped)

-> Hash (cost=40.75..40.75 rows=426 width=421)

-> Hash Join (cost=14.64..40.75 rows=426 width=421)

Hash Cond: (s.starelid = c.oid)

-> Seq Scan on pg_statistic s (cost=0.00..20.26 rows=426 width=349)

-> Hash (cost=10.95..10.95 rows=295 width=72)

-> Seq Scan on pg_class c (cost=0.00..10.95 rows=295 width=72)

-> Index Scan using pg_namespace_oid_index on pg_namespace n

(cost=0.13..0.19 rows=1 width=68)

Index Cond: (oid = c.relnamespace)

Planning time: 1.746 ms

(15 rows)

Again, this one looks pretty sane. As always we need to read the thing from inside to outside. Therefore the first thing to see is the sequential scan on pg_class. 295 lines are expected. Then comes the next sequential scan as well as the hash join. With overall costs of 136 and 6 rows the plan looks very promising.

Let us look what happens underneath. “explain analyze” will reveal what is really going on:

test=# explain analyze SELECT * FROM pg_stats;
                                        QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=47.28..135.42 rows=6 width=535)
                        (actual time=9.284..14.197 rows=400 loops=1)
   ->  Hash Join  (cost=47.14..133.69 rows=6 width=475)
                        (actual time=9.245..13.215 rows=400 loops=1)
         Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))
         Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)
         ->  Seq Scan on pg_attribute a  (cost=0.00..68.19 rows=2419 width=70)
                        (actual time=0.006..3.282 rows=2328 loops=1)
               Filter: (NOT attisdropped)
         ->  Hash  (cost=40.75..40.75 rows=426 width=421)
                        (actual time=0.827..0.827 rows=400 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 143kB
               ->  Hash Join  (cost=14.64..40.75 rows=426 width=421)
                        (actual time=0.133..0.538 rows=400 loops=1)
                     Hash Cond: (s.starelid = c.oid)
                     ->  Seq Scan on pg_statistic s  (cost=0.00..20.26 rows=426 width=349)
                                   (actual time=0.011..0.106 rows=400 loops=1)
                     ->  Hash  (cost=10.95..10.95 rows=295 width=72)
                                   (actual time=0.113..0.113 rows=299 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 31kB
                           ->  Seq Scan on pg_class c  (cost=0.00..10.95 rows=295 width=72)
                                   (actual time=0.002..0.050 rows=299 loops=1)
   ->  Index Scan using pg_namespace_oid_index on pg_namespace n 
            (cost=0.13..0.19 rows=1 width=68)
            (actual time=0.001..0.001 rows=1 loops=400)
         Index Cond: (oid = c.relnamespace)

 Planning time: 0.471 ms
 Execution time: 14.294 ms

(18 rows)

test=# explain analyze SELECT * FROM pg_stats;

QUERY PLAN

---------------------------------------------------------------------------------------------

Nested Loop Left Join (cost=47.28..135.42 rows=6 width=535)

(actual time=9.284..14.197 rows=400 loops=1)

-> Hash Join (cost=47.14..133.69 rows=6 width=475)

(actual time=9.245..13.215 rows=400 loops=1)

Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))

Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)

-> Seq Scan on pg_attribute a (cost=0.00..68.19 rows=2419 width=70)

(actual time=0.006..3.282 rows=2328 loops=1)

Filter: (NOT attisdropped)

-> Hash (cost=40.75..40.75 rows=426 width=421)

(actual time=0.827..0.827 rows=400 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 143kB

-> Hash Join (cost=14.64..40.75 rows=426 width=421)

(actual time=0.133..0.538 rows=400 loops=1)

Hash Cond: (s.starelid = c.oid)

-> Seq Scan on pg_statistic s (cost=0.00..20.26 rows=426 width=349)

(actual time=0.011..0.106 rows=400 loops=1)

-> Hash (cost=10.95..10.95 rows=295 width=72)

(actual time=0.113..0.113 rows=299 loops=1)

Buckets: 1024 Batches: 1 Memory Usage: 31kB

-> Seq Scan on pg_class c (cost=0.00..10.95 rows=295 width=72)

(actual time=0.002..0.050 rows=299 loops=1)

-> Index Scan using pg_namespace_oid_index on pg_namespace n

(cost=0.13..0.19 rows=1 width=68)

(actual time=0.001..0.001 rows=1 loops=400)

Index Cond: (oid = c.relnamespace)

Planning time: 0.471 ms

Execution time: 14.294 ms

(18 rows)

Again the idea is to read things from inside out. Now we need two brackets for each node: One for “cost” and one for “actual time”: The important thing here is that costs are measured in abstract units - “actual time”, however, is measured in milliseconds. What we see here is that the planner estimated 295 rows and we got 299 rows in reality. Pretty good, no? The time spent on this operation is close to zero - a very good sign. Let us move up the plan now: The sequential scan on pg_statistics expects 426 lines. We get slightly less, which is pretty good.

Moving up the plan, pg_attribute reveals that those estimates are pretty close to what PostgreSQL has predicted. 2419 vs 2328 lines sounds like a pretty reasonable thing as well.

When things start to be a little off …

Then we can see a couple of interesting lines:

  ->  Hash Join  (cost=47.14..133.69 rows=6 width=475)
                        (actual time=9.245..13.215 rows=400 loops=1)
         Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))
         Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)

-> Hash Join (cost=47.14..133.69 rows=6 width=475)

(actual time=9.245..13.215 rows=400 loops=1)

Hash Cond: ((a.attrelid = c.oid) AND (a.attnum = s.staattnum))

Join Filter: has_column_privilege(c.oid, a.attnum, 'select'::text)

The estimate here is just 6 lines - in reality, however, we get a set of 400 rows. What a whopping difference.

Why does it matter? If an estimate is very low, it can easily happen that a “nested loop” on top of this is heavily underestimated. With just 400 lines we cannot expect issues here - but what if you end up with a nested loop on a million rows if there should be just 500 rows around? It can easily turn out to be a disaster.

So, how do we end up with this wrong estimate? The problem here is the join filter. To see, if something is allowed or not, the view calls the has_column_privilege() function. The clue here is that PostgreSQL has absolutely no statistics about the return values of this function. As I have logged in as superuser, I happen to be able to read everything. And voila, this is where the wrong estimate comes from.

Of course, in my example this is not dangerous at all. It just shows nicely, how things can become a hidden bomb.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.