Detecting wrong planner estimates

07.2014 / Category: / Tags: | | |

In 99% of all cases the PostgreSQL planner does a perfect job to optimize your queries and to make sure that you enjoy high performance and low response times. The infrastructure ensuring this is both sophisticated as well as robust. However, there are some corner cases which can turn out to be quite nasty. In some cases, the planner has simply no real chance and has to rely on guesswork to come up with a reasonable plan.

The question now is: How can a user figure out where things go wrong? Here is an example …

Dissecting a query …

The goal here is to see, how a simple query can be dissected and analyzed. Here is the query we want to optimize:

Looks like a simple thing, doesn't it? Well, not quite … Let us take a look at the definition of this system view:

Still, this looks like a simple query. Let us take a look at the execution plan:

Again, this one looks pretty sane. As always we need to read the thing from inside to outside. Therefore the first thing to see is the sequential scan on pg_class. 295 lines are expected. Then comes the next sequential scan as well as the hash join. With overall costs of 136 and 6 rows the plan looks very promising.

Let us look what happens underneath. “explain analyze” will reveal what is really going on:

Again the idea is to read things from inside out. Now we need two brackets for each node: One for “cost” and one for “actual time”: The important thing here is that costs are measured in abstract units - “actual time”, however, is measured in milliseconds. What we see here is that the planner estimated 295 rows and we got 299 rows in reality. Pretty good, no? The time spent on this operation is close to zero - a very good sign. Let us move up  the plan now: The sequential scan on pg_statistics expects 426 lines. We get slightly less, which is pretty good.

Moving up the plan, pg_attribute reveals that those estimates are pretty close to what PostgreSQL has predicted. 2419 vs 2328 lines sounds like a pretty reasonable thing as well.

When things start to be a little off …

Then we can see a couple of interesting lines:

The estimate here is just 6 lines - in reality, however, we get a set of 400 rows. What a whopping difference.

Why does it matter? If an estimate is very low, it can easily happen that a “nested loop” on top of this is heavily underestimated. With just 400 lines we cannot expect issues here - but what if you end up with a nested loop on a million rows if there should be just 500 rows around? It can easily turn out to be a disaster.

So, how do we end up with this wrong estimate? The problem here is the join filter. To see, if something is allowed or not, the view calls the has_column_privilege() function. The clue here is that PostgreSQL has absolutely no statistics about the return values of this function. As I have logged in as superuser, I happen to be able to read everything. And voila, this is where the wrong estimate comes from.

Of course, in my example this is not dangerous at all. It just shows nicely, how things can become a hidden bomb.

0 0 votes
Article Rating
Notify of
Newest Most Voted
Inline Feedbacks
View all comments
Ross Reedstrom
Ross Reedstrom
9 years ago

Great walk through and analysis. The comment about being root, though is off: I see similar values for root vs. unprivileged user on a mostly empty db as well. Perhaps in this case, the discrepancy might show up if you had a lot of private schema?

Shaun Thomas
9 years ago

This kind of visibility isn't taken into consideration when handling statistics within the planner. Pretty much everything comes from pg_stats combined with various GUC settings like random_page_cost. The issue is that the planner has long had issues with correlations---as in, it can't do them. If two columns have a relationship that defines the statistics between them, PG will not notice this, but will use the raw ANALYZE data instead, which can be off by several orders of magnitude. This becomes especially pronounced when handling multiple tables via JOIN. Even with a direct equality, the proportions are unknown except via local table stats per column.

Since PostgreSQL doesn't (and shouldn't, really) gather statistics between tables, this can't really improve. About the only possible way forward here is to analyze the returning data and stash any observed correlations in a pg_observed_stats table or something. Then the planner could use it as an override when it sees that join set again. I believe Oracle does something like this, actually.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf

+43 (0) 2622 93022-0

Get the newest PostgreSQL Info & Tools

    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    CYBERTEC PostgreSQL International GmbH
    Would love your thoughts, please comment.x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram