CYBERTEC Logo

Forcing a join order in PostgreSQL

05.2023 / Category: / Tags: | | | |
Comic about forcing join order in PostgreSQL
© Laurenz Albe 2023

 

Different from many other database systems, PostgreSQL does not support query hints. That makes it difficult to force the hand of the query planner when it comes to a certain join order that you know to be good. This article explains how you can influence execution plans in PostgreSQL.

Why no query hints?

The PostgreSQL TODO list lists optimizer hints under “Features We Do Not Want” and links to a discussion page that explains why. The discussion page references mailing list discussions that show that this decision has not been unanimous.

However, one of the strong points of PostgreSQL is its extensibility. If you really want query hints, you can get them: all you have to do is install the extension pg_hint_plan. This extension offers a comprehensive set of Oracle-style query hints, and it does not require a modified version of the PostgreSQL server.

But perhaps you don't want to install third-party software, or your database is running at a hosting provider and you have no access to the operating system. In that case, read on for alternative ways to force the join order.

An example query and its join order

We have three tables a, b and c and want to calculate the natural join between them. The optimizer chooses the following plan:

The PostgreSQL optimizer decided to first join b and a, then join the result with c. But we would like to join b and c first!

Forcing the join order with an optimizer barrier

We would like to write the query in a way that makes the PostgreSQL optimizer choose the plan we want. However, that is not as simple as it may seem. The PostgreSQL optimizer does not only plan the query as you wrote it, but it rearranges the query considerably. Among other things

  • it rearranges the join order as it thinks best
  • if “pulls up” subqueries to “flatten” the plan tree
  • it pushes WHERE conditions into joins and UNION ALL

Usually that is just what you want: the more ways the optimizer finds to execute the query, the better its chances are of finding the fastest execution plan. But if we want to force the optimizer's hand, we want to prevent exactly that. Therefore, we are looking for optimizer barriers, that is SQL constructs that prevent PostgreSQL from rearranging the plan.

The following two techniques are pretty similar: both rewrite the query to use a subquery and prevent the optimizer from pulling it up.

Using OFFSET 0 to force the join order

Here, we write a subquery in the FROM clause that explicitly joins the desired tables. To keep PostgreSQL from flattening the query, we can use an OFFSET or LIMIT clause in the subquery. The traditional way to do that is to use OFFSET 0, which does not change the result of the subquery:

It would not be hard to teach the optimizer to ignore that “useless” clause, but that would disable this useful trick, so it won't happen.

Using a common table expression to force the join order

Writing a subquery in the FROM clause can make the query hard to read. A common table expression (CTE) is a different approach: you write the subquery in the WITH clause at the beginning of the statement and give it a name. Then you can use that name in the main query, quite like a view, but one that only exists in the context of a single statement.

Before PostgreSQL v12, a CTE was automatically an optimizer barrier. Since v12, PostgreSQL can pull CTEs into the main query, and you have to use the MATERIALIZED keyword to prevent that:

The plan is different from the previous one, as PostgreSQL chose a hash join.

Join order and join_collapse_limit

I mentioned above that the optimizer rearranges the join order of a query. With an inner join of two tables, there are usually seven choices: PostgreSQL can opt for a nested loop, hash or merge join, and for the first two of these, the order of the tables makes a difference as well. With more tables, the number of options explodes, since the result of an inner join is independent of the order in which you join the tables. For three tables, there can be up to 147 combinations.

However, while the optimizer tries to find the best execution plan, it is also important that it does not take too long for planning. After all, PostgreSQL normally does not cache execution plans. To keep planning time moderate, the optimizer draws the line somewhere: if a query joins many tables, the optimizer will only consider all possible combinations for the first eight tables. It joins the remaining tables just like you wrote them in the statement. You can adjust that limit with the parameters join_collapse_limit and from_collapse_limit. The first one is for statements written with the explicit JOIN syntax, and the second applies to joins written in the form

If the number of tables reaches 12 (the default value of the parameter geqo_threshold), PostgreSQL uses an entirely different approach: it randomly generates a number of query plans and plays evolution by recombining the most promising plans over several generations. This genetic query optimizer can result in non-deterministic query plans, which is not always what you want.

Dumbing down the optimizer with join_collapse_limit = 1

With this approach, we deliberately lobotomize the optimizer by telling it not to rearrange the join order in the SQL statement. Then you have to write the tables in the exact order in which you want them joined:

We end up with the same execution plan as with OFFSET 0. You don't want to leave join_collapse_limit at 1, because other queries may perform badly with that setting. Here are some ideas how to change a parameter for a single query:

  • run the query in an explicit READ ONLY transaction and use SET LOCAL to change the parameter, so that it reverts to its previous setting as soon as the transaction is done
  • run the query from a database function and use the SET option of CREATE FUNCTION to change the parameter for the execution of the function

Conclusion

Unless you want to use an extension like pg_hint_plans, it is not easy to force the PostgreSQL optimizer to do what you want. We have seen how you can force the join oder with optimizer barriers or parameter settings.

If you are interested in query optimization, perhaps you want to read about UNION ALL and performance or about the different join strategies. There is also an introduction to EXPLAIN (ANALYZE).

0 0 votes
Article Rating
Subscribe
Notify of
guest
4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Tony Zucchini
11 months ago

Hi Laurenz, I put this question unsolved on stackowerflow, do you have some suggestion ?

I have a table "weather" that stores weather parameters for 45 years
of entire Europe area with 400 million of record partitioned by range in
an Oracle database managed through 19c instance and the table has been
duplicated using oracle_fdw foreign wrapper in a PostgreSQL 15 instance,
so now that table is local to PostgreSQL instance. Both tables are
preloaded in memory in keep pool for Oracle and shared buffers in
PostgreSQL using pg_prewarm.

I wrote a query that generates a ranking map of temperature max in a
selected interval for every year. Now execution time for one month
interval is 48s in PostgreSQL and 130s in Oracle, but I can force the
Oracle query optimizer to parallel execution that sounds logic with a
partitioned table using hint /* PARALLEL */, in this way execution time
goes down from 130s to 7s, but there isn't any possibility to force
PostgreSQL query optimizer to do the same. We know there aren't hints in
PostgreSQL but I played with related configuration parameters to force a
parallel strategy for the query, useless. Do You know if there is
possible strategy to force PostgreSQL optimizer to parallelize the query
?

laurenz
laurenz
11 months ago
Reply to  Tony Zucchini

This has nothing to do with my article...

Jeremy Giaco
Jeremy Giaco
9 months ago

dude, thanks. i didn't think there was a way to force join order in postgres. i used a lot of hints/tricks in sql server to force join order, and didn't think there was a way to do that in postgres. i had a table with 500m very "thin" rows. one indexed value had two rows of data. when joining that to another table with several million rows, it took minutes. i knew i wanted to force the join order, which turns it into a subsecond query, but i thought i'd have to implement a temp table to do so. not any more!

Franck Pachot
11 months ago

All those force the join order but not the join direction. If the optimizer is bad on cardinality to get the right join order, there's a good chance that it chooses a bad direction. P
pg_hint_plan can do both, and also set a planner GUC at statement level.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    4
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram