CYBERTEC Logo

Enforcing join orders in PostgreSQL

03.2024 / Category: / Tags: |

After the pgconfeu23 in Prague - which has been an excellent event - I decided to share some of the things I presented as a blog post to maybe shed some light on some of those topics. One of those ideas presented was the way PostgreSQL handles joins and especially join orders. Internally the PostgreSQL does a good job to optimize queries but how does it really work?

Let us create some tables first:

In PostgreSQL we can easily create SQL using SQL. The beauty of psql is that one can simply run gexec to use the previous output as new input:

Voila, we have 5 tables which can serve as a sample data structure.

Joining tables in PostgreSQL

The following query shows a simple join using the tables we have just created:

What is the important observation here? Let us take a look at planning time. PostgreSQL needs 0.297 milliseconds to find the best execution plan (= execution strategy) to run the query. The question arising is: Where does the planner need the time to plan the query? The thing is: Even when using explicit joins as shown above PostgreSQL will join those tables implicitly and decide on the best join order. What does that mean in real life? Well let us consider a join “a join b join c”: Even if we write an SQL that says join “a to b” the optimizer might still decide to vote for “c join a join b” in case it guarantees the same result. Why is this so important? Because it offers a great deal of efficiency. Letting the optimizer decide on the best join order is an important internal optimization.

However, we got to keep planning time in mind - especially if there are many tables (10+?) are involved.

Controlling the join behavior in SQL

In case planning time is an issue we can force PostgreSQL to use the join order we want it to use. The variable controlling this behavior is join_collapse_limit. What does it mean? Basically it controls the number of explicit joins planned implicitly. In other words: How many explicit joins can be optimized by PostgreSQL.

If we set this variable to 1 it means that we force PostgreSQL to use the join order of our choosing:

What is really noteworthy about this is the significant improvement of planning speed. We can see a stunning 4x speedup.

However, let me also issue a word of caution: There is a reason why the optimizer is trying to re-structure joins in the first place. In case the query is more expensive than what we see here in this example it can make a lot of sense to invest more time into plan creation. In other words: Changing this variable can backfire unless the end user is fully aware of what is going on. We therefore advise to test your queries and your entire setups using real data and a real workload before changing this setting. In general it can also be beneficial to ONLY change the variable for a single query and keep the default value in postgresql.conf as it is for all other operations.

More about the pgconfeu23 in Prague in our blog.

5 1 vote
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram