CYBERTEC PostgreSQL Logo

What is an inner join in SQL? And what is an outer join?

09.2022 / Category: / Tags: |

A join is a concept in IT which is widely used and often referred to but rarely really understood. What are the differences between inner joins, outer joins, semi joins and so on? Let's shed some light on them and see how inner and outer joins really work.

Producing sample data

Before we can get started we need to create some sample data:

The structure is actually quite simple. It consists of two tables containing a handful of rows which will be used for our little demo.

Explicit vs implicit joins

The question people often ask is: What is the difference between an implicit and an explicit join? The answer is simply syntactic sugar. Here is an implicit join:

In this the join actually happens in the WHERE-clause. In the FROM clause we only list the tables we want to join. The alternative is to use an explicit join:

Both syntax variations are identical. It is merely a question of taste and style that is relevant here. If you look at the execution plan you will see that both plans are absolutely identical.

Here is the second plan …

Basically the optimizer handles both queries the same way for MOST queries. If you happen to run a typical application there is no difference. However, there are some subtle differences which are worth mentioning. Let us take a look at the following configuration variable which can be found in postgresql.conf:

This means that 8 explicit joins are implicitly planned. If you are using standard implicit joins PostgreSQL will be able to determine the join order for those joins automatically. This is done to speed up the query. In reality it means that PostgreSQL does not necessarily join data in the order we pass it to the query - things can be shuffled by the optimizer if it deems necessary. However, if you are using explicit joins this is only done for up to 8 explicit joins - the rest is taken in the order defined by the query. While this is definitely a difference between implicit and explicit joins it is usually not that relevant to most applications. Joining 10+ tables is usually more the exception than the rule. Also, keep in mind that joining dozens of tables will already lead to genetic query optimization (GEQO) anyway.

In most cases explicit vs implicit joins is a matter of style rather than performance. However, keep in mind that there are indeed differences when looking at really complex queries.

What is an inner join? What is an outer join?

The next question people often have is: What is the difference between an inner and an outer join? The answer is: An inner join will look for common entries in those tables. An outer join will always take all data on one side and find the corresponding matches on the other side.

Here is an example:

What we do here is to display all records on table “a” (= “left side”) and look for those rows matching on the right side. The opposite of a LEFT JOIN is a RIGHT JOIN. The next query is identical to the first one:

Full joins

However, there is a third option: A full join. What it does is to take ALL data from both sides and find the matches. Either side of the join that does not have a match will be filled up with NULL values. Here is an example:

While this is relatively easy to code, people often make mistakes due to misconceptions about what outer joins do.

Outer joins: Common mistakes and errors

What happens more often than not is that people add an AND clause to the join condition. What does it mean for the result:

Most people actually expect that the result set will be smaller than before but this is not the case. The ON condition merely changes which row match - a LEFT JOIN will still produce all rows on the left side. The additional condition simply changes some entries to become NULL but this is mostly it - it does not reduce the amount of data. Therefore a LEFT JOIN that is broken semantically is often hitting our PostgreSQL 24x7 support desk masked as a performance problem.

Especially when aggregates are involved this can easily hide the underlying semantic problem:

It is quite common to see wrong results because people expect some data to be filtered away by the additional condition.

IN: What is a semi-join?

But there is more. Often people ask: What is a semi-join? The answer is quite simple: Think of an IN statement:

The IN-statement is an implicit DISTINCT filter which removes duplicate entries. The difference between a join and a semi-join is therefore the way duplicates are handled.

The execution plan clearly reveals that those duplicates are removed. In this case it is done using a HashAggregate (which you will also see in case of a GROUP BY statement in many cases):

The opposite of an IN statement is NOT IN:

In this case we remove all other rows from the result.

Finally …

In case you would like to learn more about joins, check out our blogpost about join strategies.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram