UPDATED July 2023: The PostgreSQL optimizer is really a wonderful piece of software which is capable of doing great things. One of those great things is so-called “join pruning”. In case PostgreSQL detects a join which is actually not needed for execution, it will automatically remove it from the plan. Removing joins from the plan can result in significantly better performance and provide end users with much simpler plans.

Let’s take a look and see when a join can be pruned from a query. Here is an example.

Two tables are needed:

test=# CREATE TABLE a (aid int PRIMARY KEY);

test=# INSERT INTO a VALUES (1), (2), (3);

test=# CREATE TABLE b (bid int PRIMARY KEY);

test=# INSERT INTO b VALUES (2), (3), (4);

Mind that both tables have a primary key – which will play a major role later on.

Inspecting queries

To see what PostgreSQL does, we can write a basic query:

test=# SELECT * FROM a LEFT JOIN b ON (aid = bid);
 aid | bid
   1 |
   2 |   2
   3 |   3

(3 rows)

This is a left-join, which takes all values from the left and matches values on the right. Here’s the plan:

test=# explain SELECT * FROM a LEFT JOIN b ON (aid = bid);
                       QUERY PLAN
 Hash Left Join  (cost=67.38..137.94 rows=2550 width=8)
      Hash Cond: (a.aid = b.bid)
      ->  Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)
      ->  Hash  (cost=35.50..35.50 rows=2550 width=4)
          ->  Seq Scan on b  (cost=0.00..35.50 rows=2550 width=4)

(5 rows)

PostgreSQL will perform a hash join to produce the final result. In the SELECT-clause, a star will ensure that all columns are returned.

However, what happens if only columns from the first table are needed?

test=# explain SELECT a.* FROM a LEFT JOIN b ON (aid = bid);
                     QUERY PLAN
 Seq Scan on a  (cost=0.00..35.50 rows=2550 width=4)

(1 row)

In this case, PostgreSQL can prune the plan and just do a sequential scan on “a”.

Conclusion – Join pruning

Why is that possible? Well, first of all no information from “b” is needed to satisfy the SELECT-clause. However, this is not a sufficient condition to justify eliminating the table from the execution plan. Both sides must be unique. Remember, in a 1:n relationship the join could potentially return more data than a query without the join would. So, only if both sides are unique is it logically possible to just skip the second table. If you can prove that a simpler plan will produce the same result, you can use the simpler plan.


Read more about joins and performance: