The PostgreSQL optimizer is really a wonderful piece of software, which is capable of doing great things. One of those great things is so called “join pruning”. In case PostgreSQL is able to detect a join which is actually not needed for execution, it will automatically remove it from the plan. Removing joins from the plan can result in significantly better performance and provide end users with much simpler plans.
Let us take a look and see, when a join can be pruned from a query. Here is a simple example. Two tables are needed:
test=# CREATE TABLE a (aid int PRIMARY KEY); CREATE TABLE test=# INSERT INTO a VALUES (1), (2), (3); INSERT 0 3 test=# CREATE TABLE b (bid int PRIMARY KEY); CREATE TABLE test=# INSERT INTO b VALUES (2), (3), (4); INSERT 0 3
Mind that both tables have a primary key, which will play a major role later on.
Inspecting queries
To see what PostgreSQL does, we can write a simple query:
test=# SELECT * FROM a LEFT JOIN b ON (aid = bid); aid | bid -----+----- 1 | 2 | 2 3 | 3 (3 rows)
This is just a simple left-join, which takes all values from the left and matches values on the right. Here is the plan:
test=# explain SELECT * FROM a LEFT JOIN b ON (aid = bid); QUERY PLAN ----------------------------------------------------------------- Hash Left Join (cost=67.38..137.94 rows=2550 width=8) Hash Cond: (a.aid = b.bid) -> Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) -> Hash (cost=35.50..35.50 rows=2550 width=4) -> Seq Scan on b (cost=0.00..35.50 rows=2550 width=4) (5 rows)
PostgreSQL will perform a hash join to produce the final result. In the SELECT-clause a star will ensure, that all columns are returned.
However, what happens if only columns of the first table are needed?
test=# explain SELECT a.* FROM a LEFT JOIN b ON (aid = bid); QUERY PLAN ----------------------------------------------------- Seq Scan on a (cost=0.00..35.50 rows=2550 width=4) (1 row)
In this case PostgreSQL can prune the plan and just do a sequential scan on “a”. Why is that possible? Well, first of all no information from “b” is needed to satisfy the SELECT-clause. However, this is not a sufficient condition to just kick the table. Both sides must be unique. Remember, in a 1:n relationship the join could potentially return more data than a query without the join would. So, only if both sides are unique it is logically possible to just skip the second table.