© Laurenz Albe 2020
Many people consider recursive queries a difficult topic. Still, they enable you to do things that would otherwise be impossible in SQL.
This articles gives a simple introduction with examples and shows the differences to Oracle’s implementation of recursive queries.
Common table expressions (
A common table expression (CTE) can be seen as a view that is only valid for a single query:
WITH ctename AS ( SELECT ... ) SELECT ... FROM ctename ...
This could also be written as a subquery in
FROM, but there are some advantages to using CTEs:
- The query becomes more readable.
- You can reference the CTE several times in the query, and it will be calculated only once.
- You can use data modifying statements in the CTE (typically with a
Note that before v12, PostgreSQL always materialized CTEs. That means, the CTE was calculated independently from the containing query. From v12 on, CTEs can be “inlined” into the query, which provides further optimization potential.
CTEs (as well as the recursive CTEs mentioned below) are defined in the SQL standard, although PostgreSQL does not implement the
Syntax of recursive queries
Recursive queries are written using recursive CTEs, that are CTEs containing the
WITH RECURSIVE ctename AS ( SELECT /* non-recursive branch, cannot reference "ctename" */ UNION [ALL] SELECT /* recursive branch referencing "ctename" */ ) SELECT ... FROM ctename ...
How recursive queries are processed
PostgreSQL internally uses a working table to process recursive CTEs. This processing is not really recursive, but rather iterative:
First, the working table is initialized by executing the non-recursive branch of the CTE. The result of the CTE is also initialized with this result set. If the recursive CTE uses
UNION rather than
UNION ALL, duplicate rows are removed.
Then, PostgreSQL repeats the following until the working table is empty:
- Evaluate the recursive branch of the CTE, replacing the reference to the CTE with the working table.
- Add all resulting rows to the CTE result. If
UNIONis used to combine the branches, discard duplicate rows.
- Replace the working table with all new rows from the previous step (excluding any removed duplicates).
Note that the self-referential branch of the CTE is not executed using the complete CTE result so far, but only the rows that are new since the previous iteration (the working table).
You have to be aware of the danger of an endless loop here:
If the iteration never ends, the query will just keep running until the result table becomes big enough to cause an error. There are two ways to deal with that:
- Often you can avoid infinite recursion by using
UNION, which removes duplicate result rows (but of course requires extra processing effort).
- Another way is to place a
LIMITclause on the query that uses the CTE, because PostgreSQL stops processing if the recursive CTE has calculated as many rows as are fetched by the parent query. Note that this technique is not portable to other standard compliant databases.
A simple example
Let’s assume a self-referential table like this:
TABLE emp; empno | ename | job | mgr | hiredate | sal | comm | deptno -------+--------+-----------+------+------------+---------+---------+-------- 7839 | KING | PRESIDENT | | 1981-11-17 | 5000.00 | | 10 7698 | BLAKE | MANAGER | 7839 | 1981-05-01 | 2850.00 | | 30 7782 | CLARK | MANAGER | 7839 | 1981-06-09 | 2450.00 | | 10 7566 | JONES | MANAGER | 7839 | 1981-04-02 | 2975.00 | | 20 7902 | FORD | ANALYST | 7566 | 1981-12-03 | 3000.00 | | 20 7369 | SMITH | CLERK | 7902 | 1980-12-17 | 800.00 | | 20 7499 | ALLEN | SALESMAN | 7698 | 1981-02-20 | 1600.00 | 300.00 | 30 7521 | WARD | SALESMAN | 7698 | 1981-02-22 | 1250.00 | 500.00 | 30 7654 | MARTIN | SALESMAN | 7698 | 1981-09-28 | 1250.00 | 1400.00 | 30 7844 | TURNER | SALESMAN | 7698 | 1981-09-08 | 1500.00 | 0.00 | 30 7900 | JAMES | CLERK | 7698 | 1981-12-03 | 950.00 | | 30 7934 | MILLER | CLERK | 7782 | 1982-01-23 | 1300.00 | | 10 (12 rows)
We want to find all the subordinates of person 7566, including the person itself.
The non-recursive branch of the query will be:
SELECT empno, ename FROM emp WHERE empno = 7566;
The recursive branch will find all subordinates of all entries in the working table:
SELECT emp.empno, emp.ename FROM emp JOIN ctename ON emp.mgr = ctename.empno;
We can assume that the dependencies contain no cycles (nobody is his or her own manager, directly or indirectly). So we can combine the queries with
UNION ALL, because no duplicates can occur.
So our complete query will be:
WITH RECURSIVE ctename AS ( SELECT empno, ename FROM emp WHERE empno = 7566 UNION ALL SELECT emp.empno, emp.ename FROM emp JOIN ctename ON emp.mgr = ctename.empno ) SELECT * FROM ctename; empno | ename -------+------- 7566 | JONES 7902 | FORD 7369 | SMITH (3 rows)
Adding generated columns
Sometimes you want to add more information, like the hierarchical level. You can do that by adding the starting level as a constant in the non-recursive branch. In the recursive branch you simply add 1 to the level:
WITH RECURSIVE ctename AS ( SELECT empno, ename, 0 AS level FROM emp WHERE empno = 7566 UNION ALL SELECT emp.empno, emp.ename, ctename.level + 1 FROM emp JOIN ctename ON emp.mgr = ctename.empno ) SELECT * FROM ctename; empno | ename | level -------+-------+------- 7566 | JONES | 0 7902 | FORD | 1 7369 | SMITH | 2 (3 rows)
If you use
UNION to avoid duplicated rows in the case of circular references, you cannot use this technique. This is because adding
level will make rows that were identical before different. But in that case a hierarchical level wouldn’t make much sense anyway because an entry could appear on infinitely many levels.
Another frequent requirement is to collect all ancestors in a “path”:
WITH RECURSIVE ctename AS ( SELECT empno, ename, ename AS path FROM emp WHERE empno = 7566 UNION ALL SELECT emp.empno, emp.ename, ctename.path || ' -> ' || emp.ename FROM emp JOIN ctename ON emp.mgr = ctename.empno ) SELECT * FROM ctename; empno | ename | path -------+-------+------------------------ 7566 | JONES | JONES 7902 | FORD | JONES -> FORD 7369 | SMITH | JONES -> FORD -> SMITH
Comparison to Oracle
Oracle database has a different syntax for recursive queries that does not conform to the SQL standard. The original example would look like this:
SELECT empno, ename FROM emp START WITH empno = 7566 CONNECT BY PRIOR empno = mgr; EMPNO ENAME ---------- ---------- 7566 JONES 7902 FORD 7369 SMITH
This syntax is more concise, but less powerful than recursive CTEs. For more complicated queries involving joins, it can become difficult and confusing.
It is always easy to translate an Oracle “hierarchical query” to a recursive CTE:
- The non-recursive branch is the Oracle query without the
CONNECT BYclause but including the
- The recursive branch is the Oracle query without the
START WITHclause but including the
CONNECT BYclause. You add a join with the name of the recursive CTE and replace all
PRIORcolumns with columns from that joined CTE.
- If the Oracle query uses
CONNECT BY NOCYCLE, use
Apart from that, Oracle also supports standard compliant recursive CTEs. These also support the
CYCLE clauses that PostgreSQL doesn’t implement.
The true power of recursive queries
Without recursive CTEs, many things that can be written in procedural languages cannot be written in SQL. That is usually no problem, because SQL is made to query databases. If you need procedural code, you can always write a database function in one of the many procedural languages available in PostgreSQL.
But recursive CTEs make SQL turing complete, that is, it can perform the same calculations as any other programming language. Obviously, such a “program” would often be quite complicated and inefficient, and this is a theoretical consideration and not of much practical value. Still, the previous examples showed that recursive CTEs can do useful work that you couldn’t otherwise perform in SQL.
As an example for the power of recursive queries, here is a recursive CTE that computes the first elements of the Fibonacci sequence:
WITH RECURSIVE fib AS ( SELECT 1 AS n, 1::bigint AS "fibₙ", 1::bigint AS "fibₙ₊₁" UNION ALL SELECT n+1, "fibₙ₊₁", "fibₙ" + "fibₙ₊₁" FROM fib ) SELECT n, "fibₙ" FROM fib LIMIT 20; n | fibₙ ----+------ 1 | 1 2 | 1 3 | 2 4 | 3 5 | 5 6 | 8 7 | 13 8 | 21 9 | 34 10 | 55 11 | 89 12 | 144 13 | 233 14 | 377 15 | 610 16 | 987 17 | 1597 18 | 2584 19 | 4181 20 | 6765 (20 rows)
This example also demonstrates how an endless loop can be avoided with a
LIMIT clause on the parent query.
You shouldn’t be afraid of recursive queries, even if the term “recursive” scares you. They are not that hard to understand, and they provide the simplest way to query databases containing hierarchies and graphs. In many cases they are also much more efficient than equivalent procedural code because they avoid repeated queries to the same tables.
Together with window functions, recursive queries are among the most powerful tools that SQL has to offer.