In version 9.6 PostgreSQL has introduced parallel queries. The ability to use more than just one CPU core per query is a giant leap forward and has made PostgreSQL an even more desirable database. With parallel queries many workloads can be speeded up considerably.

In this article, I want to point out a small missing feature, which has bugged one of our support clients recently. Consider the following example:

test=# CREATE TABLE t_demo AS
SELECT         id
FROM           generate_series(1, 10000000) AS id;

SELECT 10000000

The sample table contains 10 million rows.

To tell the optimizer about the content of the table, running ANALYZE seems like a good idea:

test=# ANALYZE t_demo;

Then users can tell PostgreSQL about the maximum number of cores allowed for the query we are about to run:

test=# SET max_parallel_workers_per_gather TO 4;

After that we can already run a simple aggregation:

test=# explain SELECT count(*) FROM t_demo;
                                        QUERY PLAN
 Finalize Aggregate  (cost=76498.57..76498.58 rows=1 width=8)
  ->  Gather  (cost=76498.15..76498.56 rows=4 width=8)
      Workers Planned: 4
      ->  Partial Aggregate  (cost=75498.15..75498.16 rows=1 width=8)
      ->  Parallel Seq Scan on t_demo  (cost=0.00..69248.12 rows=2500012 width=0)
(5 rows)

PostgreSQL will execute a so called parallel sequential scan using 4 CPU cores (= worker processes). The system will scale nicely and significantly better execution times can be observed.

SERIALIZABLE and parallelism

We have seen a couple of people using SERIALIZABLE transactions for analytical requests. While this might not be a good idea for other reasons there are also implications when it comes to the use of parallel queries. At this point (as of PostgreSQL 9.6) there is some code in PostgreSQL, which is not yet fully parallel safe. Therefore parallel queries are not yet supported if you happen to use SERIALIZABLE. If you are affected by this, consider using REPEATABLE READ.

Here is what happens:

test=# explain SELECT count(*) FROM t_demo;
                               QUERY PLAN
 Aggregate  (cost=169248.60..169248.61 rows=1 width=8)
  ->  Seq Scan on t_demo  (cost=0.00..144248.48 rows=10000048 width=0)
(2 rows)

test=# COMMIT;

As you can see, only a single core is used.

The code inside the query planner (planner.c) is very clear about this:

We can’t use parallelism in serializable mode because the predicate locking code is not parallel-aware.  It’s not catastrophic if someone tries to run a parallel plan in serializable mode; it just won’t get any workers and will run serially.  But it seems like a good heuristic to assume that the same serialization level will be in effect at plan time and execution time, so don’t generate a parallel plan if we’re in serializable mode.

Future versions of PostgreSQL will surely relax this at some point so that all isolation levels can benefit from parallel queries.