EXPLAIN that parameterized statement in PostgreSQL!

11.2022 | Category: Performance | Tags: execution plan

11.2022

Category: Performance

Tags: execution plan

the database consultant needs a plan to get a clue about the parameterized statement — © Laurenz Albe 2022

Table of Contents

For detailed query performance analysis, you need EXPLAIN (ANALYZE, BUFFERS) output for an SQL statement. With a parameterized statement, it can be difficult to construct a run-able statement for EXPLAIN (ANALYZE). Sometimes, you don't even know the parameter values. I'll show you how you can get at least plain EXPLAIN output for the generic plan in this case. Such an execution plan is better than nothing, and maybe even enough to guess what the problem is.

Parameterized statements

With the extended query protocol, PostgreSQL allows you to separate the SQL statement and the constants used in the statement. This enhances security, since it makes SQL injection impossible, but it is mostly a performance feature. Such parameterized statements can be named and reused with different parameter values, thereby saving the database engine the work of parsing the same statement over and over. What is more, using this feature PostgreSQL can sometimes avoid the much greater overhead of generating execution plans for each execution of the statement.

You typically encounter parameterized statements in two situations:

prepared statements with parameters (typically used through a client API)
static SQL statements in PL/pgSQL functions that use variables

The placeholders for the parameters are $1, $2 and so on. In the case of PL/pgSQL, you won't see them; the PL/pgSQL call handler will substitute these placeholders instead of the variable names. Note that parameterized statements can only be SELECT, INSERT, UPDATE, DELETE and VALUES statements.

Generic plans for parameterized statements

Normally, PostgreSQL generates an execution plan whenever a query is executed. However, PostgreSQL can cache execution plans for named parameterized statements. Plans are only ever cached inside a single database session; there is no shared memory plan cache.

If PostgreSQL thinks it can do so without hurting performance, it will start using a generic plan for the statement. Such an execution plan does not take the parameter values into account and can be reused. The decision whether to switch to a generic plan or not is based on a heuristic and is typically made when the statement executes for the sixth time. You can tell such a plan from the placeholders that it uses instead of the parameters. Here is an example using a prepared statement:

PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_proc');

                       QUERY PLAN                        
═════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class
   Index Cond: (relname = 'pg_proc'::text)
(2 rows)

1

2

3

4

5

6

7

8

9

PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_proc');

QUERY PLAN

═════════════════════════════════════════════════════════

Index Scan using pg_class_relname_nsp_index on pg_class

Index Cond: (relname = 'pg_proc'::text)

(2 rows)

The next four executions of EXPLAIN look the same, but then we see

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_attribute');
                       QUERY PLAN                        
═════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class
   Index Cond: (relname = $1)
(2 rows)

1

2

3

4

5

6

EXPLAIN (COSTS OFF) EXECUTE stmt('pg_attribute');

QUERY PLAN

═════════════════════════════════════════════════════════

Index Scan using pg_class_relname_nsp_index on pg_class

Index Cond: (relname = $1)

(2 rows)

PostgreSQL has started using a generic plan! From that point on, the query planning time will become much shorter.

Forcing a generic plan for a parameterized statement

You can use the PostgreSQL parameter plan_cache_mode to influence the behavior described in the previous section. The default setting “auto” chooses the heuristic described above, where PostgreSQL decides after a few executions whether a generic plan would be beneficial or not.

With the setting “force_custom_plan” you can tell PostgreSQL to never use a generic plan. That is a good idea if the generic plan turns out to be not quite as good as PostgreSQL thought. It is also a good setting for data warehouses, where you normally run expensive analytical queries and saving on planning time is less important than getting the best possible execution plan.

Finally, the setting “force_generic_plan” makes PostgreSQL use a generic plan immediately. We will make use of that setting later.

Where can you encounter a parameterized statement?

Parameterized statements in the PostgreSQL log

A parameterized statement looks like this in the log:

LOG:  duration: 0.012 ms  execute stmt: SELECT oid FROM pg_class WHERE relname = $1
DETAIL:  parameters: $1 = 'pg_proc'

1 2	LOG: duration: 0.012 ms execute stmt: SELECT oid FROM pg_class WHERE relname = $1 DETAIL: parameters: $1 = 'pg_proc'

Usually, the parameters are logged as a detail message, but if there are many of them, it can be a considerable effort to replace all the placeholders with the parameter values. You also don't see the parameter data type in the log, so you may have to look up the table definition to know whether you should write 42 or '42'. If the statement caused an error and you didn't set log_parameter_max_length_on_error to a nonzero value, you won't get the parameters logged at all:

ERROR:  canceling statement due to statement timeout
STATEMENT:  SELECT oid FROM pg_class WHERE relname = $1

1 2	ERROR: canceling statement due to statement timeout STATEMENT: SELECT oid FROM pg_class WHERE relname = $1

Parameterized statements in pg_stat_statements

pg_stat_statements is the Swiss army knife for analyzing database workloads. One of its features is that it ignores the values of constants, so that statements that only differ in constants are aggregated together. So if you query the pg_stat_statements view, you will see placeholders even in statements that were not parameterized originally. Also, since pg_stat_statements collects statistics over many executions of a statement, it doesn't collect the actual parameter values for any of them.

The need for a generic plan

If you find a problem statement in the log or in pg_stat_statements, you want to analyze its performance. In order to do that, you have to guess appropriate parameter values, so that you can get an execution plan with EXPLAIN (ANALYZE, BUFFERS). This can be tedious and take a long time.

For a first analysis, it is helpful to see the execution plan as generated by EXPLAIN (without ANALYZE). Since “plain” EXPLAIN does not execute the query, it should not depend on the actual parameter values, as long as we are happy with the generic plan. Unfortunately, EXPLAIN refuses to generate a generic plan for a parameterized statement:

EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;
ERROR:  there is no parameter $1
LINE 1: EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;
                                                         ^

1

2

3

4

EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;

ERROR: there is no parameter $1

LINE 1: EXPLAIN SELECT oid FROM pg_class WHERE relname = $1;

^

We get that error even if we set plan_cache_mode = force_generic_plan.

Generating a generic plan for a parameterized statement using `PREPARE`

We want to do better than that, and we can. Using PREPARE, we can create a prepared statement with a placeholder:

PREPARE stmt(name) AS SELECT oid FROM pg_class WHERE relname = $1;

1	PREPARE stmt(name) AS SELECT oid FROM pg_class WHERE relname = $1;

Now we can force a generic plan and EXPLAIN the prepared statement. We can supply NULL as parameter value, since NULL exists for every data type, and the parameter value is ignored anyway:

SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);
                                        QUERY PLAN                                         
═══════════════════════════════════════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
   Index Cond: (relname = $1)
(2 rows)

DEALLOCATE stmt;

1

2

3

4

5

6

7

8

9

10

SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);

QUERY PLAN

═══════════════════════════════════════════════════════════════════════════════════════════

Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.29 rows=1 width=4)

Index Cond: (relname = $1)

(2 rows)

DEALLOCATE stmt;

The only remaining fly in the ointment is that we have to figure out the appropriate data types for the parameters.

Using the pseudo-type `unknown` with a parameterized statement

Pseudo-types are data types that you cannot use in table definitions. One of these data types is “unknown”: It is used during query parsing for string constants whose data type must be resolved later, based on the context. We can use unknown as a data type for query parameters and let PostgreSQL figure out the appropriate data type itself:

PREPARE stmt(unknown) AS SELECT oid FROM pg_class WHERE relname = $1;

SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);
                                        QUERY PLAN                                         
═══════════════════════════════════════════════════════════════════════════════════════════
 Index Scan using pg_class_relname_nsp_index on pg_class  (cost=0.28..8.29 rows=1 width=4)
   Index Cond: (relname = $1)
(2 rows)

DEALLOCATE stmt;

1

2

3

4

5

6

7

8

9

10

11

12

PREPARE stmt(unknown) AS SELECT oid FROM pg_class WHERE relname = $1;

SET plan_cache_mode = force_generic_plan;

EXPLAIN EXECUTE stmt(NULL);

QUERY PLAN

═══════════════════════════════════════════════════════════════════════════════════════════

Index Scan using pg_class_relname_nsp_index on pg_class (cost=0.28..8.29 rows=1 width=4)

Index Cond: (relname = $1)

(2 rows)

DEALLOCATE stmt;

Putting it all together in an extension

Now we have a simple algorithm to get the generic plan of a parameterized statement:

count the number of parameters
create a prepared statement with that many “unknown” parameters
set “plan_cache_mode” to “force_generic_plan”
EXPLAIN the prepared statements using NULLs as arguments

I wrapped all this into a function and wrote the extension generic_plan. It is written in PL/pgSQL and does not require superuser permissions to install. Here you can see it in action:

CREATE EXTENSION IF NOT EXISTS generic_plan;

SELECT generic_plan('SELECT * FROM pg_sequences WHERE max_value < last_value + $1');
                                        generic_plan                                         
═════════════════════════════════════════════════════════════════════════════════════════════
 Subquery Scan on pg_sequences  (cost=1.09..24.10 rows=1 width=245)
   Filter: (pg_sequences.max_value < (pg_sequences.last_value + $1))
   ->  Nested Loop  (cost=1.09..24.09 rows=1 width=245)
         Join Filter: (c.oid = s.seqrelid)
         ->  Seq Scan on pg_sequence s  (cost=0.00..1.06 rows=6 width=49)
         ->  Materialize  (cost=1.09..22.76 rows=3 width=136)
               ->  Hash Join  (cost=1.09..22.74 rows=3 width=136)
                     Hash Cond: (c.relnamespace = n.oid)
                     ->  Seq Scan on pg_class c  (cost=0.00..21.62 rows=8 width=76)
                           Filter: (relkind = 'S'::'char')
                     ->  Hash  (cost=1.05..1.05 rows=3 width=68)
                           ->  Seq Scan on pg_namespace n  (cost=0.00..1.05 rows=3 width=68)
                                 Filter: (NOT pg_is_other_temp_schema(oid))
(13 rows)

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

CREATE EXTENSION IF NOT EXISTS generic_plan;

SELECT generic_plan('SELECT * FROM pg_sequences WHERE max_value < last_value + $1');

generic_plan

═════════════════════════════════════════════════════════════════════════════════════════════

Subquery Scan on pg_sequences (cost=1.09..24.10 rows=1 width=245)

Filter: (pg_sequences.max_value < (pg_sequences.last_value + $1))

-> Nested Loop (cost=1.09..24.09 rows=1 width=245)

Join Filter: (c.oid = s.seqrelid)

-> Seq Scan on pg_sequence s (cost=0.00..1.06 rows=6 width=49)

-> Materialize (cost=1.09..22.76 rows=3 width=136)

-> Hash Join (cost=1.09..22.74 rows=3 width=136)

Hash Cond: (c.relnamespace = n.oid)

-> Seq Scan on pg_class c (cost=0.00..21.62 rows=8 width=76)

Filter: (relkind = 'S'::'char')

-> Hash (cost=1.05..1.05 rows=3 width=68)

-> Seq Scan on pg_namespace n (cost=0.00..1.05 rows=3 width=68)

Filter: (NOT pg_is_other_temp_schema(oid))

(13 rows)

Conclusion

It can be complicated to collect parameter values to analyze the execution of a parameterized statement, but using the generic_plan extension we can at least get a generic plan easily. The tricks used are a prepared statement with parameters of type “unknown”, adjusting plan_cache_mode and using NULLs as parameter values.

In case you're interested in learning more about parameters, check out my blog on Query Parameter Data Types and Performance.

3 responses to “EXPLAIN that parameterized statement in PostgreSQL!”

Andrew Kozin says:

November 23, 2022 at 9:55 am

But wouldn't the usage of NULL for building the "generic" plan shift the estimation the planner makes?

As far as I understood, the generic plan is just the stable result of planning execution of the first 5 requests with some real data. And because the planner takes those in consideration, plans for NULL can be quite different than for some (or any) non-null parameter values.

For example, suppose the table with 2M rows has 1k values per non-empty key, and 1M rows with nulls. Would the "generic" plans for null key the same as for the 6-th plan after 5 requests with non-null keys?

Reply
- laurenz says:
  
  November 23, 2022 at 10:08 am
  
  If I force a generic plan with plan_cache_mode = force_generic_plan, PostgreSQL won't even look at the parameter values. So the NULLs cannot influence the plan that is created. There is only one generic plan for a query, so yes, that would be the same plan that PostgreSQL generates at the sixth execution.
  
  Reply
Fabrice says:

January 22, 2025 at 9:29 am

Hello,

Thanks for this excellent blog.
I try to test your example on my instance but I cannot see in postgres log the DETAIL section for the bind variables, I got just this:
2025-01-22 10:15:25.105 CET [595604] LOG: statement: PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;
2025-01-22 10:15:25.106 CET [595604] LOG: duration: 0.487 ms
2025-01-22 10:15:29.505 CET [595604] LOG: statement: EXECUTE stmt('pg_proc');
2025-01-22 10:15:29.505 CET [595604] DETAIL: prepare: PREPARE stmt(text) AS SELECT oid FROM pg_class WHERE relname = $1;

I used these parameters:
logging_collector = on
log_min_duration_statement = 0
log_parameter_max_length = -1

Dis I miss something.

Regards,

Fabrice

Reply

Leave a Reply Cancel reply

Stay tuned with our

NEWSLETTER

CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0

office@cybertec-postgresql.com

ISO_27001_Badge

Customer Support

Support Platform

Services

Support CYBERTEC Partner PostgreSQL Books

Company

STAY TUNED WITH OUR NEWSLETTER

Get the newest PostgreSQL Info & Tools

Data Protection Policy Terms and conditions Terms of Service Imprint

©

2025

CYBERTEC PostgreSQL International GmbH