Propagating support functions
© Laurenz Albe 2019

 

PostgreSQL commit 74dfe58a5927b22c744b29534e67bfdd203ac028 has added “support functions”. This exciting new functionality that allows the optimizer some insight into functions. This article will discuss how this will improve query planning for PostgreSQL v12. If you are willing to write C code, you can also use this functionality for your own functions.

Functions as “black boxes”

Up to now, the PostgreSQL optimizer couldn’t really do a lot about functions. No matter how much it knew about the arguments of a function, it didn’t have the faintest clue about the function result. This also applied to built-in functions: no information about them was “wired into” the optimizer.

Let’s look at a simple example: language=”sql”

EXPLAIN SELECT * FROM unnest(ARRAY[1,2,3]);

                         QUERY PLAN
-------------------------------------------------------------
 Function Scan on unnest  (cost=0.00..1.00 rows=100 width=4)
(1 row)

PostgreSQL knows exactly that the array contains three elements. Still, it has no clue how many rows unnest will return, so it estimates an arbitrary 100 result rows. If this function invocation is part of a bigger SQL statement, the wrong result count can lead to a bad plan. The most common problem is that PostgreSQL will select bad join strategies based on wrong cardinality estimates. If you have ever waited for a nested loop join to finish that got 10000 instead of 10 rows in the outer relation, you know what I’m talking about.

There is the option to specify COST and ROWS on a function to improve the estimates. But you can only specify a constant there, which often is not good enough.

There were many other ways in which optimizer support for functions was lacking. This situation has been improved with support functions.

Support function syntax

The CREATE FUNCTION statement has been extended like this:

CREATE FUNCTION name (...) RETURNS ...
SUPPORT supportfunction
AS ...

This way a function gets a “support function” that knows about the function and can help the optimizer produce a better plan. Only a superuser can use the SUPPORT option.

Such a support function must have the signature

supportfunction(internal) RETURNS internal

internal” means that the function argument and return code are pointers to some C structure. That means that the function has to be written in C and is not callable from SQL.

When the optimizer considers some optimization for a function call, it invokes the support function. If the support function returns NULL to indicate it cannot help with that request, the optimizer goes ahead and plans as usual.

The optimizer can pass differenct C structures to the support function, depending on the optimization it considers. See src/include/nodes/supportnodes.h in the PostgreSQL source for details.

Optimizations a support function can provide

A support function can provide some or all of the following features:

Support for indexable expressions that replace or provide a filter for the function call

This in only called for functions that return boolean and are at the top level of a WHERE or JOIN condition, for example

SELECT ... FROM a JOIN b ON func(a.x, b.y)

or

SELECT ... FROM a WHERE func(a.x, a.y, a.z)

Sometimes it may be possible to replace the function call with an identical expression that can use an index scan. A trivial example would be int4eq(x, 42), which could be replaced by x = 42. Usually, though, the indexable expressions will not be able to replace the function call, but it can be useful as a “lossy” filter that significantly reduces the number of function calls that have to be performed.

One well-known example of such a lossy filter are LIKE expressions:

EXPLAIN SELECT * FROM person WHERE name LIKE 'alb%t';

                                  QUERY PLAN                                   
-------------------------------------------------------------------------------
 Index Scan using person_name_idx on person  (cost=0.29..8.31 rows=9 width=11)
   Index Cond: ((name ~>=~ 'alb'::text) AND (name ~<~ 'alc'::text))
   Filter: (name ~~ 'alb%t'::text)
(3 rows)

The two (byte-wise) comparisons can use an index scan, and they narrow down the search space. An additional filter removes the false positives.
Up to PostgreSQL v11, the optimizer had this knowledge wired in. From v12 on, the functions that implement the LIKE operator have support functions that contain this knowledge.

However, the main use case for this kind of support function will be PostGIS, and support functions were introduced specifically to help PostGIS. Up to now, functions like ST_Intersects() or ST_DWithin() used a trick to get index support: they were defined as SQL functions with a (lossy, but indexable) boundary box operator and an exact function. PostGIS relied on “function inlining” to get PostgreSQL to use an index. This was an ugly hack that caused problems, particularly with parallel queries. With PostgreSQL v12, PostGIS can use support functions to do this correctly.

Support for row count estimates

With a “set-returning function”, PostgreSQL calls the support function to get an estimate for the number of rows. This has been implemented for unnest in v12, so the example from the beginning will get the correct estimate:

EXPLAIN SELECT * FROM unnest(ARRAY[1,2,3]);

                        QUERY PLAN                         
-----------------------------------------------------------
 Function Scan on unnest  (cost=0.00..0.03 rows=3 width=4)
(1 row)

This provides a smarter alternative to the ROWS clause of CREATE FUNCTION.

Support for cost estimates

Similar to the above, a support function can also provide a smarter alternative to the COST clause of CREATE FUNCTION.

There is no example of such a function if the PostgreSQL v12 code base, except in the regression tests, but maybe there will be more in future releases.

Support for the selectivity of a WHERE condition

As we saw before, a function that returns a boolean can appear at the top level of a WHERE condition. An example would be

SELECT ... FROM a WHERE starts_with(a.x, 'alb')

Up to now, PostgreSQL had no idea how selective this condition is, so it simply estimated that it would filter out two thirds of the rows.

With PostgreSQL v12, you can define a support function that provides a better estimate for the selectivity of such a condition. Again, so far the only example for such a function is in the PostgreSQL regression tests.

Support for expression simplification

This kind of support function is called when the optimizer simplifies constant expressions, so it could be used to replace a function call with a simpler expression if one or more of its arguments are constants.

For example, an expression like x + 0 (which internally calls the function int4pl) could be replaced with x.

PostgreSQL already had such a feature (called transform functions), but that was not exposed at the SQL level. The place where this was used in the code (simplification of some type casts) has been changed to use a support function in v12.

Conclusion and outlook

Support functions open the field for much better optimizer support for functions. I imagime that they will prove useful for PostgreSQL’s built-in functions, as well as for third-party extensions.

There is a lot of low hanging fruit which might be harvested by beginners who want to get involved with PostgreSQL hacking:

  • Set returning functions could be provided with better row count and perhaps also cost estimates: the various generate_series functions, generate_subscripts, jsonb_populate_recordset and other JSON functions
  • starts_with could be supported by indexes, and the selectivity estimates could be improved, quite similar to LIKE