Avoiding unnecessary function calls in PostgreSQL

01.2018 | Category: Performance | Tags: development, performance tuning

01.2018

It is possible to write functions in PostgreSQL in almost any widespread language such as Perl, Python or C. In general this offers a great deal of flexibility and acceptable performance. However, in some cases customers might say: “We have the feeling that procedures are slow”. The thing is: PostgreSQL might not be to blame for this observation - in many cases it is simply “pilot error”. Let me try to elaborate on the key issue of function calls related to bad performance.

Table of Contents

How PostgreSQL treats procedures

As mentioned already, it is possible to write functions in basically any language. PostgreSQL simply passes the code of a function to the external language and takes back the result. In a way a function is a kind of black box - PostgreSQL rarely knows what is going on inside a procedure.

Here is an example of a function call:

test=# CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int
AS
$
  BEGIN
       RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$ LANGUAGE 'plpgsql';
CREATE FUNCTION

test=# CREATE OR REPLACE FUNCTION mymax(int, int)

RETURNS int

BEGIN

RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;

END;

$ LANGUAGE 'plpgsql';

CREATE FUNCTION

The results are not very spectacular:

test=# SELECT mymax(20, 30);
 mymax
-------
    30
(1 row)

test=# SELECT mymax(20, 30);

mymax

-------

(1 row)

The important thing here is: The PL/pgSQL function is a complete black box. The planner has no idea what the “external” language does in this case. This has some important implication.

Consider the following example:

test=# CREATE TABLE demo AS
	SELECT 	*
	FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# CREATE INDEX idx_id ON demo(id);
CREATE INDEX

test=# CREATE TABLE demo AS

SELECT *

FROM generate_series(1, 1000000) AS id;

SELECT 1000000

test=# CREATE INDEX idx_id ON demo(id);

CREATE INDEX

The table is already large enough to consider indexes:

test=# explain SELECT * FROM demo WHERE id = 20;
                      QUERY PLAN                               
---------------------------------------------------------------
 Index Only Scan using idx_id on demo  
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

test=# explain SELECT * FROM demo WHERE id = 20;

QUERY PLAN

---------------------------------------------------------------

Index Only Scan using idx_id on demo

(cost=0.42..8.44 rows=1 width=4)

Index Cond: (id = 20)

(2 rows)

Function calls in PostgreSQL are usually black boxes

The problem is: The situation changes completely if we start to use the function I have just shown:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on demo  (cost=0.00..266925.00 rows=1 width=4)
   Filter: (id = mymax(20, 20))
(2 rows)

test=# explain SELECT *

FROM demo

WHERE id = mymax(20, 20);

QUERY PLAN

---------------------------------------------------------

Seq Scan on demo (cost=0.00..266925.00 rows=1 width=4)

Filter: (id = mymax(20, 20))

(2 rows)

PostgreSQL has no idea that the function will return 20. It is clear to humans, but nobody told the machine that this is guaranteed to happen. To PostgreSQL, the result of a function is considered to be “volatile” - anything can happen. Therefore, it cannot simply ask the index for the correct row. The first function call might not return the same as the second call - even if the parameters are identical. The optimizer has to play it safe and will go for a sequential scan, which is definitely going to produce the correct result.

In PostgreSQL a function can be:

• VOLATILE
• STABLE
• IMMUTABLE

If a function is marked as VOLATILE, it can return anything if you call it multiple times using the very same input parameters. In case of STABLE the function is going to return the same result given the same parameters within the same transaction.

The most prominent STABLE function is now(), which will always return the same result within the same transaction:

 test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:46.385457+01
(1 row)

test=# BEGIN;
BEGIN
test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# COMMIT;
COMMIT
test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:59.640697+01
(1 row)

test=# SELECT now();

now

-------------------------------

2018-01-09 11:48:46.385457+01

(1 row)

test=# BEGIN;

BEGIN

test=# SELECT now();

now

-------------------------------

2018-01-09 11:48:51.073123+01

(1 row)

test=# SELECT now();

now

-------------------------------

2018-01-09 11:48:51.073123+01

(1 row)

test=# COMMIT;

COMMIT

test=# SELECT now();

now

-------------------------------

2018-01-09 11:48:59.640697+01

(1 row)

Some functions are even IMMUTABLE: In this case the result given the same input parameters will be constant, regardless of the transaction. Cosine would be an example of one of these function calls:

test=# SELECT cos(10), cos(20);
        cos         |        cos        
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

test=# SELECT cos(10), cos(20);
        cos         |        cos        
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

test=# SELECT cos(10), cos(20);

cos | cos

--------------------+-------------------

-0.839071529076452 | 0.408082061813392

(1 row)

test=# SELECT cos(10), cos(20);

cos | cos

--------------------+-------------------

-0.839071529076452 | 0.408082061813392

(1 row)

Even in the second transaction the cosine of a number will be the same.

Reducing the number of function calls

To fix our problem we have to drop the existing function …

test=# DROP FUNCTION mymax(int, int);
DROP FUNCTION

1 2	test=# DROP FUNCTION mymax(int, int); DROP FUNCTION

… and recreate it:

test=# CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int
AS
$
  BEGIN
       RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$ LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION

test=# CREATE OR REPLACE FUNCTION mymax(int, int)

RETURNS int

BEGIN

RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;

END;

$ LANGUAGE 'plpgsql' IMMUTABLE;

CREATE FUNCTION

The planner will detect that the function is supposed to return a static value given the same input values and go for an index scan:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                      QUERY PLAN                               
------------------------------------------------------
 Index Only Scan using idx_id on demo  
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

test=# explain SELECT *

FROM demo

WHERE id = mymax(20, 20);

QUERY PLAN

------------------------------------------------------

Index Only Scan using idx_id on demo

(cost=0.42..8.44 rows=1 width=4)

Index Cond: (id = 20)

(2 rows)

Of course the index scan orders magnitude faster and will return in basically no time.

Detecting bottlenecks

Fortunately PostgreSQL has a system view, which might shed some light on functions, that could be a problem. The first thing you have to do is to set “track_functions = 'all' “ - it will tell PostgreSQL to collection function statistics:

test=# SELECT * FROM demo WHERE id = mymax(20, 20);
 id
----
 20
(1 row)

test=# SELECT * FROM demo WHERE id = mymax(20, 20);

----

(1 row)

Once this setting has been turned on, pg_stat_user_functions will contain valuable information:

test=# SELECT * FROM pg_stat_user_functions ;
 funcid | schemaname | funcname | calls | total_time | self_time
--------+------------+----------+-------+------------+-----------
  16429 | public     | mymax    |     1 |      0.025 |     0.025
(1 row)

test=# SELECT * FROM pg_stat_user_functions ;

--------+------------+----------+-------+------------+-----------

16429 | public | mymax | 1 | 0.025 | 0.025

(1 row)

If you happen to see that a function is called insanely often, it can make sense to inspect it and check, if it happens to be VOLATILE for no good reason. Changing the function definition can significantly improve speed.

Creating indexes on functions in PostgreSQL

If you want to create indexes on a function, you have to make sure that the function itself is in fact IMMUTABLE. Otherwise PostgreSQL won't create the index for you. The reason is simple: PostgreSQL has to make sure that the content of the index is stable and does not have to be changed over time if the underlying data is unchanged.

Read the latest blogs about the PostgreSQL optimizer, or find out more about fixing slow queries.

One response to “Avoiding unnecessary function calls in PostgreSQL”

bobby_r says:

February 1, 2018 at 1:40 pm

Your posts are great, full of practical tips. Thanks so much!

Reply

Avoiding unnecessary function calls in PostgreSQL

How PostgreSQL treats procedures

The results are not very spectacular:

Consider the following example:

Function calls in PostgreSQL are usually black boxes

Reducing the number of function calls

Detecting bottlenecks

Creating indexes on functions in PostgreSQL

One response to “Avoiding unnecessary function calls in PostgreSQL”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts