In PostgreSQL it is possible to write stored procedures in almost any widespread language such as Perl, Python or C. In general this offers a great deal of flexibility and acceptable performance. However, in some cases customers might say: “We have the feeling that procedures are slow”. The thing is: PostgreSQL might not be to blame for this observation – in many cases it is simply “pilot error”. Let me try to elaborate on one of the key issues related to bad performance.

How PostgreSQL treats procedures

As mentioned already it is possible to write stored procedures in basically any language. PostgreSQL simply passes the code of a procedure to the external language and takes back the result. In a way a stored procedure is a kind of black box – PostgreSQL rarely knows what is going on inside a procedure.

Here is an example:

test=# CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int
AS
$$
  BEGIN
       RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$$ LANGUAGE 'plpgsql';
CREATE FUNCTION

The results are not very spectacular:

test=# SELECT mymax(20, 30);
 mymax
-------
    30
(1 row)

The important thing here is: The PL/pgSQL function is a complete blackbox. The planner has no idea, what the “external” language does in this case. This has some important implication. Consider the following example:

test=# CREATE TABLE demo AS
	SELECT 	*
	FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# CREATE INDEX idx_id ON demo(id);
CREATE INDEX

The table is already large enough to consider indexes:

test=# explain SELECT * FROM demo WHERE id = 20;
                      QUERY PLAN                               
---------------------------------------------------------------
 Index Only Scan using idx_id on demo  
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

Functions in PostgreSQL are usually blackboxes

The problem is: The situation changes completely if we start to use the procedure I have just shown:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                       QUERY PLAN                        
---------------------------------------------------------
 Seq Scan on demo  (cost=0.00..266925.00 rows=1 width=4)
   Filter: (id = mymax(20, 20))
(2 rows)

PostgreSQL has no idea that the function will return 20. It is clear to humans but nobody told the machine that this is guaranteed to happen. To PostgreSQL the result of a function is considered to be “volatile” – anything can happen. Therefore it cannot simply ask the index for the correct row. The first function call might not return the same as the second call – even if the parameters are identical. The optimizer has to play safe and will go for a sequential scan, which is definitely going to produce the correct result.

In PostgreSQL a function can be:

• VOLATILE
• STABLE
• IMMUTABLE

If a function is marked as VOLATILE it can return anything if you call it multiple times using the very same input parameters. In case of STABLE the function is going to return the same result given the same parameters within the same transaction.

The most prominent STABLE function is now(), which will always return the same result within the same transaction:

 test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:46.385457+01
(1 row)

test=# BEGIN;
BEGIN
test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:51.073123+01
(1 row)

test=# COMMIT;
COMMIT
test=# SELECT now();
              now              
-------------------------------
 2018-01-09 11:48:59.640697+01
(1 row)

Some functions are even IMMUTABLE: In this case the result given the same input parameters will be constant regardless of the transaction. Cosine would be an example in this case:

test=# SELECT cos(10), cos(20);
        cos         |        cos        
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

test=# SELECT cos(10), cos(20);
        cos         |        cos        
--------------------+-------------------
 -0.839071529076452 | 0.408082061813392
(1 row)

Even in the second transaction the cosine of a number will be the same.

Reducing the number of function calls

To fix our problem we have to drop the existing function …

test=# DROP FUNCTION mymax(int, int);
DROP FUNCTION

… and recreate it:

test=# CREATE OR REPLACE FUNCTION mymax(int, int)
RETURNS int
AS
$$
  BEGIN
       RETURN CASE WHEN $1 > $2 THEN $1 ELSE $2 END;
  END;
$$ LANGUAGE 'plpgsql' IMMUTABLE;
CREATE FUNCTION

The planner with detect that the function is supposed to return a static value given the same input values and go for an index scan:

test=# explain SELECT *
	FROM 	demo
	WHERE 	id = mymax(20, 20);
                      QUERY PLAN                               
------------------------------------------------------
 Index Only Scan using idx_id on demo  
	(cost=0.42..8.44 rows=1 width=4)
   Index Cond: (id = 20)
(2 rows)

Of course the index scan orders magnitude faster and will return in basically no time.

Detecting bottlenecks

Fortunately PostgreSQL has a system view, which might shed some light on functions, which might be a problem. The first thing you have to do is to set “track_functions = ‘all’ “ – it will tell PostgreSQL to collection function statistics:

test=# SELECT * FROM demo WHERE id = mymax(20, 20);
 id
----
 20
(1 row)

Once this setting has been turned on, pg_stat_user_functions will contain valuable information:

test=# SELECT * FROM pg_stat_user_functions ;
 funcid | schemaname | funcname | calls | total_time | self_time
--------+------------+----------+-------+------------+-----------
  16429 | public     | mymax    |     1 |      0.025 |     0.025
(1 row)

If you happen to see that a function is called insanely often, it can make sense to inspect it and check, if it happens to be VOLATILE for no good reason. Changing the function definition can significantly improve speed.

Creating indexes on functions in PostgreSQL

If you want to create indexes on a function, you have to make sure that the function itself is in fact IMMUTABLE. Otherwise PostgreSQL won’t create the index for you. The reason is simple: PostgreSQL has to make sure that the content of the index is stable and does not have to be changed over time if the underlying data is unchanged.