It happens quite frequently that people complain about stored procedure performance in PostgreSQL. In many cases the reason for bad performance can be explained quite nicely when looking at the definition of a function.

In general, a PostgreSQL function can be marked as follows: VOLATILE, STABLE, IMMUTABLE or [NOT] LEAKPROOF. What does this actually mean?

To show which impact this can have, we can create a small table consisting of just integer values:

test=# CREATE TABLE t_test AS
SELECT *
FROM generate_series(1, 1000000) AS id;
SELECT 1000000

In our example we have added 1 mio rows to the table. In the next step we can define a simple index:

test=# CREATE INDEX idx_test ON t_test (id);
CREATE INDEX

The point now is: Many complaints about poor performance arise when people use the output of a function and compare it to a column as shown in the following listing:

test=# explain SELECT * FROM t_test WHERE id = round(17.5, 0)::int4;
QUERY PLAN
—————————————————————————-
Index Only Scan using idx_test on t_test (cost=0.00..8.38 rows=1 width=4)
Index Cond: (id = 18)
(2 rows)

We want to check, if any of the rows is identical to round(17.5, 0)::int4. As you can see PostgreSQL can use an index nicely. The reason for that is that the round function will ALWAYS return 18 if you pass 17.5 as parameter. It is simply a mathematical fact. Technically this means that PostgreSQL can calculate the function ONCE and use it to search the index.
Inside PostgreSQL the function is marked a IMMUTABLE. Its output will never change:

test=# SELECT proname, provolatile FROM pg_proc WHERE proname = ’round’ LIMIT 1;
proname | provolatile
———+————-
round | i
(1 row)

What happens if we try the very same thing using the random() function? The execution plan will be very different:

test=# explain SELECT * FROM t_test WHERE id = random()::int4;
QUERY PLAN
———————————————————-
Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4)
Filter: (id = (random())::integer)
(2 rows)

In this case we have to read the entire table to find the right answer. Reading 1 mio rows is expensive and therefore it is not a good strategy. The problem is that there is no way for the optimizer to use an index. The random() function will change its result everytime it is called. So, which value would you look up in the index? The answer is: There is no way to use the index because everytime you are inspecting a row the output of random() will already be something else. This behavior is called VOLATILE:

test=# SELECT proname, provolatile FROM pg_proc WHERE proname = ‘random’ LIMIT 1;
proname | provolatile
———+————-
random | v
(1 row)

The performance difference between a VOLATILE and an IMMUTABLE function will be substantial because the VOLATILE function has to be called 1 mio times:

test=# explain analyze SELECT * FROM t_test WHERE id = random()::int4;
QUERY PLAN
——————————————————————————————————–
Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4) (actual time=244.884..244.884 rows=0 loops=1)
Filter: (id = (random())::integer)
Rows Removed by Filter: 1000000
Total runtime: 244.947 ms
(4 rows)

test=# explain analyze SELECT * FROM t_test WHERE id = round(17.5, 0)::int4;
QUERY PLAN
———————————————————————————————————————-
Index Only Scan using idx_test on t_test (cost=0.00..8.38 rows=1 width=4) (actual time=1.064..1.065 rows=1 loops=1)
Index Cond: (id = 18)
Heap Fetches: 1
Total runtime: 1.084 ms
(4 rows)

In our example we can see an execution time, which is 240 times higher than in the index optimized case.

Tweaking your own functions

So, if you are writing your own procedures – please never forget to mark your function appropriately. Otherwise the function might be called ways too often, which is bad for performance. Defining your function properly is a very simple tweak offering great potential.

Here is how this can be done:

test=# CREATE OR REPLACE FUNCTION ld(int)
RETURNS numeric
AS ‘ SELECT log(2, $1) ‘
LANGUAGE sql IMMUTABLE;
CREATE FUNCTION