Functions: The most widely ignored performance tweak

It happens quite frequently that people complain about stored procedure performance in PostgreSQL. In many cases the reason for bad performance can be explained quite nicely when looking at the definition of a function.

Table of Contents

In general, a PostgreSQL function can be marked as follows: VOLATILE, STABLE, IMMUTABLE or [NOT] LEAKPROOF.

What does this actually mean?

To show which impact this can have, we can create a small table consisting of just integer values:

test=# CREATE TABLE t_test AS 
         SELECT * 
         FROM   generate_series(1, 1000000) AS id;
SELECT 1000000

test=# CREATE TABLE t_test AS

SELECT *

FROM generate_series(1, 1000000) AS id;

SELECT 1000000

In our example we have added 1 mio rows to the table. In the next step we can define a simple index:

test=# CREATE INDEX idx_test ON t_test (id);
CREATE INDEX

1 2	test=# CREATE INDEX idx_test ON t_test (id); CREATE INDEX

The point now is: Many complaints about poor performance arise when people use the output of a function and compare it to a column as shown in the following listing:

test=# explain SELECT * 
               FROM   t_test 
               WHERE  id = round(17.5, 0)::int4;
                                 QUERY PLAN 
----------------------------------------------------------------------------
Index Only Scan using idx_test on t_test (cost=0.00..8.38 rows=1 width=4)
   Index Cond: (id = 18)
(2 rows)

test=# explain SELECT *

FROM t_test

WHERE id = round(17.5, 0)::int4;

QUERY PLAN

----------------------------------------------------------------------------

Index Only Scan using idx_test on t_test (cost=0.00..8.38 rows=1 width=4)

Index Cond: (id = 18)

(2 rows)

We want to check, if any of the rows is identical to round(17.5, 0)::int4. As you can see PostgreSQL can use an index nicely. The reason for that is that the round function will ALWAYS return 18 if you pass 17.5 as parameter. It is simply a mathematical fact. Technically this means that PostgreSQL can calculate the function ONCE and use it to search the index.
Inside PostgreSQL the function is marked an IMMUTABLE. Its output will never change:

test=# SELECT proname, provolatile FROM pg_proc WHERE proname = 'round' LIMIT 1;
 proname | provolatile 
---------+-------------
   round |           i
(1 row)

test=# SELECT proname, provolatile FROM pg_proc WHERE proname = 'round' LIMIT 1;

proname | provolatile

---------+-------------

round | i

(1 row)

What happens if we try the very same thing using the random() function? The execution plan will be very different:

test=# explain SELECT * 
               FROM   t_test 
               WHERE  id = random()::int4;
                        QUERY PLAN 
----------------------------------------------------------
 Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4)
     Filter: (id = (random())::integer)
(2 rows)

test=# explain SELECT *

FROM t_test

WHERE id = random()::int4;

QUERY PLAN

----------------------------------------------------------

Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4)

Filter: (id = (random())::integer)

(2 rows)

In this case we have to read the entire table to find the right answer. Reading 1 mio rows is expensive and therefore it is not a good strategy. The problem is that there is no way for the optimizer to use an index. The random() function will change its result everyt ime it is called. So, which value would you look up in the index? The answer is: There is no way to use the index because every time you are inspecting a row the output of random() will already be something else. This behavior is called VOLATILE:

test=# SELECT proname, provolatile 
       FROM   pg_proc 
       WHERE  proname = 'random' 
       LIMIT 1;
 proname | provolatile 
---------+-------------
  random | v
(1 row)

test=# SELECT proname, provolatile

FROM pg_proc

WHERE proname = 'random'

LIMIT 1;

proname | provolatile

---------+-------------

random | v

(1 row)

The performance difference between a VOLATILE and an IMMUTABLE function will be substantial because the VOLATILE function has to be called 1 mio times:

test=# explain analyze SELECT * FROM t_test WHERE id = random()::int4;
                              QUERY PLAN 
------------------------------------------------------------------
 Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4) 
                    (actual time=244.884..244.884 rows=0 loops=1)
     Filter: (id = (random())::integer)
     Rows Removed by Filter: 1000000
Total runtime: 244.947 ms
(4 rows)

test=# explain analyze SELECT * FROM t_test WHERE id = random()::int4;

QUERY PLAN

------------------------------------------------------------------

Seq Scan on t_test (cost=0.00..21925.00 rows=1 width=4)

(actual time=244.884..244.884 rows=0 loops=1)

Filter: (id = (random())::integer)

Rows Removed by Filter: 1000000

Total runtime: 244.947 ms

(4 rows)

test=# explain analyze SELECT * 
                       FROM   t_test 
                       WHERE  id = round(17.5, 0)::int4;
                     QUERY PLAN 
------------- ---------------------------------------
 Index Only Scan using idx_test on t_test 
          (cost=0.00..8.38 rows=1 width=4) 
          (actual time=1.064..1.065 rows=1 loops=1)
       Index Cond: (id = 18)
       Heap Fetches: 1
Total runtime: 1.084 ms
(4 rows)

test=# explain analyze SELECT *

FROM t_test

WHERE id = round(17.5, 0)::int4;

QUERY PLAN

------------- ---------------------------------------

Index Only Scan using idx_test on t_test

(cost=0.00..8.38 rows=1 width=4)

(actual time=1.064..1.065 rows=1 loops=1)

Index Cond: (id = 18)

Heap Fetches: 1

Total runtime: 1.084 ms

(4 rows)

In our example we can see an execution time, which is 240 times higher than in the index optimized case.

Tweaking your own functions

So, if you are writing your own procedures – please never forget to mark your function appropriately. Otherwise the function might be called ways too often, which is bad for performance. Defining your function properly is a very simple tweak offering great potential.

Here is how this can be done:

test=# CREATE OR REPLACE FUNCTION ld(int) 
RETURNS numeric 
AS ' SELECT log(2, $1) ' 
LANGUAGE sql IMMUTABLE;
CREATE FUNCTION

test=# CREATE OR REPLACE FUNCTION ld(int)

RETURNS numeric

AS ' SELECT log(2, $1) '

LANGUAGE sql IMMUTABLE;

CREATE FUNCTION

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Comments are closed.

Stay tuned with our

Functions: The most widely ignored performance tweak

What does this actually mean?

Tweaking your own functions

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts