In PostgreSQL, functions and procedures can be written in many different programming languages including but not limited to PL/pgSQL, PL/Perl, PL/Python and a lot more. This flexibility gives the end user the choice to write code in languages of their choice. However, this flexibility has some implications which are often forgotten. Loading libraries early is one of them.

Executing functions and procedures in PostgreSQL

To understand how PostgreSQL works, we first got to take a look at a simple function call. The next listing shows some simple PostGIS code:

test=# \timing 
Timing is on.
test=# SELECT * FROM hans.points WHERE id = 1;
 id │                         p                          
════╪════════════════════════════════════════════════════
  1 │ 0101000020E610000097515B9536C33140A252824D6FDC1440
(1 row)

Time: 10.004 ms
test=# SELECT * FROM hans.points WHERE id = 1;
 id │                  *       p                          
════╪════════════════════════════════════════════════════
  1 │ 0101000020E610000097515B9536C33140A252824D6FDC1440
(1 row)

Time: 0.664 ms

In a standard PostgreSQL database connection, PostGIS is not available yet. In fact, it has to be loaded when PostGIS is used for the first time inside a connection. What we see is that the second call is WAY faster than the first one because during the first call the library is loaded and initialized. After a database restart the first can be a major performance problem. Consider: After a restart caches are cold, libraries are not yet loaded and the connections have to be established – this can cause a fairly nasty cocktail of issues which all happen at the same time.

Loading libraries early

One way to address the problem is to make use of a fairly old feature in PostgreSQL: The session_preload_library can ensure that the library is already there when the connection is used for the first time. Of course we will see some serious performance improvements:


> PGOPTIONS='-c session_preload_libraries=postgis-3' psql -U postgres
…

test=# \timing 
Timing is on.
test=# SELECT * FROM hans.points WHERE id = 1;
 id │                         p                          
════╪════════════════════════════════════════════════════
  1 │ 0101000020E610000097515B9536C33140A252824D6FDC1440
(1 row)

Time: 2.809 ms
test=# SELECT * FROM hans.points WHERE id = 1;
 id │                         p                          
════╪════════════════════════════════════════════════════
  1 │ 0101000020E610000097515B9536C33140A252824D6FDC1440
(1 row)

Time: 0.674 ms

The first call is way faster than before. The second call is still a lot slower than the first invocation because we still have to do some init work. However, we have shaved off a fair amount of overhead for the first call which can be incredibly beneficial in case of reboots, cluster failures and so on.

Conclusion

All server side languages are loaded as libraries and the same is true for extensions and many others things. session_preload_libraries can therefore be really helpful to speed up the first call inside a database connection.

Read more about optimization in PostgreSQL.