Detect PostgreSQL performance problems with ease: Is there a single significantly large and important database in the world which does not suffer from performance problems once in a while? I bet that there are not too many. Therefore, every DBA (database administrator) in charge of PostgreSQL should know how to track down potential performance problems to figure out what is really going on.
Improving PostgreSQL performance beyond parameter tuning
Many people think that changing parameters in postgresql.conf is the real way to success. However, this is not always the case. Sure, more often than not, good database config parameters are highly beneficial. Still, in many cases the real problems will come from some strange query hidden deep in some application logic. It is even quite likely that those queries causing real issues are not the ones you happen to focus on. The natural question now arising is: How can we track down those queries and figure out, what is really going on? My favorite tool to do that is pg_stat_statements, which should always be enabled in my judgement if you are using PostgreSQL 9.2 or higher (please do not use it in older versions).
To enable pg_stat_statements on your server change the following line in postgresql.conf and restart PostgreSQL:
shared_preload_libraries = ‘pg_stat_statements’
Once this module has been loaded into the server, PostgreSQL will automatically start to collect information. The good thing is that the overhead of the module is really really low (the overhead is basically just “noise”).
Then run the following command to create the necessary view to access the data:
CREATE EXTENSION pg_stat_statements;
The extension will deploy a view called pg_stat_statements and make the data easily accessible.
Detecting slow queries in PostgreSQL
The easiest way to find the most interesting queries is to sort the output of pg_stat_statements by total_time:
SELECT * FROM pg_stat_statements ORDER BY total_time DESC;
The beauty here is that the type of query, which is consuming most of time, will naturally show up on top of the listing. The best way is to work your way down from the first to the, say, 10th query and see, what is going on there.
In my judgement there is no way to tune a system without looking at the most time-consuming queries on the database server.
Taking a deep look at PostgreSQL performance problems
pg_stat_statements has a lot more to offer than just the query and the time it has eaten. Here is the structure of the view:
test=# \d pg_stat_statements View "public.pg_stat_statements" Column | Type | Collation | Nullable | Default ---------------------+------------------+-----------+----------+--------- userid | oid | | | dbid | oid | | | queryid | bigint | | | query | text | | | calls | bigint | | | total_time | double precision | | | min_time | double precision | | | max_time | double precision | | | mean_time | double precision | | | stddev_time | double precision | | | rows | bigint | | | shared_blks_hit | bigint | | | shared_blks_read | bigint | | | shared_blks_dirtied | bigint | | | shared_blks_written | bigint | | | local_blks_hit | bigint | | | local_blks_read | bigint | | | local_blks_dirtied | bigint | | | local_blks_written | bigint | | | temp_blks_read | bigint | | | temp_blks_written | bigint | | | blk_read_time | double precision | | | blk_write_time | double precision | | |
It can be quite useful to take a look at the stddev_time column as well. It will tell you if queries of a certain type tend to have similar runtimes or not. If the standard deviation is high you can expect some of those queries to be fast and some of them to be slow, which might lead to bad user experience.
The “rows” column can also be quite informative. Suppose 1000 calls have returned 1.000.000.000 rows: It actually means that every call has returned 1 million rows in average. It is easy to see that returning so much data all the time is not a good thing to do.
If you want to check if a certain type of query shows bad caching performance, the shared_* will be of interest. In short: PostgreSQL is able to tell you the cache hit rate of every single type of query in case pg_stat_statements has been enabled.
It also makes sense to take a look at the temp_blks_* fields. Whenever PostgreSQL has to go to disk to sort or to materialize, temporary blocks will be needed.
Finally there are blk_read_time and blk_write_time. Usually those fields are empty unless track_io_timing is turned on. The idea here is to be able to measure the amount of time a certain type of query spends on I/O. It will allow you to answer the question whether your system is I/O bound or CPU bound. In most cases it is a good idea to turn on I/O timing because it will give you vital information.
Dealing with Java and Hibernate
pg_stat_statements delivers good information. However, in some cases it can cut off the query because of a config variable:
test=# SHOW track_activity_query_size; track_activity_query_size --------------------------- 1024 (1 row)
For most applications 1024 bytes are absolutely enough. However, this is usually not the case if you are running Hibernate or Java. Hibernate tends to send insanely long queries to the database and thus the SQL code might be cut off long before the relevant parts start (e.g. the FROM-clause etc.). Therefore it makes a lot of sense to increase track_activity_query_size to a higher value (maybe 32.786).
Helpful queries to detect bottlenecks in PostgreSQL
There is one query I have found especially useful in this context: The following query shows 20 statements, which need a lot of time:
test=# SELECT substring(query, 1, 50) AS short_query, round(total_time::numeric, 2) AS total_time, calls, round(mean_time::numeric, 2) AS mean, round((100 * total_time / sum(total_time::numeric) OVER ())::numeric, 2) AS percentage_cpu FROM pg_stat_statements ORDER BY total_time DESC LIMIT 20; short_query | total_time | calls | mean | percentage_cpu ----------------------------------------------------+------------+-------+------+---------------- SELECT name FROM (SELECT pg_catalog.lower(name) A | 11.85 | 7 | 1.69 | 38.63 DROP SCHEMA IF EXISTS performance_check CASCADE; | 4.49 | 4 | 1.12 | 14.64 CREATE OR REPLACE FUNCTION performance_check.pg_st | 2.23 | 4 | 0.56 | 7.27 SELECT pg_catalog.quote_ident(c.relname) FROM pg_c | 1.78 | 2 | 0.89 | 5.81 SELECT a.attname, + | 1.28 | 1 | 1.28 | 4.18 SELECT substring(query, ?, ?) AS short_query,roun | 1.18 | 3 | 0.39 | 3.86 CREATE OR REPLACE FUNCTION performance_check.pg_st | 1.17 | 4 | 0.29 | 3.81 SELECT query FROM pg_stat_activity LIMIT ?; | 1.17 | 2 | 0.59 | 3.82 CREATE SCHEMA performance_check; | 1.01 | 4 | 0.25 | 3.30 SELECT pg_catalog.quote_ident(c.relname) FROM pg_c | 0.92 | 2 | 0.46 | 3.00 SELECT query FROM performance_check.pg_stat_activi | 0.74 | 1 | 0.74 | 2.43 SELECT * FROM pg_stat_statements ORDER BY total_ti | 0.56 | 1 | 0.56 | 1.82 SELECT query FROM pg_stat_statements LIMIT ?; | 0.45 | 4 | 0.11 | 1.45 GRANT EXECUTE ON FUNCTION performance_check.pg_sta | 0.35 | 4 | 0.09 | 1.13 SELECT query FROM performance_check.pg_stat_statem | 0.30 | 1 | 0.30 | 0.96 SELECT query FROM performance_check.pg_stat_activi | 0.22 | 1 | 0.22 | 0.72 GRANT ALL ON SCHEMA performance_check TO schoenig_ | 0.20 | 3 | 0.07 | 0.66 SELECT query FROM performance_check.pg_stat_statem | 0.20 | 1 | 0.20 | 0.67 GRANT EXECUTE ON FUNCTION performance_check.pg_sta | 0.19 | 4 | 0.05 | 0.62 SELECT query FROM performance_check.pg_stat_statem | 0.17 | 1 | 0.17 | 0.56 (20 rows)
The last column is especially noteworthy: It tells us the percentage of total time burned by a single query. It will help you to figure out whether a single statement is relevant to overall performance problems or not.