In many PostgreSQL databases, you never have to think or worry about tuning autovacuum. It runs automatically in the background and cleans up without getting in your way.
But sometimes the default configuration is not good enough, and you have to tune autovacuum to make it work properly. This article presents some typical problem scenarios and describes what to do in these cases.
There are many autovacuum configuration parameters, which makes tuning complicated. The main reason is that autovacuum has many different tasks. In a way, autovacuum has to fix all the problems arising from PostgreSQL's Multiversioning Concurrency Control (MVCC) implementation:

- it removes dead tuples left behind by UPDATE or DELETE operations
- it updates the visibility map, which enables index-only scans
- it freezes old tuples to prevent transaction ID wraparound
- ANALYZE runs to keep the table statistics updated

Depending on which of these functionalities cause a problem, you need different approaches to tuning autovacuum.
The best-known autovacuum task is cleaning up dead tuples from UPDATE or DELETE operations. If autovacuum cannot keep up with cleaning up dead tuples, you should follow these three tuning steps:
Check the known reasons that keep VACUUM from removing dead tuples. Most often, the culprits are long-running transactions. Unless you can remove these obstacles, tuning autovacuum will be useless.
If you cannot fight the problem at its root, you can use the configuration parameter idle_in_transaction_session_timeout to have PostgreSQL terminate sessions that stay “idle in transaction” for too long. That causes errors on the client side, but may be justified if you have no other way to keep your database operational. Similarly, to fight long-running queries, you can use statement_timeout.
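Both parameters can be set globally, per database, or per role, so you don't have to impose the limits on everyone. A sketch (the role names and values are placeholders - pick limits that fit your workload):

```sql
-- hypothetical role names; a value of 0 (the default) disables the timeout
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '5min';
ALTER ROLE reporting SET statement_timeout = '30s';
```

Settings made with ALTER ROLE take effect for new sessions of that role; existing sessions keep their old values.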
If autovacuum cannot keep up with cleaning up dead tuples, the solution is to make it work faster. This may seem obvious, but many people fall into the trap of thinking that making autovacuum start earlier or run more often will solve the problem - it won't, because that does nothing to increase autovacuum's processing speed.
VACUUM is a resource-intensive operation, so autovacuum by default operates deliberately slowly. The goal is to have it work in the background without being in the way of normal database operation. But if your workload creates lots of dead tuples, you will have to make it more aggressive:

- increase autovacuum_vacuum_cost_limit from its default value of 200 (this is the gentle method)
- reduce autovacuum_vacuum_cost_delay from its default value of 2 (in older versions: 20!) milliseconds (this is the effective method)

Setting autovacuum_vacuum_cost_delay to 0 will make autovacuum as fast as a manual VACUUM – that is, as fast as possible.
Since not all tables grow dead tuples at the same pace, it is usually best not to change the global setting in postgresql.conf, but to change the setting individually for busy tables:
ALTER TABLE busy_table SET (autovacuum_vacuum_cost_delay = 1);
Partitioning a table can also help with getting the job done faster; see below for more.
If nothing else works, you have to see that fewer dead tuples are generated. Perhaps several UPDATEs to a single row could be combined into a single UPDATE?
Often you can significantly reduce the number of dead tuples by using “HOT updates”:

- set the fillfactor for the table to a value less than 100, so that INSERTs leave some free space in each block
- make sure that no column modified by the UPDATE is indexed

Then any SELECT or DML statement can clean up dead tuples, and there is less need for VACUUM.
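A sketch of the first point (the table name and the value of 70 are placeholders - the right fillfactor depends on your row size and update rate):

```sql
-- leave roughly 30% free space in each block for HOT updates
ALTER TABLE busy_table SET (fillfactor = 70);

-- only newly written blocks honor the new fillfactor;
-- rewrite the table if you want it applied to existing data:
VACUUM (FULL) busy_table;
```

Note that VACUUM (FULL) rewrites the whole table and takes an exclusive lock, so schedule it for a quiet period.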
The expensive part of an index scan is looking up the actual table rows. If all columns you want are in the index, it should not be necessary to visit the table at all. But in PostgreSQL you also have to check if a tuple is visible or not, and that information is only stored in the table.
To work around that, PostgreSQL has a “visibility map” for each table. If a table block is marked as “all visible” in the visibility map, you don't have to visit the table for the visibility information.
So to get true index-only scans, autovacuum has to process the table and update the visibility map frequently. How you configure autovacuum for that depends on the kind of data modifications the table receives:

Tables that receive UPDATEs or DELETEs

For these, you reduce autovacuum_vacuum_scale_factor for the table, for example:
ALTER TABLE mytable SET (autovacuum_vacuum_scale_factor = 0.01);
It may be a good idea to also speed up autovacuum as described above.
Tables that receive only INSERTs

This is simple from v13 on: tune autovacuum_vacuum_insert_scale_factor as shown above for autovacuum_vacuum_scale_factor.
For older PostgreSQL versions, the best you can do is to significantly lower autovacuum_freeze_max_age
. The best value depends on the rate at which you consume transaction IDs. If you consume 100000 transaction IDs per day, and you want the table to be autovacuumed daily, you can set
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 100000);
To measure the rate of transaction ID consumption, use the function txid_current()
(or pg_current_xact_id()
from v13 on) twice with a longer time interval in between and take the difference.
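For example (on v13 or later; substitute txid_current() on older versions), two manual readings are enough:

```sql
-- first reading: note the timestamp and the transaction ID
SELECT now() AS ts, pg_current_xact_id() AS xid;

-- ... wait for a longer interval, for example a day ...

-- second reading: the consumption rate is (xid2 - xid1) per (ts2 - ts1)
SELECT now() AS ts, pg_current_xact_id() AS xid;
```

The function call itself consumes one transaction ID, which is negligible at any realistic rate.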
Normally, autovacuum takes care of freezing old tuples and starts a special “anti-wraparound” autovacuum worker whenever the oldest transaction ID in a table is older than autovacuum_freeze_max_age
transactions or the oldest multixact is older than autovacuum_multixact_freeze_max_age
transactions.
Again, you have to make sure that there is nothing that blocks autovacuum from freezing old tuples and advancing pg_database.datfrozenxid and pg_database.datminmxid. Such blockers are largely the same as the ones that keep VACUUM from removing dead tuples (see above).
Another possible blocker is data corruption that makes VACUUM fail; to prevent data corruption, use good hardware and always run the latest PostgreSQL minor release.
UPDATEs or DELETEs for anti-wraparound vacuum

On tables that receive UPDATEs or DELETEs, all you have to do is to see that autovacuum runs fast enough to get done in time (see above).
INSERTs for anti-wraparound vacuum

From PostgreSQL v13 on, there are no special considerations in this case, because you get regular autovacuum runs on such tables as well.
Before that, insert-only tables were problematic: since there are no dead tuples, normal autovacuum runs are never triggered. Then, as soon as autovacuum_freeze_max_age
or autovacuum_multixact_freeze_max_age
are exceeded, you may suddenly get a massive autovacuum run that freezes a whole large table, takes a long time and causes massive I/O.
To avoid that, reduce autovacuum_freeze_max_age
for such a table:
ALTER TABLE insert_only SET (autovacuum_freeze_max_age = 10000000);
With very big tables, it can be advisable to use partitioning. The advantage here is that you can have several autovacuum workers working on several partitions in parallel, so that the partitioned table as a whole is done faster than a single autovacuum worker could manage.
If you have many partitions, you should increase autovacuum_max_workers
, the maximum number of autovacuum workers.
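A sketch (the value of 10 is just an example):

```sql
-- or set it in postgresql.conf; changing this parameter requires a server restart
ALTER SYSTEM SET autovacuum_max_workers = 10;
```

Keep in mind that the autovacuum cost limit is shared among all active workers, so adding workers does not increase the total vacuum throughput unless you also raise autovacuum_vacuum_cost_limit.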
Partitioning can also help with vacuuming tables that receive lots of updates, as long as the updates affect all partitions.
Updating table statistics is a “side job” of autovacuum.
You know that automatic statistics collection does not happen often enough if your query plans get better after a manual ANALYZE
of the table.
In that case, you can lower autovacuum_analyze_scale_factor
so that autoanalyze processes the table more often:
ALTER TABLE mytable SET (autovacuum_analyze_scale_factor = 0.02);
An alternative is not to use the scale factor, but set autovacuum_analyze_threshold
, so that table statistics are calculated whenever a fixed number of rows changes. For example, to configure a table to be analyzed whenever more than a million rows change:
ALTER TABLE mytable SET (
   autovacuum_analyze_scale_factor = 0,
   autovacuum_analyze_threshold = 1000000
);
Depending on your specific problem and your PostgreSQL version, there are different tuning knobs to make autovacuum do its job correctly. The many tasks of autovacuum and the many configuration parameters don't make that any easier.
If the tips in this article are not enough for you, consider getting professional consulting.
GROUP BY is nothing new and is available in any relational database I am aware of. It is an integral part of SQL and PostgreSQL, but what many people might not know is that GROUP BY can do more than just group by simple fields. You can use expressions to group in an even more sophisticated way, and here is how:
To show you what you can do with GROUP BY I have compiled some test data:
test=# CREATE TABLE t_oil (
          region text,
          country text,
          year int,
          production int,
          consumption int
       );
CREATE TABLE
This data set is pretty easy to understand. It tells us how much oil was produced and consumed by which country in the past. To load the data you can either download the file or simply ask COPY to load it directly from the web.
test=# COPY t_oil FROM PROGRAM
          'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt';
COPY 644
To use COPY … FROM PROGRAM you have to be a superuser; otherwise it does not work, for security reasons. In my example, 644 rows have been loaded successfully.
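If you are not a superuser, a possible workaround is psql's client-side \copy command, which runs the program on the client machine instead of the server:

```sql
\copy t_oil FROM PROGRAM 'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt'
```

Here curl runs with your local user's privileges, so no server-side superuser rights are needed - only ordinary INSERT privileges on the table.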
Using GROUP BY is pretty simple. If we want to make some kind of analysis for each group, we basically have two choices:
test=# SELECT region, avg(production) FROM t_oil GROUP BY 1;
    region     |          avg
---------------+-----------------------
 North America | 4541.3623188405797101
 Middle East   | 1992.6036866359447005
(2 rows)
“GROUP BY 1” basically means the same as the following query:
test=# SELECT region, avg(production) FROM t_oil GROUP BY region;
    region     |          avg
---------------+-----------------------
 North America | 4541.3623188405797101
 Middle East   | 1992.6036866359447005
(2 rows)
“GROUP BY region” and “GROUP BY 1” are therefore identical. It is a question of faith which type of syntax you prefer. People have told me more than once that one or the other syntax is “evil”. In reality it makes absolutely no difference; it is just syntactic sugar.
Most people group by one or more columns or no column at all:
test=# SELECT avg(production) FROM t_oil WHERE country = 'USA';
          avg
-----------------------
 9141.3478260869565217
(1 row)
However, there is more: You can also use an expression to determine the groups on the fly.
test=# SELECT production > 9000, count(production)
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY production > 9000;
 ?column? | count
----------+-------
 f        |    20
 t        |    26
(2 rows)
In this case we get two groups: one for rows with production greater than 9000 and one for rows with production less than or equal to 9000. The names of those two groups are therefore “true” (> 9000) and “false” (<= 9000). You can use any expression to calculate those groups on your own. Consider the next example:
test=# SELECT count(production)
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY CASE WHEN year % 2 = 0 THEN true ELSE false END;
 count
-------
    23
    23
(2 rows)
In this example we are counting odd and even years. Note that it is not necessary to list the grouping criteria in the SELECT clause. You might not understand the result if you omit half of the column list, but you are free to do that. What is also interesting is that you can use a full SQL query in the GROUP BY clause.
test=# SELECT count(production)
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY (SELECT CASE WHEN year % 2 = 0 THEN true ELSE false END);
 count
-------
    23
    23
(2 rows)
If you run those queries, it makes sense to take a look at the execution plan. As you can see, the plan looks just like that of any other GROUP BY statement. It is also noteworthy that PostgreSQL does a really good job of estimating the number of groups:
test=# explain SELECT count(production)
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY (SELECT CASE WHEN year % 2 = 0 THEN true ELSE false END);
                         QUERY PLAN
-------------------------------------------------------------
 HashAggregate  (cost=14.97..15.02 rows=2 width=9)
   Group Key: (SubPlan 1)
   ->  Seq Scan on t_oil  (cost=0.00..14.74 rows=46 width=5)
         Filter: (country = 'USA'::text)
   SubPlan 1
     ->  Result  (cost=0.00..0.01 rows=1 width=1)
(6 rows)
PostgreSQL successfully figured out that two groups are to be expected.
HAVING clauses done right
Recently people have asked us about aliases in the HAVING clause: can one use a column alias in a HAVING clause?
test=# SELECT count(production) AS x
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY year < 1990
       HAVING x > 22;
ERROR:  column "x" does not exist
LINE 1: ...il WHERE country = 'USA' GROUP BY year < 1990 HAVING x > 22;
                                                                ^
The answer is simply no. SQL does not allow that. I remember seeing people who actually wanted to implement this feature in PostgreSQL over the years but such a feature is not going to be supported.
If you want to use a HAVING clause it is necessary to explicitly use the entire expression:
test=# SELECT count(production) AS x
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY year < 1990
       HAVING count(production) > 22;
 x
----
 25
(1 row)
test=# SELECT count(production) AS x
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY year < 1990
       HAVING avg(production) > 0;
 x
----
 21
 25
(2 rows)
As you can see we use “count” in the SELECT clause, but we use “avg” in the HAVING clause. This is definitely possible and in some cases really useful. You can use different aggregate functions in GROUP BY and HAVING clauses without any problems.
So far you have seen what GROUP BY can do for you in the simple case, but there is more. I don't want to go into too much detail in this blog, but just one thing: you can do more than one aggregation at the same time to speed up your query. Here is an example:
test=# SELECT year < 1990, count(production) AS x
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY GROUPING SETS ((year < 1990), ());
 ?column? | x
----------+----
          | 46
 f        | 21
 t        | 25
(3 rows)
In this case we get two grouping operations: one containing all rows, and one splitting the rows into before and after 1990.
If you don't like this syntax you can also try ROLLUP which is equivalent to the query you have just seen:
test=# SELECT year < 1990, count(production) AS x
       FROM t_oil
       WHERE country = 'USA'
       GROUP BY ROLLUP (year < 1990);
 ?column? | x
----------+----
          | 46
 f        | 21
 t        | 25
(3 rows)
ROLLUP is basically the “bottom line” - it adds one additional row counting everything.
If you want to find out more about grouping in general I recommend checking out my post about speeding up GROUP BY in general.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.
Many people consider recursive queries a difficult topic. Still, they enable you to do things that would otherwise be impossible in SQL.
This article gives a simple introduction with examples and shows the differences from Oracle's implementation of recursive queries.
Common table expressions (WITH clauses)

A common table expression (CTE) can be seen as a view that is only valid for a single query:
WITH ctename AS (
   SELECT ...
)
SELECT ... FROM ctename ...
This could also be written as a subquery in FROM, but there are some advantages to using CTEs:

- the query becomes more readable
- you can reference the CTE several times in the query
- you can use data-modifying statements in the CTE (typically with a RETURNING clause)

Note that before v12, PostgreSQL always materialized CTEs. That means the CTE was calculated independently of the containing query. From v12 on, CTEs can be “inlined” into the query, which provides further optimization potential.
CTEs (as well as the recursive CTEs mentioned below) are defined in the SQL standard, although PostgreSQL does not implement the SEARCH and CYCLE clauses.
Recursive queries are written using recursive CTEs, that is, CTEs containing the RECURSIVE keyword:
keyword:
WITH RECURSIVE ctename AS (
   SELECT ... /* non-recursive branch, cannot reference "ctename" */
   UNION [ALL]
   SELECT ... /* recursive branch referencing "ctename" */
)
SELECT ... FROM ctename ...
PostgreSQL internally uses a working table to process recursive CTEs. This processing is not really recursive, but rather iterative:
First, the working table is initialized by executing the non-recursive branch of the CTE. The result of the CTE is also initialized with this result set. If the recursive CTE uses UNION
rather than UNION ALL
, duplicate rows are removed.
Then, PostgreSQL repeats the following until the working table is empty:

- execute the recursive branch of the CTE, using the working table for the self-reference
- append the result rows to the CTE result; if UNION is used to combine the branches, discard duplicate rows
- replace the working table with the new result rows

Note that the self-referential branch of the CTE is not executed using the complete CTE result so far, but only the rows that are new since the previous iteration (the working table).
If the iteration never ends, the query will just keep running until the result table becomes big enough to cause an error. There are two ways to deal with that:

- Use UNION instead of UNION ALL, which removes duplicate result rows (but of course requires extra processing effort).
- Use a LIMIT clause on the query that uses the CTE, because PostgreSQL stops processing if the recursive CTE has calculated as many rows as are fetched by the parent query. Note that this technique is not portable to other standard-compliant databases.

Let's assume a self-referential table like this:
TABLE emp;

 empno | ename  |    job    | mgr  |  hiredate  |   sal   |  comm   | deptno
-------+--------+-----------+------+------------+---------+---------+--------
  7839 | KING   | PRESIDENT |      | 1981-11-17 | 5000.00 |         |     10
  7698 | BLAKE  | MANAGER   | 7839 | 1981-05-01 | 2850.00 |         |     30
  7782 | CLARK  | MANAGER   | 7839 | 1981-06-09 | 2450.00 |         |     10
  7566 | JONES  | MANAGER   | 7839 | 1981-04-02 | 2975.00 |         |     20
  7902 | FORD   | ANALYST   | 7566 | 1981-12-03 | 3000.00 |         |     20
  7369 | SMITH  | CLERK     | 7902 | 1980-12-17 |  800.00 |         |     20
  7499 | ALLEN  | SALESMAN  | 7698 | 1981-02-20 | 1600.00 |  300.00 |     30
  7521 | WARD   | SALESMAN  | 7698 | 1981-02-22 | 1250.00 |  500.00 |     30
  7654 | MARTIN | SALESMAN  | 7698 | 1981-09-28 | 1250.00 | 1400.00 |     30
  7844 | TURNER | SALESMAN  | 7698 | 1981-09-08 | 1500.00 |    0.00 |     30
  7900 | JAMES  | CLERK     | 7698 | 1981-12-03 |  950.00 |         |     30
  7934 | MILLER | CLERK     | 7782 | 1982-01-23 | 1300.00 |         |     10
(12 rows)
We want to find all the subordinates of person 7566, including the person itself.
The non-recursive branch of the query will be:
SELECT empno, ename
FROM emp
WHERE empno = 7566;
The recursive branch will find all subordinates of all entries in the working table:
SELECT emp.empno, emp.ename
FROM emp
   JOIN ctename ON emp.mgr = ctename.empno;
We can assume that the dependencies contain no cycles (nobody is his or her own manager, directly or indirectly). So we can combine the queries with UNION ALL
, because no duplicates can occur.
WITH RECURSIVE ctename AS (
   SELECT empno, ename
   FROM emp
   WHERE empno = 7566
UNION ALL
   SELECT emp.empno, emp.ename
   FROM emp
      JOIN ctename ON emp.mgr = ctename.empno
)
SELECT * FROM ctename;

 empno | ename
-------+-------
  7566 | JONES
  7902 | FORD
  7369 | SMITH
(3 rows)
Sometimes you want to add more information, like the hierarchical level. You can do that by adding the starting level as a constant in the non-recursive branch. In the recursive branch you simply add 1 to the level:
WITH RECURSIVE ctename AS (
   SELECT empno, ename, 0 AS level
   FROM emp
   WHERE empno = 7566
UNION ALL
   SELECT emp.empno, emp.ename, ctename.level + 1
   FROM emp
      JOIN ctename ON emp.mgr = ctename.empno
)
SELECT * FROM ctename;

 empno | ename | level
-------+-------+-------
  7566 | JONES |     0
  7902 | FORD  |     1
  7369 | SMITH |     2
(3 rows)
If you use UNION
to avoid duplicated rows in the case of circular references, you cannot use this technique. This is because adding level
will make rows that were identical before different. But in that case a hierarchical level wouldn't make much sense anyway because an entry could appear on infinitely many levels.
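If your data could contain cycles but you still want a level or path column, a common workaround (a variant of which appears in the PostgreSQL documentation) is to carry an array of visited keys along and stop recursing as soon as a key repeats - a sketch using the emp table from above:

```sql
WITH RECURSIVE ctename AS (
   SELECT empno, ename,
          ARRAY[empno] AS seen,       -- keys visited on this path
          false AS is_cycle
   FROM emp
   WHERE empno = 7566
UNION ALL
   SELECT emp.empno, emp.ename,
          ctename.seen || emp.empno,
          emp.empno = ANY (ctename.seen)  -- true if we have been here before
   FROM emp
      JOIN ctename ON emp.mgr = ctename.empno
   WHERE NOT ctename.is_cycle           -- stop one step after a cycle is found
)
SELECT empno, ename FROM ctename WHERE NOT is_cycle;
```

With cycle-free data like the sample table, this returns the same three rows as before; with cyclic data it terminates instead of looping forever.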
A similar technique lets you collect the path from the starting entry to each row:

WITH RECURSIVE ctename AS (
   SELECT empno, ename, ename AS path
   FROM emp
   WHERE empno = 7566
UNION ALL
   SELECT emp.empno, emp.ename,
          ctename.path || ' -> ' || emp.ename
   FROM emp
      JOIN ctename ON emp.mgr = ctename.empno
)
SELECT * FROM ctename;

 empno | ename |          path
-------+-------+------------------------
  7566 | JONES | JONES
  7902 | FORD  | JONES -> FORD
  7369 | SMITH | JONES -> FORD -> SMITH
Oracle database has a different syntax for recursive queries that does not conform to the SQL standard. The original example would look like this:
SELECT empno, ename
FROM emp
START WITH empno = 7566
CONNECT BY PRIOR empno = mgr;

     EMPNO ENAME
---------- ----------
      7566 JONES
      7902 FORD
      7369 SMITH
This syntax is more concise, but less powerful than recursive CTEs. For more complicated queries involving joins, it can become difficult and confusing.
It is always easy to translate an Oracle “hierarchical query” to a recursive CTE:

- The non-recursive branch is the Oracle query without the CONNECT BY clause, but including the START WITH clause.
- The recursive branch is the Oracle query without the START WITH clause, but including the CONNECT BY clause. You add a join with the name of the recursive CTE and replace all PRIOR columns with columns from that joined CTE.
- If Oracle uses CONNECT BY NOCYCLE, use UNION, otherwise UNION ALL.

Apart from that, Oracle also supports standard-compliant recursive CTEs. These also support the SEARCH and CYCLE clauses that PostgreSQL doesn't implement.
Without recursive CTEs, many things that can be written in procedural languages cannot be written in SQL. That is usually no problem, because SQL is made to query databases. If you need procedural code, you can always write a database function in one of the many procedural languages available in PostgreSQL.
But recursive CTEs make SQL Turing complete, that is, it can perform the same calculations as any other programming language. Obviously, such a “program” would often be quite complicated and inefficient, so this is a theoretical consideration and not of much practical value. Still, the previous examples showed that recursive CTEs can do useful work that you couldn't otherwise perform in SQL.
As an example of the power of recursive queries, here is a recursive CTE that computes the first elements of the Fibonacci sequence:
WITH RECURSIVE fib AS (
   SELECT 1 AS n,
          1::bigint AS "fibₙ",
          1::bigint AS "fibₙ₊₁"
UNION ALL
   SELECT n + 1,
          "fibₙ₊₁",
          "fibₙ" + "fibₙ₊₁"
   FROM fib
)
SELECT n, "fibₙ" FROM fib LIMIT 20;

 n  | fibₙ
----+------
  1 |    1
  2 |    1
  3 |    2
  4 |    3
  5 |    5
  6 |    8
  7 |   13
  8 |   21
  9 |   34
 10 |   55
 11 |   89
 12 |  144
 13 |  233
 14 |  377
 15 |  610
 16 |  987
 17 | 1597
 18 | 2584
 19 | 4181
 20 | 6765
(20 rows)
This example also demonstrates how an endless loop can be avoided with a LIMIT
clause on the parent query.
You shouldn't be afraid of recursive queries, even if the term “recursive” scares you. They are not that hard to understand, and they provide the simplest way to query databases containing hierarchies and graphs. In many cases they are also much more efficient than equivalent procedural code because they avoid repeated queries to the same tables.
Together with window functions, recursive queries are among the most powerful tools that SQL has to offer.
How can I use pg_stat_statements for troubleshooting? First, an introductory story (with some slight “drama” added for fun) from those good old days of on-site consulting 🙂 So...I’m at a client where the database is not behaving nicely among other things...what a crappy DB product indeed I hear, it gets cranky every morning although there should be a constant workload and is saturating the CPU and making some important queries time out!
Hmm very interesting I think, Postgres usually doesn’t tend to care too much about mornings or time in general...Okay, well let’s sit down and look at the metrics I say...oh, surprise-surprise - we discover that there’s no monitoring framework in place! Sounds very bad and rare, right? Well, quite common actually as many people rely only on the server log files...which I wouldn’t recommend doing personally. But not too tragic in this case actually - as luckily their DBA at least had read from somewhere that it’s good to install the pg_stat_statements extension when taking an instance into use - which is of course absolutely true and a highly recommended thing to do!
So I say - well, let’s just quickly install some monitoring tool and have a coffee and talk about other issues while the metrics, that will help us to get to the bottom of this, are being gathered. And then I hear that distinct audible “sigh” and they look at me with despair in their eyes - you know, we just can’t install some software like that...firstly there’s no Internet...and 2nd, it needs to go through managerial and security acceptance processes X,Y and Z and it would take a week at best.
OK, doesn’t make life easier for sure...the poor DBA is also already feeling bad that their organisation is not exactly the candidate for the “Agile company of the year” award...But, then I ask...you know, we do have direct console access to the database still, right? Yes, we do! And then I can finally use my wise Yoda quote: Well great, this is pretty much the same as a monitoring tool - it’s all about the data anyways! The DBA starts to look at me in a very suspicious and puzzled way...what the hell am I talking about? How can a plain text-based query interface be nearly as useful as some modern graphical monitoring tool for troubleshooting?
So let’s shed some light on what I mean. A disclaimer - it’s not the most “humane” method of course, but it will definitely gather all the necessary information and thus do the trick. For most users though, who have the possibility, I’d recommend installing some real monitoring tool - like pgwatch2 or pghero, which are both dead simple to use and have a graphical interface for pinpointing problems.
Make sure the pg_stat_statements extension is loaded via the shared_preload_libraries parameter, and that the extension is created in some database - it doesn't even have to be the problematic one, as the extension works on the cluster level. For new installs, a restart of the PostgreSQL server is needed though, which might be problematic in some cases - so again: it's a really highly recommended step when performing the initial server setup.
Log into the database where pg_stat_statements is available and create a table to hold the periodic snapshots of the tracking information the extension is providing. And this is the crux of the “trick” actually - we want to regularly store snapshots of detailed query execution details so that we can later calculate the so called “deltas” for each query and see who was burning the most resources within a certain timespan! This is actually exactly what most “continuous monitoring” tools are doing in a nicer coat - just storing snapshots of the internal PostgreSQL performance counters for various objects or queries and then calculating deltas or anomalies based on that data and representing it in some nice and digestible, usually graphical, format.
A DDL statement to create a table for such “snapshot” storing purposes would look something like that:
CREATE TABLE stat_statements_snapshots AS
SELECT now() AS ts, *
FROM pg_stat_statements
WHERE false;
Set up periodic gathering of pg_stat_statements data into our “snapshot” table. In the simplest form, on the “psql” console this can be done with the built-in “\watch” metacommand. This command re-executes the previous query at a regular interval - by default every 2 seconds, which is mostly too frequent for general pg_stat_statements monitoring (but might be too infrequent for pg_locks or pg_stat_activity monitoring). In short, it would look something like this:
/* do the first insert normally and then let “\watch” repeat it every 60 seconds */
INSERT INTO stat_statements_snapshots
SELECT now(), * FROM pg_stat_statements;

\watch 60
Remember - “psql” + “\watch” is just one option; a simple Bash loop or a cron job with the same SQL would be as good, and maybe even better if you plan to keep it running for a day or so, as network interruptions or restarts might terminate your “psql” session.
Now that the data collection has been running for a while and our target time range is covered, we obviously need to analyze the data somehow. There are quite a few ways to do that - which resource usage columns to pick depends on the detected problem type, and many SQL query techniques could be used. But to give you a general idea, the chapter below contains some simpler examples of standard questions and the resulting queries for such ad-hoc troubleshooting sessions...but as already said, there are many different paths here, and the queries provided are by far not the only ones.
Which top 3 queries were burning the most CPU time between times t1 and t2 and approximately how much of the total Postgres runtime did that represent? Let's see:
with q_snap_data as (
  select
    ts,
    queryid,
    max(query) as query, /* avoiding groupings over long text values is a good idea generally */
    sum(total_time) as total_time,
    sum(calls) as calls
  from
    stat_statements_snapshots
  where
    not query like 'insert into stat_statements_snapshots%' /* don't include our snapshot collecting query */
  group by
    ts, queryid /* first grouping needed as queries have separate rows for various users, which we want to ignore */
),
q_min_total as (
  select sum(total_time) as min from q_snap_data group by ts order by ts limit 1
),
q_max_total as (
  select sum(total_time) as max from q_snap_data group by ts order by ts desc limit 1
)
select
  queryid,
  max(query) as query,
  max(total_time) - min(total_time) as total_time,
  max(calls) - min(calls) as calls,
  (max(total_time) - min(total_time)) / (max(calls) - min(calls)) as mean_time,
  100 * (max(total_time) - min(total_time))::numeric
      / (select max - min from q_max_total, q_min_total) as approx_pct_of_total
from
  q_snap_data
/* where ts between t1 and t2 */
group by
  queryid
having
  max(calls) > min(calls)
order by
  total_time desc
limit 3;
with q_snap_data as (
  select
    ts,
    queryid,
    max(query) as query, /* avoiding groupings over long text values is a good idea generally */
    sum(total_time) as total_time,
    sum(calls) as calls
  from
    stat_statements_snapshots
  where
    not query like 'insert into stat_statements_snapshots%' /* don't include our snapshot collecting query */
    /* and ts between t1 and t2 */
  group by
    ts, queryid /* first grouping needed as queries have separate rows for various users, which we want to ignore */
),
q_snap_data_hour as (
  select
    extract(hour from ts) as ts,
    queryid,
    max(query) as query,
    max(total_time) - min(total_time) as total_time,
    max(calls) - min(calls) as calls
  from
    q_snap_data
  group by
    extract(hour from ts), queryid
)
select
  *,
  total_time - lag_total_time as total_time_growth
from (
  select
    ts,
    queryid,
    query,
    total_time,
    lag(total_time) over (partition by queryid order by ts) as lag_total_time,
    calls
  from
    q_snap_data_hour
) x
where
  lag_total_time > 0
  and total_time > lag_total_time
order by
  total_time - lag_total_time desc
limit 3;
The results can be considered useful only if there was no Postmaster crash or user-invoked statistics reset in between. If there was, your best bet is almost always to go all in on the awesome window functions!
In some cases, as in the above described fully utilized CPU case, one doesn’t really want to add any unnecessary load on the database server with monitoring activities (especially if you use shorter intervals), as it might actually be that last straw that breaks the camel's back...so how do you minimize such risks a bit?
We could (and mostly should) use the UNLOGGED table modifier for our snapshot gathering table, to bypass the transaction log completely for our frequently inserted data - thus reducing disk flushing and network load (remember - WAL is sent also to the replicas). This modifier of course has the downside that the table is not effectively crash safe and we would lose our data in case of a PostgreSQL crash or a power outage! Nevertheless, this is especially recommended when the whole monitoring operation is planned to be relatively short-lived, as it mostly should, or if the strange “peaks” are appearing very frequently and we could just re-start our capturing any time and still get usable “leads”. I actually use “UNLOGGED” almost always for such ad-hoc troubleshooting - better to be safe than sorry and not to strain the system under monitoring.
We could leave out the longest info (byte-wise) from the pg_stat_statements view snapshots - the actual queries! This is OK to do as the internal query IDs are static and good enough to look up the real query texts later with a JOIN, when we have already identified some suspicious query ID-s.
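Since version 1.2 of the extension, the underlying set-returning function even has a parameter for exactly this purpose: with showtext := false, the query column comes back NULL. A sketch of such a “slim” snapshot insert:

```sql
-- store the counters but not the (potentially long) query texts
INSERT INTO stat_statements_snapshots
SELECT now(), * FROM pg_stat_statements(showtext := false);
```

Later, once suspicious query IDs have been identified, the texts can be fetched live with a plain SELECT on pg_stat_statements filtered by queryid.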
We don’t actually need to store the pg_stat_statement statistics on each interval for all queries...as probably only a subset of them are “active” in the monitored time range. So basically we could add some WHERE filter condition that looks up the very first entry for a specific query and dismisses it if the execution counter has not progressed for the current timestamp - this usually minimizes the data amount at least 2 times based on my experience. A hint though - for fast lookups on that first entry, you might want to also throw some index on the “snapshot” table, so that we again burn a bit less CPU time.
CREATE INDEX ON stat_statements_snapshots (queryid, ts);

INSERT INTO stat_statements_snapshots
SELECT now(), *
FROM pg_stat_statements s
WHERE calls > (SELECT calls
               FROM stat_statements_snapshots
               WHERE queryid = s.queryid AND dbid = s.dbid
               ORDER BY ts
               LIMIT 1);
By the way, exactly the same approach can be used for all the other internal statistics views that are cumulative in nature, as most of them are - the first that comes to mind is the pg_stat_user_tables view, which lets you determine at which times some tables get a lot of sequential scans. There I'd also add some minimum table size limit into the mix, as scans on tiny tables, unless they happen in the billions per day, are not usually a thing to worry about.
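The same snapshot pattern, with a hypothetical 10 MB size cutoff, might look like this for pg_stat_user_tables:

```sql
-- empty snapshot table with the same columns plus a timestamp
CREATE UNLOGGED TABLE stat_user_tables_snapshots AS
SELECT now() AS ts, * FROM pg_stat_user_tables WHERE false;

-- repeat this insert periodically, e.g. via \watch or cron
INSERT INTO stat_user_tables_snapshots
SELECT now(), *
FROM pg_stat_user_tables
WHERE pg_table_size(relid) > 10 * 1024 * 1024;  -- ignore tables under ~10 MB
```

Deltas of seq_scan between snapshots then show exactly when the sequential scans happen.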
Hopefully, that wasn’t too “hacky”, and you have a new trick up your sleeve now.
Read more about putting pg_stat_statements to good use in Hans' blog Find and Fix a Missing PostgreSQL Index.