Did you know that your temporary tables are not cleaned up by autovacuum? If you did not, consider reading this blog post about PostgreSQL and autovacuum. If you did – well, you can still continue to read this article.
Autovacuum cleans tables automatically
Since the days of PostgreSQL 8.0, the database has provided this miraculous autovacuum daemon which is in charge of cleaning tables and indexes. In many cases, the default configuration is absolutely ok and people don’t have to worry about VACUUM much. However, recently one of our support clients sent us an interesting request related to temporary tables and autovacuum.
What is the problem? The main issue is that autovacuum does not touch temporary tables. Yes, it’s true – you have to VACUUM temporary tables on your own. But why is this the case? Let’s take a look at how the autovacuum job works in general: Autovacuum sleeps for a minute, wakes up and checks if a table has seen a sufficiently large number of changes before it fires up a cleanup process. The important thing is that the cleanup process actually has to see the objects it will clean, and this is where the problem starts. An autovacuum process has no way of seeing a temporary table, because temporary tables can only be seen by the database connection which actually created them. Autovacuum therefore has to skip temporary tables. Unfortunately, most people are not aware of this issue. As long as you don’t use your temporary tables for extended periods, the missing cleanup job is not an issue. However, if your temp tables are repeatedly changed in long transactions, it can become a problem.
Proving my point
The main question now is: How can we verify what I have just said? To show you what I mean, I will load the pgstattuple extension and create two tables– a “real” one, and a temporary one:
test=# CREATE EXTENSION pgstattuple; CREATE EXTENSION test=# CREATE TABLE t_real AS SELECT * FROM generate_series(1, 5000000) AS id; SELECT 5000000 test=# CREATE TEMPORARY TABLE t_temp AS SELECT * FROM generate_series(1, 5000000) AS id; SELECT 5000000
Let us now kill half of the data in each those two tables:
test=# DELETE FROM t_real WHERE id % 2 = 0; DELETE 2500000 test=# DELETE FROM t_temp WHERE id % 2 = 0; DELETE 2500000
The tables will now contain around 50% trash each. If we wait sufficiently long, we will see that autovacuum has cleaned up the real table while the temporary one is still in jeopardy:
test=# \x Expanded display is on. test=# SELECT * FROM pgstattuple('t_real'); -[ RECORD 1 ] ............---------------+-..--------- table_len | 181239808 tuple_count | 2500000 tuple_len | 70000000 tuple_percent | 38.62 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 80620336 free_percent | 44.48 test=# SELECT * FROM pgstattuple('t_temp'); -[ RECORD 1 ] --------------------+------------ table_len | 181239808 tuple_count | 2500000 tuple_len | 70000000 tuple_percent | 38.62 dead_tuple_count | 2500000 dead_tuple_len | 70000000 dead_tuple_percent | 38.62 free_space | 620336 free_percent | 0.34
The “real table” has already been cleaned and a lot of free space is available, while the temporary table still contains a ton of dead rows. Only a manual job will find the free space in all that jumble.
Keep in mind that VACUUM is only relevant if you really want to keep the temporary table for a long time. If you close your connection, the entire space will be automatically reclaimed anyway– so there is no need to worry about dropping the table.
If you want to learn more about VACUUM in general, consider checking out one of our other blogposts. If you are interested in how VACUUM works, it also is definitely useful to read the official documentation, which can be found here