CYBERTEC Logo

VACUUM does not shrink my PostgreSQL table

10.2022 / Category: / Tags: | | |

Did you ever wonder why VACUUM does not make your PostgreSQL tables smaller? Did you ever wonder why VACUUM does not shrink data files? Well, maybe this is the article you have been looking for. The point is: usually, space is not returned to the operating system after a cleanup and it is important to understand why. Often people make wrong assumptions about the inner workings of VACUUM. It makes sense to dig deeper and understand what's really going on.

The following post reveals the most useful secrets about VACUUM to users.

Understanding tuple visibility

To understand VACUUM in PostgreSQL, it is important to see how PostgreSQL handles visibility in the first place. The entire concept is based around a set of hidden columns which are part of the row. Here's how it works:

The table we have just created consists of just one column - to keep things simple. Note that data has been loaded into the table using two separate transactions. Each transaction has inserted 3 rows, which is clearly indicated in the hidden columns:

xmin, xmax, cmin and cmax are hidden columns containing transaction IDs. As you can see, the first three rows have been written by transaction number 764, while the rest of the data has been created using transaction number 765.
The hidden columns will handle visibility and PostgreSQL will (often but not always) determine based on those columns whether a row can be seen by a certain transaction or not.

Running UPDATE statements will show what's really going on here:

One row has been changed. But let's focus on the CTID which represents the physical position of the row on disk. Notice that (0, 6) is gone because PostgreSQL had to copy the row. If we run a second UPDATE, the row will be copied again:

Copying these rows is important because we have to keep the old row around. Otherwise ROLLBACK would not work - therefore the old version has to remain.

Let's inspect the table once again:

Everything between 5 and 9 is full of dead rows which have to be removed.

VACUUM: Cleaning out rows

Keep in mind that COMMIT must not kill dead rows either. Therefore the cleanup process has to be done asynchronously. This is exactly what VACUUM does. Let's run it and see what happens:

VACUUM is actively looking for rows which are not seen by anyone anymore. Those rows can be in the middle of the data file somewhere. What happens is that VACUUM allows PostgreSQL to reuse that space - however, it does not return that space to the operating system. It can't do that because if you have a datafile that is 1 GB in size, you can't simply return “the middle of the file” to the operating system in case it is empty - there is no file system operation which supports that. Instead, PostgreSQL has to remember this free space and reuse it later.

Exception to the VACUUM rule

However, there is an exception to the rule. Consider the following code snippet:

The table will retain its size even after the DELETE statement. Remember: Cleanup is done asynchronously. Therefore VACUUM can be called to delete those rows:

[sql gutter="false"]
test=# VACUUM t_test;
VACUUM

This is a bit of a special case. The rule is: If from a certain position in a table onward, ALL rows are dead, VACUUM can truncate the table. That's exactly what happened in this example:

After all, in large tables there are always a handful of rows towards the end of the data file (under normal circumstances). For that reason, don't count on VACUUM to shrink tables.

VACUUM FULL vs. pg_squeeze

One way to battle table bloat (which is the technical term used to describe tables growing out of proportion) is to use VACUUM FULL. However, the problem is that VACUUM FULL needs a table lock which can be a real problem if you are in a business critical environment. Therefore we have developed a method to reorganize large tables without extensive locking.

The solution is called pg_squeeze. Download it for free from Github or our website.

Finally …

To find out more, you can watch my Youtube video about VACUUM:

You are currently viewing a placeholder content from Youtube. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.

More Information

If your VACUUM process won't remove dead rows and you don't know why, check out Laurenz' blog, 4 Reasons Why VACUUM Won't Remove Dead Rows. In case you are looking to dig deeper, check out our further blogs related to VACUUM.

To get important tips about how disabling AUTOCOMMIT affects autovacuum maintenance, read Laurenz Albe's blog, Disabling Autocommit in PostgreSQL Can Damage Your Health.

0 0 votes
Article Rating
Subscribe
Notify of
guest
2 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
PJD
PJD
1 year ago

One thing that isn't clear to me is how much free space overhead is required with pg_squeeze? Do I need to store an entire duplicate copy of the table+indexes during processing?

laurenz
laurenz
1 year ago
Reply to  PJD

Yes, just like with VACUUM (FULL).

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    2
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram