
pg_squeeze: Optimizing PostgreSQL storage

October 2020

Is your database growing at a rapid rate? Does your database system slow down all the time? And maybe you have trouble understanding why this happens? Maybe it is time to take a look at pg_squeeze and fix your database once and for all. pg_squeeze has been designed to shrink your database tables without downtime. No more need for VACUUM FULL - pg_squeeze has it all.

The first question any PostgreSQL person will ask is: Why not use VACUUM or VACUUM FULL? There are various reasons: A normal VACUUM does not really shrink the table on disk. It marks dead space inside the table as reusable, but it won't return this space to the operating system. VACUUM FULL does return space to the operating system, but it needs an exclusive lock on the table. If your table is small, this usually does not matter. However, what if your table is many TB in size? You cannot simply lock a large table for hours just to shrink it after table bloat has ruined performance. pg_squeeze can shrink large tables using only a small, short lock.

However, there is more. The following listing contains some of the operations pg_squeeze can do with minimal locking:

  • Shrink tables
  • Move tables and indexes from one tablespace to another
  • Index organize (“cluster”) a table
  • Change the on-disk FILLFACTOR

After this basic introduction it is time to take a look at how pg_squeeze can be installed and configured.

PostgreSQL: Installing pg_squeeze

pg_squeeze can be downloaded for free from our GitHub repository, and binary packages are available for most Linux distributions. If you happen to run Solaris, AIX, FreeBSD or some other less widespread operating system, just get in touch with us. We are eager to help.

After you have compiled pg_squeeze or installed the binaries some changes have to be made to postgresql.conf:
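
The following snippet is only a sketch rather than a drop-in configuration - the slot count and the list of preloaded libraries depend on what your instance already uses:

    # postgresql.conf - settings pg_squeeze relies on (values are examples)
    wal_level = logical                       # pg_squeeze is based on logical decoding
    max_replication_slots = 1                 # leave at least one free slot for pg_squeeze
    shared_preload_libraries = 'pg_squeeze'   # load the background worker library at startup

Once PostgreSQL has been restarted with these settings, the extension itself is created per database using CREATE EXTENSION pg_squeeze.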

The most important thing is to set wal_level to logical. Internally, pg_squeeze works as follows: it creates a new datafile (a snapshot of the table) and then applies the changes made to the table while that snapshot is being copied over. This is done using logical decoding. Of course, logical decoding needs replication slots, so max_replication_slots has to leave room for pg_squeeze. Finally, the library has to be loaded when PostgreSQL is started (shared_preload_libraries). That is basically it - pg_squeeze is ready for action.

Understanding table bloat in PostgreSQL

Before we dive deeper into pg_squeeze it is important to understand table bloat in general. Let us take a look at the following example:
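
To make the numbers below reproducible, here is a sketch of such a test setup; the single integer column of t_test is an assumption:

    -- a simple test table; the column name "id" is just for illustration
    CREATE TABLE t_test (id int);
    INSERT INTO t_test SELECT * FROM generate_series(1, 2000000);
    SELECT pg_size_pretty(pg_relation_size('t_test'));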

Once we have imported 2 million rows the size of the table is 69 MB. What happens if we update these rows and simply add one?
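
Sticking with the sketched table from above, incrementing every value rewrites each row once:

    -- every UPDATE creates a new row version; the old one stays behind until VACUUM
    UPDATE t_test SET id = id + 1;
    SELECT pg_size_pretty(pg_relation_size('t_test'));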

The size of the table is going to double. Remember, because of MVCC an UPDATE has to create a new version of every row, which of course eats up space. The most important observation, however, is: if you run VACUUM, the size of the table on disk is still 138 MB - storage IS NOT returned to the operating system. VACUUM makes the dead space reusable, but it can only hand blocks back to the filesystem in rare cases (when they happen to sit at the very end of the table). In reality, a table is basically never going to return space to the filesystem, which is a major issue. Table bloat is one of the most frequent reasons for bad performance, so it is important to either prevent it or make sure the table is allowed to shrink again.

PostgreSQL: Shrinking tables again

If you want to use pg_squeeze you have to make sure that a table has a primary key. It is NOT enough to have unique indexes - it really has to be a primary key. The reason is that we use replica identities internally, so we basically suffer from the same restrictions as other tools using logical decoding.
Let us add a primary key and squeeze the table:
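
With the sketched t_test table from above, this could look as follows; note that squeeze.squeeze_table() also takes optional parameters (clustering index, target tablespaces), and depending on the pg_squeeze version the trailing NULLs can simply be omitted:

    ALTER TABLE t_test ADD PRIMARY KEY (id);
    -- reorganize the table once, right now
    SELECT squeeze.squeeze_table('public', 't_test', NULL, NULL, NULL);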

Calling pg_squeeze manually is one way to handle a table. It is the preferred method if you want to shrink a table once. As you can see the table is smaller than before:
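
Checking the size again with the same query as above shows the effect:

    SELECT pg_size_pretty(pg_relation_size('t_test'));
    -- roughly back to the size of the freshly loaded table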

The beauty is that minimal locking was needed to do that.

Scheduling table reorganization

pg_squeeze has a built-in job scheduler which can operate in many ways: it can tell the system to squeeze a table within a certain time frame, or trigger a process once certain thresholds have been reached. Internally, pg_squeeze uses configuration tables to control its behavior. Here is how it works:
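
The central piece is the configuration table squeeze.tables. Its exact column list varies slightly between versions, but it can always be inspected directly in psql:

    -- show the configuration table that drives the scheduler
    \d squeeze.tables
    -- columns include tabschema, tabname, schedule, free_space_extra,
    -- min_size, vacuum_max_age and max_retry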

The schedule column is worth mentioning: it is a custom data type capable of holding cron-style scheduling information. The data type looks as follows:
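
Roughly speaking, squeeze.schedule bundles arrays for minutes, hours, days of the month, months and days of the week; the details can be checked in psql:

    -- inspect the composite type used by the schedule column
    \d squeeze.schedule
    -- (minutes, hours, days_of_month, months, days_of_week)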

If you want to make sure that pg_squeeze takes care of a table, simply insert the configuration into the table:
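
An example entry could look like this; the composite literal fills the schedule fields in the order minutes, hours, days of month, months and days of week:

    INSERT INTO squeeze.tables (tabschema, tabname, schedule)
    VALUES ('public', 't_test', ('{30}', '{22}', NULL, NULL, '{3, 5}'));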

In this case, public.t_test will be squeezed at 22:30 in the evening on every 3rd and 5th day of the week. The main question is: when is that? In this setup, days 0 and 7 are Sundays, so 3 and 5 mean Wednesday and Friday at 22:30.
Let us check what the configuration looks like:
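
An ordinary query against the configuration table is all it takes:

    -- expanded display makes the schedule column easier to read
    \x
    SELECT * FROM squeeze.tables;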

Once this configuration is in place, pg_squeeze will automatically take care of things. Everything is driven by configuration tables, so you can easily control and monitor the inner workings of pg_squeeze.

Handling errors

If pg_squeeze decides to take care of a table, it can happen that the reorg process actually fails. Why is that the case? Somebody might drop and recreate the table, the structure of the table might change, or pg_squeeze might not be able to obtain the brief lock it needs at the end. Of course it is also possible that the tablespace you want to move a table to does not have enough space. Many issues can lead to errors, and therefore these reorg processes have to be tracked.
The way to do that is to inspect squeeze.errors:
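
As with the configuration, a plain query is enough; the table stays empty as long as nothing has failed:

    SELECT * FROM squeeze.errors;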

This log table contains all the relevant information needed to track things quickly and easily.

Finally …

pg_squeeze is not the only Open Source tool we have published for PostgreSQL. If you are looking for a cutting edge scheduler we recommend taking a look at what pg_timetable has to offer.

