CYBERTEC PostgreSQL Logo

Cleaning up a large number of BLOBs in PostgreSQL

04.2021 / Category: / Tags: | | |

PostgreSQL allows end users to store BLOBs (binary large objects) in the database. Many people use these functions to handle data directly in SQL. There has long been a discussion about whether this is a good thing or not. We expect that particular discussion to go on indefinitely. However, if you are in the “pro BLOB” camp, we want to share some insights into how binary data can be handled in PostgreSQL with maximum efficiency.

Loading files into the database

BLOBs are important: to show how they work, I have created a file which we can import into the database later on:

To import this file, we can call the lo_import function and pass the name of the file you want to load (as superuser). Here is how it works:

What we see here is that PostgreSQL gives us a number (= object ID). Note that the filename has “vanished”, so you have to store it somewhere if you want to remember those values. The way to do this is to create a basic table:

PostgreSQL gives you the option of using the OID data type to store object IDs. Storing the filename is easy - a text or varchar column will do the job, in case the path is needed later on. In general, the large object is totally independent of the file in the filesystem - it has nothing to do with it. Storing the filename is therefore merely done in order to remember what we imported.

However, don’t let’s worry about a single file. Let’s import it a million times, and see what happens:

The file has been imported nicely. Each file has a new object ID, as shown in the previous listing.

Behind the scenes: How PostgreSQL stores BLOBs

Now that you have seen how to import data, we can take a look at the internals to figure out how data is stored behind the scenes. The way PostgreSQL handles things is by adding the data to pg_largeobject:

The “loid” contains the object ID we have stored in our table. The real data is stored as a “bytea” (byte array) column. Since some files might be really large, PostgreSQL breaks them up into chunks.

Byte array fields are easy to display. However, in the default setting it is not really human readable (which is pointless anyway, because we are talking about binary data). I have imported some text in my example, so the “escape” format is a bit more readable:

What we see here is that the BLOB interface is really a wrapper around the “bytea” data type. That is the real magic.

Removing BLOBs

Now let’s move on to deleting BLOBs again. The way to do that is to call the “lo_unlink” function. It will remove the entry from the system catalog:

Removing a simple entry will be easy. However, let’s try to answer the following question: What happens when millions of BLOBs are removed from the system within the very same transaction?

Removing millions of BLOBs at a time

Running the deletion is straightforward. The result, on the other hand, is fairly unexpected:

PostgreSQL is not able to run the transaction because we have run out of memory! How can that happen? When a large object is deleted, it has to be locked. The trouble is that the number of locks in shared memory is limited.

NOTE: Please bear in mind that row locks
are NOT stored in shared memory. This is
only true for objects (table locks, etc.)

As a result, we are running out of memory. The main question now becomes: How many locks can we store? The answer can be found within three config variables (these are located in postgresql.conf):

The number of locks available is:

 

So in my case it is 6.400. In short, there is no way we can handle these deletions in one transaction.

Basically, we’ve got two ways to solve this problem:

  • Change the PostgreSQL configuration (requires a restart)
  • Delete BLOBs in chunks

Those are the only ways we can get rid of those BLOBs in the database.

Dealing with orphan BLOBs

Sometimes, BLOBs are lost because the reference to the OID goes missing, but the object itself is not unlinked. In this case, a command line tool is available:

vacuumlo will look for unreferenced large objects and delete them. This is a good way to clean a database, in case your application is buggy or something bad happens.

Finally …

If you are struggling with BLOBs, we want to point out that we offer PostgreSQL consulting as well as 24x7 database support. Our highly professional team can definitely help you with your BLOB-related problems. Contact us >>

If you want to learn about the latest PostgreSQL tips and tricks, consider checking out this blogpost >>

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram