the shibboleth of PostgreSQL
© Laurenz Albe 2020

 

After all the technical articles I have written, I thought it would be nice to write about PostgreSQL sociology for a change.

Language and community

A community like PostgreSQL has no clearly defined borders. There is no membership certificate; you belong to it if you feel that you belong. That said, you can only get that feeling if you feel accepted.

In all communities, language is an important factor. In the case of an international technical community where much of the communication is in written form, this is not so much about the correct accent or general mastery of the language: it is more the choice of technical terms that separates the “in” crowd from the outsiders. This “jargon”, which primarily serves to unambiguously convey certain technical concepts, also has a social dimension.

What is a shibboleth?

A shibboleth, as Wikipedia defines it, is

any custom or tradition, usually a choice of phrasing or even a single word, that distinguishes one group of people from another.

The term comes from the Biblical Book of Judges (12:5-6):

The Gileadites seized the fords of the Jordan and held them against Ephraim. When any Ephraimite who had escaped wished to cross, the men of Gilead would ask, ‘Are you an Ephraimite?’ and if he said ‘No,’ they would retort, ‘Say “Shibboleth.”’ He would say ‘Sibboleth,’ and because he could not pronounce the word properly, they seized him and killed him at the fords. At that time forty-two thousand men of Ephraim lost their lives.

Now, I am not suggesting that the PostgreSQL crowd engages in genocide, but there is a word that immediately exposes you as an outsider if you pronounce or write it wrong: The name “PostgreSQL” itself.

How to write and pronounce PostgreSQL

The official name is of course “PostgreSQL”, and the documentation states that

Many people continue to refer to PostgreSQL as “Postgres” (now rarely in all capital letters) because of tradition or because it is easier to pronounce. This usage is widely accepted as a nickname or alias.

Also, it is not unusual to refer to the product as “PGSQL” or, even shorter, “PG”.

Now “PostgreSQL” is not a word that lends itself to pronunciation easily (and there have been repeated attempts to change the name), so you can imagine that the pronunciation varies even more wildly. My research showed a surprising number of YouTube videos teaching what they perceive as the correct pronunciation, and there has even been a survey by the project about people's preferred pronunciation. The widely accepted forms are “POST-gres-cue-ell” and “POST-gres”.

So given that diversity, you'd think that you cannot get it wrong - but you can.

How not to write and pronounce PostgreSQL

There are two ways to write and pronounce “PostgreSQL” that will immediately identify you as an outsider:

- “Postgre”
- “Postgres SQL”

Nobody will seize and kill you if you use these, but people will assume that you don't know a thing about PostgreSQL.

Conclusion

Fake it till you make it! Show that you belong and avoid calling the product “Postgre” or “Postgres SQL”.

I wrote about the new pg_timetable 3 major release not so long ago. Two essential features were highlighted:
- new session locking implementation
- new jackc/pgx Golang library used

Today I want to reveal one more advanced feature! Fasten your seat belts!
First, we need to distinguish exclusive client session mode from exclusive chain execution mode.

Exclusive client session

Each pg_timetable instance, known as a client, works in exclusive mode by default, meaning that a target database cannot hold more than one active client connection with the same client name. How do we distinguish clients? Simple: with a command-line parameter:
pg_timetable --clientname=worker01 ...

It is possible to run as many different clients as you want. Take a look:
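For example (illustrative only, following the command line above; the remaining connection options are omitted):

pg_timetable --clientname=worker01 ... &
pg_timetable --clientname=worker02 ... &
pg_timetable --clientname=worker03 ... &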

Of course, life is complicated, and sometimes more than one pg_timetable instance with the same client name may connect to the target database. In this case, pg_timetable ensures that one and only one of them executes chains. The second one waits until the active one disconnects.

Frankly speaking, you can run any number of pg_timetable clients with the same name. Only one will be active (primary), and all the others will be waiting (standby). The obvious drawback of such a situation is losing database connection slots to idle clients.

Exclusive chain

Let's get back to chains. By default, all chains may run simultaneously, according to their schedules. The degree of parallelism is limited by the number of parallel workers in a pg_timetable instance. Right now, there are 16 workers for scheduled chains and 16 for interval chains. These workers are implemented as lightweight goroutines. That doesn't mean you cannot have more than 32 chains scheduled for the same time. It means that no more than 16 scheduled chains are executed at any given time; the others wait in the queue.

Maybe later, we can add a settings parameter to specify the exact number of workers to run. However, we have never yet hit the situation where the default settings were not enough.

Imagine there are several pg_timetable instances working simultaneously against the same PostgreSQL database, and some chain (exclusive or not) has no executor specified (the default). In that case, every client will execute the same chain in its own session.

One more time: we can only guarantee exclusive chain execution within a single pg_timetable client session. By design, we do not block the simultaneous execution of a chain in several parallel pg_timetable sessions. To prevent this situation, use the client_name field of the timetable.chain_execution_config table during chain creation, as sketched below.
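A minimal sketch of pinning a chain to one client (the client_name column is the field mentioned above; the chain_name lookup column and the values are assumptions):

UPDATE timetable.chain_execution_config
   SET client_name = 'worker01'
 WHERE chain_name = 'nightly-report';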

Why do exclusive chains matter? I'm sure you can find many use cases: report building, software updates, external non-transactional data updates (files), waiting for other processes, etc.

Let's sum up the concept of the exclusive task:

  1. an exclusive chain adds itself to the queue
  2. the exclusive chain waits until all running chains and jobs have finished
  3. the exclusive chain starts and blocks all other chains until it's done

Implementation details

If you look at the problem more closely, you might recognize the classic readers-writers problem.

The readers-writers problem (there are some variations, check the wiki for details) occurs when a shared resource needs to be accessed by multiple executors. We assume our exclusive chains are writers, and non-exclusive (regular) chains are readers.

⚠ This is not related to database, file system, or any other I/O reads and writes. Only the concept is identical, so we can reuse the same algorithms efficiently.

Fortunately, the standard Go sync package provides the RWMutex type, which solves this in a simple way.

From the Golang docs:

An RWMutex is a reader/writer mutual exclusion lock. The lock can be held by an arbitrary number of readers or a single writer. The zero value for an RWMutex is an unlocked mutex.

In other words, non-exclusive regular chains (readers) don't have to wait for each other. They only have to wait for exclusive chains (writers) holding the lock. And only one exclusive (writer) lock is possible at any given time.

During parallel operation, regular chains call RWMutex.RLock() before and RWMutex.RUnlock() after the transaction is done.

And exclusive chains call RWMutex.Lock() before and RWMutex.Unlock() after. Simple and effective. I really like Go!
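Here is a minimal, self-contained sketch of the pattern. It is a simplification, not pg_timetable's actual code; all names are illustrative.

package main

import (
	"fmt"
	"sync"
	"time"
)

// one mutex guards chain execution within a client session
var chainMutex sync.RWMutex

// regular (non-exclusive) chains act as readers: many can run at once
func runRegularChain(name string) {
	chainMutex.RLock()
	defer chainMutex.RUnlock()
	fmt.Println("running regular chain", name)
	time.Sleep(10 * time.Millisecond) // simulate work
}

// exclusive chains act as writers: they wait for running chains to finish
// and block all other chains until they are done
func runExclusiveChain(name string) {
	chainMutex.Lock()
	defer chainMutex.Unlock()
	fmt.Println("running exclusive chain", name)
	time.Sleep(10 * time.Millisecond) // simulate work
}

func main() {
	var wg sync.WaitGroup
	for i := 1; i <= 3; i++ {
		wg.Add(1)
		go func(i int) { defer wg.Done(); runRegularChain(fmt.Sprint("chain-", i)) }(i)
	}
	wg.Add(1)
	go func() { defer wg.Done(); runExclusiveChain("report") }()
	wg.Wait()
}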


Finally

This was the first in a series of posts dedicated to the new pg_timetable v3 features. Stay tuned for more features to be highlighted.

The previous post can be found here.

Stay safe, healthy, and wealthy!
Be happy! Peace! Love! ❤

UPDATE is not the same as DELETE + INSERT!
© Laurenz Albe 2020

 

Introduction

We know that PostgreSQL does not update a table row in place. Rather, it writes a new version of the row (the PostgreSQL term for a row version is “tuple”) and leaves the old row version in place to serve concurrent read requests. VACUUM later removes these “dead tuples”.

If you delete a row and insert a new one, the effect is similar: we have one dead tuple and one new live tuple. This is why many people (me, among others) explain to beginners that “an UPDATE in PostgreSQL is almost the same as a DELETE, followed by an INSERT”.

This article is about that “almost”.

Example showing the difference between UPDATE and DELETE + INSERT

Let's take this simple test table:
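The original listing is not reproduced here; a minimal sketch of such a table (names and values are illustrative):

CREATE TABLE uptest (id integer PRIMARY KEY, val integer);

INSERT INTO uptest VALUES (1, 42);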

In the two following tests, we will issue statements from two concurrent sessions.

First, the UPDATE:
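A sketch of the experiment, using the illustrative table above:

-- session 1
BEGIN;

UPDATE uptest SET val = val + 1 WHERE id = 1;

-- session 2 (blocks, because the row is locked by session 1)
SELECT * FROM uptest WHERE id = 1 FOR UPDATE;

-- session 1
COMMIT;

-- session 2 is now unblocked and returns the *updated* row:
--  id | val
-- ----+-----
--   1 |  43
-- (1 row)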

Let's reset the table before the second test:

Now let's repeat the experiment with DELETE and INSERT:
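Again a sketch with the same illustrative table:

-- session 1
BEGIN;

DELETE FROM uptest WHERE id = 1;

INSERT INTO uptest VALUES (1, 42);

-- session 2 (blocks, because the row is locked by the DELETE)
SELECT * FROM uptest WHERE id = 1 FOR UPDATE;

-- session 1
COMMIT;

-- session 2 is unblocked, but this time it returns no row:
--  id | val
-- ----+-----
-- (0 rows)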

Explanation for the observed difference

The documentation describes what happens when an SQL statement runs into a lock in a transaction with the default READ COMMITTED isolation level:

UPDATE, DELETE, SELECT FOR UPDATE, and SELECT FOR SHARE commands behave the same as SELECT in terms of searching for target rows: they will only find target rows that were committed as of the command start time. However, such a target row might have already been updated (or deleted or locked) by another concurrent transaction by the time it is found. In this case, the would-be updater will wait for the first updating transaction to commit or roll back (if it is still in progress). If the first updater rolls back, then its effects are negated and the second updater can proceed with updating the originally found row. If the first updater commits, the second updater will ignore the row if the first updater deleted it, otherwise it will attempt to apply its operation to the updated version of the row. The search condition of the command (the WHERE clause) is re-evaluated to see if the updated version of the row still matches the search condition. If so, the second updater proceeds with its operation using the updated version of the row. In the case of SELECT FOR UPDATE and SELECT FOR SHARE, this means it is the updated version of the row that is locked and returned to the client.

The above shows that there is some way for PostgreSQL to find the new version of an updated row. That is why the first experiment returned a result row. In the second experiment, there was no connection between the old, deleted row and the newly inserted one; that's why we get no result in that case.

To figure out how the old and the new version are connected, we have to look deeper.

Looking at UPDATE with “pageinspect”

The extension pageinspect allows us to see all data in a PostgreSQL data page. It requires superuser permissions.

Let's use it to see what's on disk after the first experiment:
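A sketch of such a query, using pageinspect's get_raw_page() and heap_page_items() (the table name is the illustrative one from above, and page 0 is assumed to hold both tuples):

CREATE EXTENSION pageinspect;

SELECT lp, t_xmin, t_xmax, t_ctid, t_infomask2, t_infomask
FROM heap_page_items(get_raw_page('uptest', 0));

The same query can be re-run after the DELETE + INSERT experiment below.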

The first entry is the old version of the row, the second the new version.

The lp is the line pointer number, which stands for the number of the tuple within the data page. Together with the page number, this constitutes the physical address (tuple ID or tid) of a tuple.

It follows that the t_ctid stored in the tuple header is usually redundant, since it is implicit in the line pointer. However, it becomes relevant after an UPDATE: then t_ctid contains the tuple identifier of the updated version of the row.

This is the “missing link” between the old row version and the updated one!

Looking at DELETE + INSERT with “pageinspect”

Let's compare this to DELETE + INSERT:

Here the t_ctid column from the old, deleted tuple is unchanged and there is no link to the new tuple. The second tuple is not found by the SELECT ... FOR UPDATE, since it is “invisible” to the “snapshot” used for scanning the table.

Differences in infomask and infomask2

There are also some relevant differences in the attributes infomask and infomask2. You can see the meaning of these flags in the PostgreSQL source file src/include/access/htup_details.h.

Conclusion

To understand the difference between UPDATE and DELETE+INSERT, we had a closer look at the tuple headers. We saw infomask, infomask2 and t_ctid, where the latter provides the link between the old and the new version of a row.

PostgreSQL's row header occupies 23 bytes, which is more storage overhead than in other databases, but is required for PostgreSQL's special multiversioning and tuple visibility implementation.

UPDATE can be challenging in PostgreSQL: if you want to read more about its problems and how to deal with them, read my article on HOT update.

Blog post updated 05.12.2022. People keep asking: "What are the best ways to upgrade and update my PostgreSQL to the latest version?" This blog post covers how you can upgrade or update to the latest PostgreSQL release.

Before we get started, we have to make a distinction between two things:

- updating PostgreSQL: moving to a newer minor release within the same major version (for example, 15.0 to 15.1)
- upgrading PostgreSQL: moving to a new major version (for example, 13 to 15)

Let’s take a look at both scenarios.

Updating PostgreSQL

The first scenario I want to shed some light on is updating within a major version of PostgreSQL: What does that mean? Suppose you want to move from PostgreSQL 15.0 to PostgreSQL 15.1. In that case, all you have to do is to install the new binaries and restart the database.
There is no need to reload the data. The downtime required is minimal. The binary format does not change; a quick restart is all you need. The same is true if you are running a HA (high availability) cluster solution such as Patroni. Simply restart all nodes in your PostgreSQL cluster and you are ready to go.
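A sketch of such a minor update on Debian/Ubuntu with the PGDG packages (package and service names vary between distributions and installation methods):

sudo apt-get update
sudo apt-get install --only-upgrade postgresql-15
sudo systemctl restart postgresql@15-main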
The following table contains a little summary:

Tooling     | Task                 | Reloading data | Downtime needed
yum/dnf/apt | Minor release update | Not needed     | Close to zero

Upgrading PostgreSQL

Now let’s take a look at upgrades: if you want to move from PostgreSQL 9.6, 10, 13 or some other older version to PostgreSQL 15, an upgrade is needed. To do that, you have various options:

- dump the old database with pg_dump / pg_dumpall and restore it into the new version
- run pg_upgrade, either in copy mode or with the --link option
- use logical replication

If you dump and reload data, it might take a lot of time. The bigger your database is, the more time you will need to do the upgrade. It follows that pg_dump/pg_dumpall and restore are not the right tools to upgrade a large multi-terabyte database.
pg_upgrade is here to do a binary upgrade. It copies all the data files from the old directory to the new one. Depending on the amount of data, this can take quite a lot of time and cause serious downtime. However, if the new and the old data directory are on the same filesystem, there is a better option: “pg_upgrade --link”. Instead of copying all the files, pg_upgrade will create hard links for those data files. The amount of data is not a limiting factor anymore, because hard links can be created quickly. “pg_upgrade --link” therefore promises close to zero downtime.

Tooling           | Task                 | Reloading data                | Downtime needed
pg_upgrade        | Major release update | Metadata and files are copied | Downtime is needed
pg_upgrade --link | Major release update | Only metadata are copied      | Close to zero

 

What is important to note here is that pg_upgrade is never destructive. If something goes wrong, you can always delete the new data directory and start from scratch.

Preparing a sample database

To show pg_upgrade in action, I have created a little sample database to demonstrate how things work in real life:
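The original listing is not reproduced here; a sketch of how comparable test data could be generated (table name and row count are illustrative):

CREATE TABLE t_test (id serial, name text);

INSERT INTO t_test (name)
SELECT 'dummy entry number ' || x
FROM generate_series(1, 50000000) AS x;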

This is a fairly small database, but it is already large enough so that users can feel the difference when doing the upgrade:

7.4 GB are ready to be upgraded. Let’s see how it works.

pg_upgrade in action

To upgrade a database, three steps are needed:

Let us start with initdb:
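A sketch, assuming the old cluster lives in /data/db12 and the new one is created in /data/db15 (paths are illustrative):

initdb -D /data/db15 -E UTF8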

Note that pg_upgrade is only going to work if the encodings of the old and the new database instance match. Otherwise, it will fail.

After adapting the configuration files, we can run pg_upgrade. Basically, we need four pieces of information here: the old and the new data directory, as well as the paths to the old and the new binaries:
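A sketch of such a call (data directories and binary paths are illustrative and depend on your installation):

pg_upgrade -d /data/db12 -D /data/db15 \
           -b /usr/lib/postgresql/12/bin -B /usr/lib/postgresql/15/bin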

What is worth mentioning here is that the upgrade process took over four minutes, because all the data had to be copied to the new directory. My old Mac is not very fast, and copying takes a very long time.

To reduce downtime, we can clean out the directory, run initdb again and add the --link option:
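A sketch of the same procedure with hard links (again with illustrative paths):

rm -rf /data/db15
initdb -D /data/db15 -E UTF8
pg_upgrade --link -d /data/db12 -D /data/db15 \
           -b /usr/lib/postgresql/12/bin -B /usr/lib/postgresql/15/bin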

In this case, it took only 3.5 seconds to upgrade. We can start the database instance directly and keep working with the new instance.

pg_upgrade --link under the hood

The --link option can reduce the downtime to close to zero. The question is what happens under the hood? Let's check out the data files:
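A sketch of what such a listing can look like (directory, file names and sizes are illustrative):

$ ls -l /data/db15/base/16384/
-rw------- 2 postgres postgres 1073741824 Nov  3 10:12 16397
-rw------- 2 postgres postgres  734003200 Nov  3 10:12 16397.1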

What you see here is that the relevant data files show a link count of “2”: the same file (inode) shows up in two directories. pg_upgrade creates hard links, and this is exactly what we see here.

That's why it is super important that both data directories are on the same filesystem: make sure that the mount point is /data and not /data/db12. Otherwise, hard links are not possible.

Finally …

pg_upgrade is an important tool. However, there are more ways to upgrade: Logical replication is an alternative method. Check out our blog post about this exciting technology.

PostgreSQL offers a nice BLOB interface which is widely used. However, recently we came across problems faced by various customers, and it makes sense to reflect a bit and figure out how PostgreSQL handles BLOBs - and especially BLOB cleanup.

Using the PostgreSQL BLOB interface

In PostgreSQL, you can use various means to store binary data. The simplest form is definitely to make use of the “bytea” (= byte array) data type. In this case a binary field is basically seen as part of a row.
Here is how it works:
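A minimal sketch (table and column names are illustrative):

CREATE TABLE t_image (id serial PRIMARY KEY, name text, data bytea);

INSERT INTO t_image (name, data) VALUES ('greeting', 'Hello'::bytea);

SELECT * FROM t_image;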

As you can see, this is a normal column, and it can be used just like any other column. The only thing worth mentioning is the encoding one has to use on the SQL level. PostgreSQL uses a variable to configure this behavior:

bytea_output

The bytea_output variable accepts two values: “hex” tells PostgreSQL to send the data in hex format; “escape” renders the data as octal escape sequences. There is not much the application has to worry about here, apart from the maximum size of 1 GB per field.
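A quick illustration of both settings:

SET bytea_output TO 'hex';      -- the default
SELECT 'Hello'::bytea;          -- \x48656c6c6f

SET bytea_output TO 'escape';
SELECT 'Hello'::bytea;          -- Hello (non-printable bytes appear as octal escapes)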
However, PostgreSQL has a second interface to handle binary data: The BLOB interface. Let me show an example of this powerful tool in action:
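A sketch of importing a file as a large object (the returned OID will differ on every system):

SELECT lo_import('/etc/hosts');

 lo_import
-----------
     98425
(1 row)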

In this case, the content of /etc/hosts has been imported into the database. Note that PostgreSQL has a copy of the data - it is not a link to the filesystem. What is noteworthy here is that the database will return the OID (object ID) of the new entry. To keep track of these OIDs, some developers do the following:
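A sketch of that pattern: remember the OID in an ordinary table (names are illustrative):

CREATE TABLE t_file (name text, oid_of_blob oid);

INSERT INTO t_file VALUES ('/etc/hosts', lo_import('/etc/hosts'));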

This is absolutely fine, unless you do something like below:
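For example (continuing the illustrative table above):

-- the row, and with it the OID, is gone, but the large object itself is not
DELETE FROM t_file WHERE name = '/etc/hosts';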

The problem is that the object ID has been forgotten. However, the object is still there. pg_largeobject is the system table in charge of storing the binary data inside PostgreSQL. All lo_* functions simply talk to this system table in order to handle these things:
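The orphaned data can still be seen in the system table:

SELECT loid, pageno, length(data) FROM pg_largeobject;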

Why is that a problem? The reason is simple: Your database will grow and the number of “dead objects” will accumulate.

Therefore the correct way to kill a BLOB entry is as follows:
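A sketch, continuing the illustrative example: unlink the large object first, then remove the reference:

SELECT lo_unlink(oid_of_blob) FROM t_file WHERE name = '/etc/hosts';

DELETE FROM t_file WHERE name = '/etc/hosts';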

If you forget to unlink the object, you will suffer in the long run - and we have often seen that happen. It is a major issue if you are using the BLOB interface.

vacuumlo: Cleaning up dead large objects

However, how can one fix the problem once you have accumulated thousands, or maybe millions, of dead BLOBs? The answer is a command line tool called “vacuumlo”.
Let us first create a dead entry:
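For instance, by importing a file twice and deliberately discarding the returned OIDs:

SELECT lo_import('/etc/hosts');
SELECT lo_import('/etc/hosts');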

Then we can run vacuumlo from any client:
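A sketch of such a call (connection options and the database name depend on your environment):

vacuumlo -v -h localhost -U postgres test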

As you can see, two dead objects have been killed by the tool. vacuumlo is the easiest way to clean out orphan objects.

Additional functionality

However, there is more than just lo_import and lo_unlink. PostgreSQL offers a variety of functions to handle large objects in a nice way:
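One way to get an overview is to list the server-side lo_* functions in the system catalog:

SELECT proname FROM pg_proc WHERE proname ~ '^lo_' ORDER BY 1;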

There are two more functions which don’t follow the naming convention for historic reasons: loread and lowrite:

They are functions whose names cannot easily be changed anymore. However, it is worth noting that they exist.

Finally …

The PostgreSQL BLOB interface is really useful and can be used for many things. The beauty is that it is fully transactional and therefore binary content and metadata cannot go out of sync anymore.

If you want to learn more about triggers to enforce constraints in PostgreSQL, we recommend you check out our blog post written by Laurenz Albe. It will shed some light on this important topic.

 


