
SQL is a very easy-to-use language, and it is pretty easy to make things work out of the box. In fact, from my point of view, simplicity is one of the core advantages of SQL in general. But what if you want to compare dozens of columns with each other? It is not hard to do, but it can mean a fair amount of unnecessary typing.

A lengthy example

Let us create a table featuring 4 columns:
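
A minimal sketch will do (the table and column names here are just placeholders):

CREATE TABLE t_test (
        a    int,
        b    int,
        c    int,
        d    int
);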

To do the test we can add a simple row:
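
For example:

INSERT INTO t_test VALUES (10, 20, 30, 40);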

Now we want to see all rows in which those four columns contain exactly the values we are looking for:
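
Written out column by column, the filter might look like this:

SELECT * FROM t_test
WHERE   a = 10
        AND b = 20
        AND c = 30
        AND d = 40;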

This is simple, yet pretty lengthy. The point is: When you are about to write a query you want a result – not a typing exercise.

Simplified filtering

To make this entire filtering process a little easier, we can use brackets:
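
Instead of listing every column separately, the values can be compared as one row:

SELECT * FROM t_test
WHERE (a, b, c, d) = (10, 20, 30, 40);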

From my point of view this is way more readable and a lot faster to write.
But there is one more way to do this:
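
Using the same sample table:

SELECT * FROM t_test
WHERE t_test = ROW(10, 20, 30, 40);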

Remember, in PostgreSQL every table can be seen as a composite type. So, this means that we can use the name of the table inside the WHERE clause directly. PostgreSQL will compare one field after the other and see if they are all equal. This is an even shorter version to compare those fields with each other.


During training and consulting sessions, I often get the feeling that most people are not fully aware of PostgreSQL's shutdown modes. However, knowing about this functionality is pretty important. Without this knowledge, sysadmins may cause unintended damage and issues, which could easily be avoided.

PostgreSQL has 3 shutdown modes

When doing pg_ctl --help the system will reveal its secrets:
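
The relevant part of the output (abbreviated here; the exact wording depends on your PostgreSQL version) is the list of shutdown modes:

$ pg_ctl --help
...
Shutdown modes are:
  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart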

What we see here is that there are three ways to shut down a PostgreSQL database.

smart - the default

If you run pg_ctl -D ... stop, you will perform a so-called “smart” shutdown. What does that mean? Well, PostgreSQL will shut down as soon as all open connections have been terminated. This is very nice, because no queries will be interrupted and nobody will lose their database connection.

The downside here is: what if database connections are never closed, because some sort of connection pooler keeps sessions open?

In this case it is necessary to shut the PostgreSQL server down in a more aggressive way.

fast - usually used in init scripts

“fast” will terminate all open connections in a “nice” way and therefore shut the database down a lot sooner than “smart” mode does. The main advantage is that you don't have to wait for all clients to disconnect. This is also the reason why “fast” is the default mode used by most binary packages.

In most cases “-m fast” is the desired method to perform a shutdown.
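
A typical invocation (adjust the data directory to your environment) looks like this:

pg_ctl -D /path/to/data_directory stop -m fast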

immediate - be careful

If “fast” is still not fast enough, you can rely on “immediate” (which is the cruellest method to do a shutdown). As stated before, “fast” kills all open connections in a “nice” way. In this context “nice” means that a process will receive a signal, which can be caught by the process. The process can then perform a proper cleanup and exit in a proper way.

In the case of “immediate”, a process will receive a harsher signal and will not get the chance to clean up. In short: it is shot in the head. The consequence of this type of shutdown is that PostgreSQL is not able to finish its disk I/O and therefore has to perform a recovery when it comes back up. This will work, but it is not the nicest way of doing things. Therefore we don't recommend this type of shutdown.

kill -9 is not your friend

Sometimes it is not about shutting down an entire database instance. In many cases users just want to get rid of one single database connection. Of course PostgreSQL provides ways to do that cleanly. However, some rookie administrators tend to use brute force methods to kill a connection: “kill -9”.
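
The clean way is to use the functions PostgreSQL provides for exactly this purpose, for example (12345 is just a placeholder for the process ID you find in pg_stat_activity):

-- find the backend you want to get rid of
SELECT pid, usename, query FROM pg_stat_activity;

-- cancel the currently running query only
SELECT pg_cancel_backend(12345);

-- terminate the entire connection gracefully
SELECT pg_terminate_backend(12345);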

But, what are the consequences of a hard kill? Consider the following scenario: Let us imagine a database connection is about to modify some shared memory segment. What happens if it dies somewhere during this operation because it is killed? Well, nobody knows what has been changed in shared memory - in fact, there is a fair chance of spreading some corruption due to some half-completed write operation. To avoid this, PostgreSQL will instantly throw out everybody and enter recovery mode to sort out potential problems.

From a user's point of view: All database connections are lost and PostgreSQL is doing something strange. Most users will not understand what is about to happen. We recommend avoiding “kill -9” unless you know precisely what you are doing!


UPDATED 21 March 2023: Tracking database changes and tracing users has always been a vitally important part of PostgreSQL database security and application security. Especially when critical data are stored, it might be interesting to know who has changed which data when and how.

Generic Changelog Triggers in PostgreSQL are the key to tracking changes

To track the changes made to tables in PostgreSQL, you can write yourself a generic changelog trigger. The easiest way to do that is to write a simple PL/pgSQL function and use it for all tables in the system. As PostgreSQL provides good support for stored procedures, this is definitely not hard to do. In this post you will learn how it can be done.

Create a table to store some history - tracking changes in PostgreSQL databases

First of all, we need a table to store the changes. For a quick prototype, we can use the following table structure:
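
Something along these lines is enough for a prototype (the column names are merely suggestions):

CREATE TABLE t_history (
        id          serial,
        tstamp      timestamptz DEFAULT now(),
        schemaname  text,
        tabname     text,
        operation   text,
        who         text DEFAULT current_user,
        new_val     jsonb,
        old_val     jsonb
);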

For the sake of simplicity, we didn't use enumerated types (enums), which would store the operation in a more efficient way. What is also important: the data we store should be in the most generic format possible. That is why we use the jsonb data type here.
The point of this table is to keep track of all changes made to other tables. We want to know which operation has taken place. Whenever we insert, update or delete a row somewhere, we want to keep a record of that operation.

The backbone of tracking changes infrastructure in PostgreSQL

Taking this into account, we come up with the following code:
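
One possible implementation of such a generic trigger function (written against the t_history table shown above) is:

CREATE FUNCTION change_trigger() RETURNS trigger AS $$
BEGIN
        IF TG_OP = 'INSERT' THEN
                INSERT INTO t_history (schemaname, tabname, operation, new_val)
                VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_OP, row_to_json(NEW)::jsonb);
                RETURN NEW;
        ELSIF TG_OP = 'UPDATE' THEN
                INSERT INTO t_history (schemaname, tabname, operation, new_val, old_val)
                VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_OP,
                        row_to_json(NEW)::jsonb, row_to_json(OLD)::jsonb);
                RETURN NEW;
        ELSIF TG_OP = 'DELETE' THEN
                INSERT INTO t_history (schemaname, tabname, operation, old_val)
                VALUES (TG_TABLE_SCHEMA, TG_TABLE_NAME, TG_OP, row_to_json(OLD)::jsonb);
                RETURN OLD;
        END IF;
END;
$$ LANGUAGE plpgsql SECURITY DEFINER;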

The interesting part here is that the trigger function is totally generic. We can use the row_to_json function to encode any table row into a JSON object. The advantage is that we can use the very same trigger function for countless tables. OLD and NEW contain the row before and after the data modification, respectively.
Once we have the backbone in place, we can test things using a table.

Here is an example:
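
A tiny test table will do:

CREATE TABLE t_trig (
        id      int,
        name    text
);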

Once we have created our tables, we can deploy triggers to do the real work:
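
For our test table, the trigger definition could look like this:

CREATE TRIGGER t_change
        AFTER INSERT OR UPDATE OR DELETE ON t_trig
        FOR EACH ROW EXECUTE PROCEDURE change_trigger();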

What is important here is that we are using an AFTER trigger. We have two choices: BEFORE and AFTER. One trigger fires before the row really makes it into the table. The other one will fire after the row has made it into the table.
The main risk here is that if multiple triggers exist on the same table, we have to make sure that the order of firing is correct: In PostgreSQL (since version 7.3), all BEFORE triggers will fire in alphabetical order (of the trigger name), and then all AFTER triggers will fire in alphabetical order. This has to be taken into account when you have more than one trigger.

Let's test the trigger and see what happens:
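
For instance, an INSERT followed by an UPDATE:

INSERT INTO t_trig (id, name) VALUES (1, 'hans');
UPDATE t_trig SET id = 10 * id, name = 'paul';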

Our history table will contain all changes we have made to the underlying tables:
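
A simple query shows the audit trail, including who changed what and when, with the old and new row stored as jsonb:

SELECT * FROM t_history;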

Security considerations when tracking changes in a PostgreSQL database

If we assume that we use this mechanism to track changes in order to protect our data, we have to make sure that somebody changing the data cannot change the log as well. Therefore, we have to take precautions for this special case. One way to do this is to mark our trigger function as SECURITY DEFINER. This means that the function is not executed with the privileges of the user making the change to the table, but with the privileges of the user who owns the function. If a superuser deploys the trigger, we can protect ourselves against malicious changes to the log.
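
In practice that could look like this (assuming the superuser postgres owns both the function and the log table):

ALTER FUNCTION change_trigger() OWNER TO postgres;
ALTER TABLE t_history OWNER TO postgres;
REVOKE ALL ON t_history FROM public;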

Finally …

Tracking changes to the database is an important issue for many people. We hope the code listed above serves as a blueprint which will help you to get started quickly.

If you want to find out more about PostgreSQL security and learn how to write SECURITY DEFINER functions securely, read this blog: /en/abusing-security-definer-functions/



Basically, the idea of a random generator is to produce random data – ideally a lot of random, non-deterministic data. However, sometimes it is necessary to reproduce a sequence of random numbers. Why would anybody want to do that? Well, what if you are running a simulation model? Or what if you want to compare results coming from different servers? In all those cases it is necessary to reset the random generator to a defined starting point.

To make sure the random generator starts all over again, you have to set a seed, as shown in the following listing:
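
setseed accepts a value between -1 and 1; resetting it to the same value restarts the same sequence:

SELECT setseed(0.5);
SELECT random();

SELECT setseed(0.5);
SELECT random();    -- returns exactly the same value as after the first setseed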

As you can see, after resetting the seed, the next random value is always the same.

This is highly important if you want to control the behavior of your random number generation process (just as you would in any programming language).

If you're also interested in PostgreSQL performance and logging, check out our blog posts about WAL.

UPDATED August 2023: I was inspired by some previous posts with the idea of abusing PostgreSQL security barriers to cheat the optimizer. I have constructed an example of how a security barrier can theoretically be used to speed things up under some very nasty circumstances.

WARNING - Please keep in mind: This is a demo – it is not meant to be a tuning guideline. It is merely an example showing how things work and what PostgreSQL is capable of. Please don't use this to tune your databases.

Ordering restrictions

If you write a query in PostgreSQL, the optimizer will try to find a smart way to execute it. Ideally, any SQL database will try to get rid of as much data as possible, as early as possible. In addition, it tries to reduce costs by executing cheap restrictions before expensive ones. Usually the user has no control over this process, and there is normally no point in trying to control it. Remember, the core idea of SQL is that the user sends a query and the database finds the best way to execute it – it is not the user who is supposed to arrange restrictions in a clever way.

A little experiment using security barriers

Just for fun (and nothing more) I came up with a small example showing how the order of those restrictions (quals) can be tweaked using security barriers.

First of all we create two functions. One function is going to be fast and one function is going to be slow:
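
The exact functions don't matter much; something like the following will do. The pg_sleep call makes one of them slow, and the COST settings deliberately give the planner wrong information:

CREATE FUNCTION func_slow(int) RETURNS boolean AS $$
BEGIN
        PERFORM pg_sleep(0.5);   -- waste half a second per call
        RETURN false;            -- filters out every row
END;
$$ LANGUAGE plpgsql COST 1;      -- lie to the planner: pretend this is cheap

CREATE FUNCTION func_fast(int) RETURNS boolean AS $$
BEGIN
        RETURN false;            -- cheap, filters out every row as well
END;
$$ LANGUAGE plpgsql COST 10000;  -- pretend this is expensive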

To make the function slow, we simply call pg_sleep to make it wait for a while. In this example we have also set cost parameters to tell PostgreSQL how expensive a function is. In my example I have marked the slow function as cheaper so that the planner will execute it first under normal circumstances.

Then we can create some test data. 20 rows should be sufficient for our purpose:
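
For example:

CREATE TABLE t_test (id int);
INSERT INTO t_test SELECT * FROM generate_series(1, 20);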

Let's create a normal view with:
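
A view filtering with the fast function (the name v is arbitrary):

CREATE VIEW v AS SELECT * FROM t_test WHERE func_fast(id);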

Now let's run the code:
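
The query adds the slow function on top of the view:

SELECT * FROM v WHERE func_slow(id);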

What we see here is that the slow function is now executed quite frequently. Since the function filters out all rows, there is no need to post-filter anything using the fast function. We've misled PostgreSQL into calling the slow function first using our cost parameters, and now we can observe the following result:

We need 10 seconds to execute this one.

Adding a security barrier

Now, let's abuse a security barrier and see what happens:
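
Turning the existing view into a security barrier is a single statement:

ALTER VIEW v SET (security_barrier = true);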

We can execute the very same query again:
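
SELECT * FROM v WHERE func_slow(id);
-- the view's own filter (func_fast) now has to run first,
-- so the expensive func_slow is barely called at all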

What we see now is that the query is a lot faster because PostgreSQL was not allowed to swap the filters.

Bottom line on security barriers

As I have stated before, security barriers are a SECURITY issue and not a performance issue. However, you have to take into consideration that the optimizer is restricted by these barriers, so that execution plans may change and performance might differ significantly.

Further important security info

Find out all about View Permissions and Row Level Security as of PostgreSQL v15 here.



Christmas is approaching fast, so I thought I'd give PostgreSQL users some inspiration when it comes to buying their Christmas presents. I assume that I am not the only one who has constant trouble finding the right present for somebody. The trouble is that prices are usually strange numbers – maybe EUR 49.95 or EUR 49.99. This makes it very hard to look for something that costs, say, EUR 50.

Let us assume you want to buy something that costs around EUR 50. Of course, EUR 49.95 or EUR 48.99 would be perfect, but given the fact that it is Christmas, you would also be fine with EUR 54.99.

KNN comes to the rescue

Using the = operator is clearly not enough to fix your problem. What you really want is some kind of fuzzy search. The trivial approach to this database problem is to just look for a range of values. In many cases this seems like a good idea, but remember: if you happen to run a large website, looking for everything between EUR 40 and EUR 60 might yield hundreds of thousands of products. This is clearly not an option. All you want is a handful of suggestions which are as close as possible to your desired target price.

Finding something close to what you are looking for is exactly what KNN search (K-Nearest-Neighbor search) was made for. It allows you to break the chains of the = operator and perform a fuzzy search.

An example

To show you how you can use PostgreSQL to search for similar numbers, we can create a table containing some randomly generated entries:
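
The names used here are just examples; 10,000 evenly distributed prices between 0 and 100 are enough to make the point:

CREATE TABLE t_price (price float8);

INSERT INTO t_price
        SELECT random() * 100 FROM generate_series(1, 10000);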

The generate_series function is a nice and easy way to generate a list of values. As you can see, we have evenly distributed values between 0 and 100 – exactly what we need for our test.

The trivial approach: Ranges

As mentioned before, most people would go for the trivial approach here and just come up with a range. To do so we first create an index:
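
A plain btree index on the price column:

CREATE INDEX idx_price ON t_price (price);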

Then we can just do the query:
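
Searching for everything between EUR 45 and EUR 55:

EXPLAIN SELECT * FROM t_price WHERE price BETWEEN 45 AND 55;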

Just take a look at the estimates of the PostgreSQL optimizer: roughly 950 rows are expected for this query. This is not surprising, as it is close to 10% of the data. Just imagine doing that on 1 million rows – you would have to read 100,000 rows to display just a handful of them. Clearly, performance will go down the drain in this case.

A second problem is: What if there are no products in the range of 45 – 55? What if the cheapest product is EUR 56?

KNN can come to the rescue here.

To use KNN we first have to install the extension containing some fancy operator classes:
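
The btree_gist extension ships with PostgreSQL:

CREATE EXTENSION btree_gist;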

Then we can deploy a GiST index capable of doing KNN:
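
Using the sample table from above:

CREATE INDEX idx_price_gist ON t_price USING gist (price);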

The module introduces the so-called “distance operator” (<->). It measures the distance between the value in question and the target we are looking for. The beauty of the PostgreSQL KNN mechanism is that we can sort by this distance to fetch the rows in the order we desire:
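
For example, the four prices closest to EUR 50:

SELECT * FROM t_price ORDER BY price <-> 50 LIMIT 4;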

Our query takes just a fraction of a millisecond. And more importantly: runtimes stay pretty stable – even as the number of rows grows.

On behalf of my entire team, we wish you all a Merry Christmas time and efficient Christmas shopping 😉

