December 2013

SQL is a very easy to use language and it is pretty easy to make things work out of the box. In fact, from my point of view simplicity is one of the core advantages of SQL in general. But, what if you want to compare dozens of columns with each other? It is not hard to do but it can be a fair amount of unnecessary typing.

A lengthy example

Let us create a table featuring 4 columns:

test=# CREATE TABLE t_test (a int, b int, c int, d int);
CREATE TABLE

1 2	test=# CREATE TABLE t_test (a int, b int, c int, d int); CREATE TABLE

To do the test we can add a simple row:

test=# INSERT INTO t_test VALUES (1, 2, 3, 4);
INSERT 0 1

1 2	test=# INSERT INTO t_test VALUES (1, 2, 3, 4); INSERT 0 1

Now we want to see all

test=# SELECT * FROM t_test WHERE a = 1 AND b = 2 AND c = 3 AND d = 4;
 a | b | c | d
---+---+---+---
 1 | 2 | 3 | 4
(1 row)

test=# SELECT * FROM t_test WHERE a = 1 AND b = 2 AND c = 3 AND d = 4;

a | b | c | d

---+---+---+---

1 | 2 | 3 | 4

(1 row)

This is simple, yet pretty lengthy. The point is: When you are about to write a query you want a result – not a typing exercise.

Simplified filtering

To make this entire filtering process a little easier, we can use brackets:

test=# SELECT * FROM t_test WHERE (a, b, c, d) = (1, 2, 3, 4);
 a | b | c | d
---+---+---+---
 1 | 2 | 3 | 4
(1 row)

test=# SELECT * FROM t_test WHERE (a, b, c, d) = (1, 2, 3, 4);

a | b | c | d

---+---+---+---

1 | 2 | 3 | 4

(1 row)

From my point of view this is ways more readable and a lot faster to write.
But, there is one more to do this:

test=# SELECT * FROM t_test WHERE t_test = (1, 2, 3, 4);
 a | b | c | d
---+---+---+---
 1 | 2 | 3 | 4
(1 row)

test=# SELECT * FROM t_test WHERE t_test = (1, 2, 3, 4);

a | b | c | d

---+---+---+---

1 | 2 | 3 | 4

(1 row)

Remember, in PostgreSQL every table can be seen as a composite type. So, this means that we can use the name of the table inside the WHERE clause directly. PostgreSQL will compare one field after the other and see if they are all equal. This is an even shorter version to compare those fields with each other.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

When doing trainings or consulting sessions I have the feeling that most people are not fully aware of PostgreSQL's shutdown modes. However, knowing about this functionality is pretty important. Without this information sysadmins may cause unintended damage and issues. This can easily be avoided.

PostgreSQL has 3 shutdown modes

When doing pg_ctl --help the system will reveal its secrets:

Options for stop or restart:

  -m SHUTDOWN-MODE   can be "smart", "fast", or "immediate"

 Shutdown modes are:

  smart       quit after all clients have disconnected
  fast        quit directly, with proper shutdown
  immediate   quit without complete shutdown; will lead to recovery on restart

Options for stop or restart:

-m SHUTDOWN-MODE can be "smart", "fast", or "immediate"

Shutdown modes are:

smart quit after all clients have disconnected

fast quit directly, with proper shutdown

immediate quit without complete shutdown; will lead to recovery on restart

What we see here is that there are three ways to shut down a PostgreSQL database.

smart - the default

If you run pg_ctl -D ... stop you will do a so called “smart” shutdown. What does it mean? Well, PostgreSQL will shut down as soon as all open connections have been terminated. This is very nice because no queries will be interrupted, and nobody will lose his database connection.

The downside here is: What if database connections are never terminated because there is some sort of connection pool around or so?

In this case it is necessary to shut the PostgreSQL server down in a more aggressive way

fast - usually used in init scripts

“fast” will terminate all open connections in a “nice” way and therefore terminate the database a lot faster than in “smart” mode. The main advantage here is that you can terminate the database server a lot faster than normal. This is also the reason why “fast” is the default mode used by most binary packages.

In most cases “-m fast” is the desired method to perform a shutdown.

immediate - be careful

If “fast” is still not fast enough, you can rely on “immediate” (which is the cruellest method to do a shutdown). As stated before, “fast” kills all open connections in a “nice” way. In this context “nice” means that a process will receive a signal, which can be caught by the process. The process can then perform a proper cleanup and exit in a proper way.

In case of immediate a process will receive a harsher signal and will not be able to respond anymore. In short: It is shot in the head. The consequences of this type of shutdown are: PostgreSQL is not able to finish its disk I/O and therefore has to do a recovery when it comes back up. This will work but it is not the nicest way of doing things. Therefore we don't recommend this type of shutdown.

kill - 9 is not your friend

Sometimes it is not about shutting down an entire database instance. In many cases users just want to get rid of one single database connection. Of course PostgreSQL provides ways to do that cleanly. However, some rookie administrators tend to use brute force methods to kill a connection: “kill -9”.

But, what are the consequences of a hard kill? Consider the following scenario: Let us imagine a database connection is about to modify some shared memory segment. What happens if it dies somewhere during this operation because it is killed? Well, nobody knows what has been changed in shared memory - in fact, there is a fair chance of spreading some corruption due to some half-completed write operation. To avoid this, PostgreSQL will instantly throw out everybody and enter recovery mode to sort out potential problems.

From a user's point of view: All database connections are lost and PostgreSQL is doing something strange. Most users will not understand what is about to happen. We recommend avoiding “kill -9” unless you know precisely what you are doing!

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

UPDATED 21 March 2023: Tracking database changes and tracing users has always been a vitally important part of PostgreSQL database security and application security. Especially when critical data are stored, it might be interesting to know who has changed which data when and how.

Generic Changelog Triggers in PostgreSQL are the key to tracking changes

To track the changes made to tables in PostgreSQL you can write yourself a generic changelog trigger. The easiest way to do that is to write a simple PL/pgSQL function and use it for all tables in the system. As PostgreSQL provides good support for stored procedures, this is definitely not hard to do. In this post you will learn how this can be done and the easiest way to achieve your goal.

Create a table to store some history - tracking changes in PostgreSQL databases

First of all, we need a table to store the changes. For a quick prototype, we can use the following table structure:

CREATE SCHEMA logging;
 
CREATE TABLE logging.t_history (
        id             	serial,
        tstamp         	timestamp   	DEFAULT now(),
        schemaname     	text,
        tabname        	text,
        operation      	text,
        who            	text        	DEFAULT current_user,
        new_val        	jsonb,
        old_val        	jsonb
);

CREATE SCHEMA logging;

CREATE TABLE logging.t_history (

id serial,

tstamp timestamp DEFAULT now(),

schemaname text,

tabname text,

operation text,

who text DEFAULT current_user,

new_val jsonb,

old_val jsonb

);

For the sake of simplicity, we didn't use enumerators (enumerators are used to store data in a more efficient way). What is also important: The data we store should be in the most generic possible format. What we did here was to use the jsonb data type.
The point of this table is to keep track of all changes made to other tables. We want to know which operation has taken place. Whenever we insert, update or delete a row somewhere, we want to keep a record of that operation.

The backbone of tracking changes infrastructure in PostgreSQL

Taking this into account, we come up with the following code:

CREATE FUNCTION change_trigger() RETURNS trigger AS $$
BEGIN
	IF  	TG_OP = 'INSERT'
	THEN
		INSERT INTO logging.t_history (tabname, schemaname, operation, new_val)
			VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(NEW));
			RETURN NEW;
	ELSIF   TG_OP = 'UPDATE'
	THEN
		INSERT INTO logging.t_history (tabname, schemaname, operation, new_val, old_val)
			VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP,
				row_to_json(NEW), row_to_json(OLD));
		RETURN NEW;
	ELSIF   TG_OP = 'DELETE'
	THEN
		INSERT INTO logging.t_history (tabname, schemaname, operation, old_val)
			VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD));
		RETURN OLD;
	END IF;
END;
$$ LANGUAGE 'plpgsql' SECURITY DEFINER;

CREATE FUNCTION change_trigger() RETURNS trigger AS $$

BEGIN

IF TG_OP = 'INSERT'

THEN

INSERT INTO logging.t_history (tabname, schemaname, operation, new_val)

VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(NEW));

RETURN NEW;

ELSIF TG_OP = 'UPDATE'

THEN

INSERT INTO logging.t_history (tabname, schemaname, operation, new_val, old_val)

VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP,

row_to_json(NEW), row_to_json(OLD));

RETURN NEW;

ELSIF TG_OP = 'DELETE'

THEN

INSERT INTO logging.t_history (tabname, schemaname, operation, old_val)

VALUES (TG_RELNAME, TG_TABLE_SCHEMA, TG_OP, row_to_json(OLD));

RETURN OLD;

END IF;

END;

$$ LANGUAGE 'plpgsql' SECURITY DEFINER;

The interesting part here is that the trigger is totally generic. We can use the row_to_json function to encode any table row into a JSON object. The advantage is that we can use the very same trigger for countless tables. NEW and OLD will contain the rows before and after the data modification.
Once we have the backbone in place, we can test things using a table.

Here is an example:

CREATE TABLE t_trig (id int, name text);

1	CREATE TABLE t_trig (id int, name text);

Once we have created our tables, we can deploy triggers to do the real work:

CREATE TRIGGER t 
AFTER INSERT OR UPDATE OR DELETE ON t_trig
        FOR EACH ROW EXECUTE PROCEDURE change_trigger();

CREATE TRIGGER t

AFTER INSERT OR UPDATE OR DELETE ON t_trig

FOR EACH ROW EXECUTE PROCEDURE change_trigger();

What is important here is that we are using an AFTER trigger. We have two choices: BEFORE and AFTER. One trigger fires before the row really makes it into the table. The other one will fire after the row has made it into the table.
The main risk here is that if multiple triggers exist on the same table, we have to make sure that the order of firing is correct: In PostgreSQL (since version 7.3), all BEFORE triggers will fire in alphabetical order (of the trigger name), and then all AFTER triggers will fire in alphabetical order. This has to be taken into account when you have more than one trigger.

Let's test the trigger and see what happens:

INSERT INTO t_trig VALUES (1, 'hans'); 
UPDATE t_trig SET id = 10 * id, name = 'paul';

1 2	INSERT INTO t_trig VALUES (1, 'hans'); UPDATE t_trig SET id = 10 * id, name = 'paul';

Our history table will contain all changes we have made to the underlying tables:

-[ RECORD 1 ]--------------------------
id         | 1
tstamp     | 2023-02-27 10:25:37.530579
schemaname | public
tabname    | t_trig
operation  | INSERT
who        | hs
new_val    | {'id': 1, 'name': 'hans'}
old_val    | 
-[ RECORD 2 ]--------------------------
id         | 2
tstamp     | 2023-02-27 10:25:37.530579
schemaname | public
tabname    | t_trig
operation  | UPDATE
who        | hs
new_val    | {'id': 10, 'name': 'paul'}
old_val    | {'id': 1, 'name': 'hans'}

-[ RECORD 1 ]--------------------------

id | 1

tstamp | 2023-02-27 10:25:37.530579

schemaname | public

tabname | t_trig

operation | INSERT

who | hs

new_val | {'id': 1, 'name': 'hans'}

old_val |

-[ RECORD 2 ]--------------------------

id | 2

tstamp | 2023-02-27 10:25:37.530579

schemaname | public

tabname | t_trig

operation | UPDATE

who | hs

new_val | {'id': 10, 'name': 'paul'}

old_val | {'id': 1, 'name': 'hans'}

Security considerations when tracking changes in a PostgreSQL database

If we assume that we use this mechanism to track changes to protect our data, we have to make sure that somebody changing the data cannot change the log as well. Therefore, we have to take precautions for this special case. One way to do this is to mark our trigger function as SECURITY DEFINER. This means that the function itself is not executed as the user making the change to the table, but as the user who has written the function. If the superuser deploys the trigger, we can protect ourselves against evil action.

Finally …

Quite often, tracking changes to the database is an important issue for many people. We hope the code listed above is a blueprint which will help you to get started quickly.

If you want to find out more about PostgreSQL, learn how to write SECURITY DEFINER functions securely - read this blog: /en/abusing-security-definer-functions/

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Basically the idea of a random generator is to generate random data
– ideally a lot of random and non-deterministic data. However, sometimes it is necessary to be able to reproduce a random sequence of numbers. Why would anybody want to do that? Well, what if you are running a simulation model? Or what if you want to compare results coming from different servers? In all those cases it is necessary to reset the random generator to restart a certain value.

To make sure the random generator starts all over again, you have to set a seed just like shown in the following listing:

test=# SELECT setseed(0.5);
 setseed
---------

(1 row)

test=# SELECT random();
      random      
-------------------
 0.798512778244913
(1 row)

test=# SELECT setseed(0.5);
 setseed
---------

(1 row)

test=# SELECT random();
      random      
-------------------
 0.798512778244913
(1 row)

test=# SELECT setseed(0.5);

setseed

---------

(1 row)

test=# SELECT random();

random

-------------------

0.798512778244913

(1 row)

test=# SELECT setseed(0.5);

setseed

---------

(1 row)

test=# SELECT random();

random

-------------------

0.798512778244913

(1 row)

As you can see, the next random value is the same all the time.

This is highly important if you want to control the behavior of your random number generation process (just like you would do it in any programming language).

If you're also interested in PostgreSQL performance and logging, check out our WAL blog spot.

UPDATED August 2023: I was inspired by some previous posts with the idea of abusing PostgreSQL security barriers to cheat on the optimizer. I have constructed an example of how a security barrier can theoretically be used to speed up things under some very nasty circumstances.

WARNING - Please keep in mind: This is a demo – it is not meant to be a tuning guideline. It is merely an example showing how things work and what PostgreSQL is capable of. Please don't use this to tune your databases.

Ordering restrictions

If you write a query in PostgreSQL, the optimizer will try to find a smart way to execute the query. Ideally any SQL database will try to get rid of as much data as possible as soon as possible. In addition it tries to reduce costs by executing cheaper restrictions before expensive restrictions. Usually the user has no control over this process, and there is usually no point in trying to control this process. Remember, the core idea of SQL is that a user can send a query and the database tries to find out the best way – it is not the user who is supposed to arrange restrictions in a clever way.

A little experiment using security barriers

Just for fun (and nothing more) I came up with a small example showing how the order of those quals can be tweaked using security barriers. Here is an example.

First of all we create two functions. One function is going to be fast and one function is going to be slow:

-- funcs
CREATE FUNCTION slow_func(int4) RETURNS int4 AS $
        BEGIN
                EXECUTE 'SELECT pg_sleep(1)';
                RAISE NOTICE 'slow_func: %', $1;
                RETURN $1;
        END;
$ LANGUAGE 'plpgsql' 
        IMMUTABLE
        COST  10;

CREATE FUNCTION fast_func(int4) RETURNS int4 AS $
        BEGIN
                RAISE NOTICE 'fast_func: %', $1;
                RETURN $1;
        END;
$ LANGUAGE 'plpgsql' 
        IMMUTABLE
        COST  100;

-- funcs

CREATE FUNCTION slow_func(int4) RETURNS int4 AS $

BEGIN

EXECUTE 'SELECT pg_sleep(1)';

RAISE NOTICE 'slow_func: %', $1;

RETURN $1;

END;

$ LANGUAGE 'plpgsql'

IMMUTABLE

COST 10;

CREATE FUNCTION fast_func(int4) RETURNS int4 AS $

BEGIN

RAISE NOTICE 'fast_func: %', $1;

RETURN $1;

END;

$ LANGUAGE 'plpgsql'

IMMUTABLE

COST 100;

To make the function slow, we simply call pg_sleep to make it wait for a while. In this example we have also set cost parameters to tell PostgreSQL how expensive a function is. In my example I have marked the slow function as cheaper so that the planner will execute it first under normal circumstances.

Then we can create some test data. 20 rows should be sufficient for our purpose:

-- test data

CREATE TABLE t_test (id int4);
INSERT INTO t_test SELECT * FROM generate_series(1, 20);

-- test data

CREATE TABLE t_test (id int4);

INSERT INTO t_test SELECT * FROM generate_series(1, 20);

Let's create a normal view with:

CREATE VIEW v AS
        SELECT  *
        FROM    t_test
        WHERE   id % 2 = 0
                AND fast_func(id) = 0;

CREATE VIEW v AS

SELECT *

FROM t_test

WHERE id % 2 = 0

AND fast_func(id) = 0;

Now let's run the code:

            SELECT * FROM v WHERE slow_func(id) = 0;

1	SELECT * FROM v WHERE slow_func(id) = 0;

What we see here is that the slow function is now executed quite frequently. Since the function filters out all rows, there is no need to post-filter anything using the fast function. We've misled PostgreSQL into calling the slow function first using our cost parameters, and now we can observe the following result:

timing

SELECT * FROM v WHERE slow_func(id) = 0;
NOTICE:  slow_func: 2
NOTICE:  slow_func: 4
NOTICE:  slow_func: 6
NOTICE:  slow_func: 8
NOTICE:  slow_func: 10
NOTICE:  slow_func: 12
NOTICE:  slow_func: 14
NOTICE:  slow_func: 16
NOTICE:  slow_func: 18
NOTICE:  slow_func: 20
 id
----
(0 rows)

Time: 10012.580 ms

timing

SELECT * FROM v WHERE slow_func(id) = 0;

NOTICE: slow_func: 2

NOTICE: slow_func: 4

NOTICE: slow_func: 6

NOTICE: slow_func: 8

NOTICE: slow_func: 10

NOTICE: slow_func: 12

NOTICE: slow_func: 14

NOTICE: slow_func: 16

NOTICE: slow_func: 18

NOTICE: slow_func: 20

----

(0 rows)

Time: 10012.580 ms

We need 10 seconds to execute this one.

Adding a security barrier

Now, let's abuse a security barrier and see what happens:

CREATE VIEW v WITH (security_barrier) AS  
        SELECT  *
        FROM    t_test
        WHERE   id % 2 = 0
                AND fast_func(id) = 0;

CREATE VIEW v WITH (security_barrier) AS

SELECT *

FROM t_test

WHERE id % 2 = 0

AND fast_func(id) = 0;

We can execute the very same query again:

Timing is on.
NOTICE:  fast_func: 2
NOTICE:  fast_func: 4
NOTICE:  fast_func: 6
NOTICE:  fast_func: 8
NOTICE:  fast_func: 10
NOTICE:  fast_func: 12
NOTICE:  fast_func: 14
NOTICE:  fast_func: 16
NOTICE:  fast_func: 18
NOTICE:  fast_func: 20
 id
----
(0 rows)

Time: 0.685 ms

Timing is on.

NOTICE: fast_func: 2

NOTICE: fast_func: 4

NOTICE: fast_func: 6

NOTICE: fast_func: 8

NOTICE: fast_func: 10

NOTICE: fast_func: 12

NOTICE: fast_func: 14

NOTICE: fast_func: 16

NOTICE: fast_func: 18

NOTICE: fast_func: 20

----

(0 rows)

Time: 0.685 ms

What we see now is that the query is a lot faster because PostgreSQL was not allowed to swap the filters.

Bottom line on security barriers

As I have stated before, security barriers are a SECURITY issue and not a performance issue. However, you have to take into consideration that the optimizer is restricted by these barriers, so that execution plans may change and performance might differ significantly.

Further important security info

Find out all about View Permissions and Row Level Security as of PostgreSQL v15 here.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

Christmas is approaching fast so I thought I'd give PostgreSQL users some inspiration when it comes to buying their Christmas presents. I assume that I am not the only one who has constant troubles finding the right present for somebody. The trouble is that prices are usually strange numbers. Maybe EUR 49.95 or EUR 49.99. This makes it very hard to look for something that costs, say, EUR 50.

Let us assume you want to buy something that costs around EUR 50. Of course, if there is nothing for EUR 49.95 or 48.99 - but given the fact that it is Christmas - you are also fine with EUR 54.99.

KNN comes to the rescue

Using the = operator is clearly not enough to fixing your problem. What you really want is some kind of fuzzy search. The trivial approach to this database problem is to just look for a range of values. In many cases this seems like a good idea but remember: If you happen to be a large website, looking for everything between EUR 40 and EUR 60 might yield hundreds of thousands of products. This is clearly not an option. All you want is a handful of suggestions, which are as close as possible to your desired target price.

Finding something close to what you are looking for is exactly what KNN search (K Nearest Neighbor Search) has been made for. It allows you to break the chains of the = operator and allows fuzzy search.

An example

To show you how you can use PostgreSQL to search for similar numbers, we can create a table containing some randomly generated entries:

test=# CREATE TABLE t_random AS SELECT (random() * 100) AS r
       FROM generate_series(1, 10000);
SELECT 10000

test=# SELECT * FROM t_random LIMIT 10;
         r
------------------
 58.10591513291
 22.2964704036713
 89.8202452808619
 54.2416563257575
 81.23788847588
 70.1738043688238
 7.34952366910875
 41.887197131291
 13.8173459097743
 7.20933876000345
(10 rows)

test=# CREATE TABLE t_random AS SELECT (random() * 100) AS r

FROM generate_series(1, 10000);

SELECT 10000

test=# SELECT * FROM t_random LIMIT 10;

------------------

58.10591513291

22.2964704036713

89.8202452808619

54.2416563257575

81.23788847588

70.1738043688238

7.34952366910875

41.887197131291

13.8173459097743

7.20933876000345

(10 rows)

The generate_series function is a nice and easy way to generate a list of values. As you can see, we have evenly distributed values between 0 and 100 – exactly what we need for our test.

The trivial approach: Ranges

As mentioned before, most people would go for the trivial approach here and just come up with a range. To do so we first create an index:

test=# CREATE INDEX idx_r ON t_random (r);
CREATE INDEX

1 2	test=# CREATE INDEX idx_r ON t_random (r); CREATE INDEX

Then we can just do the query:

test=# explain SELECT * FROM t_random WHERE r BETWEEN 45 AND 55;

                                   QUERY PLAN
-----------------------------------------------------------------------------------
 Bitmap Heap Scan on t_random (cost=21.99..81.24 rows=950 width=8)
    Recheck Cond: ((r >= 45::double precision) AND (r <= 55::double precision))
    -> Bitmap Index Scan on idx_r (cost=0.00..21.75 rows=950 width=0)
       Index Cond: ((r >= 45::double precision) AND (r <= 55::double precision))
(4 rows)

test=# explain SELECT * FROM t_random WHERE r BETWEEN 45 AND 55;

QUERY PLAN

-----------------------------------------------------------------------------------

Bitmap Heap Scan on t_random (cost=21.99..81.24 rows=950 width=8)

Recheck Cond: ((r >= 45::double precision) AND (r <= 55::double precision))

-> Bitmap Index Scan on idx_r (cost=0.00..21.75 rows=950 width=0)

Index Cond: ((r >= 45::double precision) AND (r <= 55::double precision))

(4 rows)

Just take a look at the estimates of the PostgreSQL optimizer: We are expected to need 950 rows for this query. This is not surprising as it is close to 10% of the data. Just imagine doing that on 1 mio rows – you had to read 100.000 rows to display just a handful of them. Clearly – performance will go down the drain in this case.

A second problem is: What if there are no products in the range of 45 – 55? What if the cheapest product is EUR 56?

Supersonic fast KNN search

KNN can come to the rescue here.

To use KNN we first have to install the extension containing some fancy operator classes:

test=# CREATE EXTENSION btree_gist;
CREATE EXTENSION

1 2	test=# CREATE EXTENSION btree_gist; CREATE EXTENSION

Then we can deploy a Gist index capable of doing KNN:

test=# CREATE INDEX idx_knn ON t_random USING gist(r);
CREATE INDEX

1 2	test=# CREATE INDEX idx_knn ON t_random USING gist(r); CREATE INDEX

The module will introduce the so called “distance operator” (<->). It measures the distance between the value in doubt and the target we are looking for. The beauty of the PostgreSQL KNN mechanism is that we can sort by this distance to fetch those rows in the order we desire:

test=# explain analyze SELECT *, 50 <-> r FROM t_random ORDER BY 50 <-> r LIMIT 10 ;

                                     QUERY PLAN
------------------------------------------------------------------------------------
 Limit (cost=0.00..0.63 rows=10 width=8) (actual time=0.264..0.284 rows=10 loops=1)
 -> Index Scan using idx_knn on t_random (cost=0.00..628.25 rows=10000 width=8) 
    (actual time=0.263..0.279 rows=10 loops=1)
    Order By: (r <-> 50::double precision)
Total runtime: 0.367 ms
(4 rows)

test=# explain analyze SELECT *, 50 <-> r FROM t_random ORDER BY 50 <-> r LIMIT 10 ;

QUERY PLAN

------------------------------------------------------------------------------------

Limit (cost=0.00..0.63 rows=10 width=8) (actual time=0.264..0.284 rows=10 loops=1)

-> Index Scan using idx_knn on t_random (cost=0.00..628.25 rows=10000 width=8)

(actual time=0.263..0.279 rows=10 loops=1)

Order By: (r <-> 50::double precision)

Total runtime: 0.367 ms

(4 rows)

Our query just takes a fraction of a millisecond. And more important: Runtimes are pretty stable – even if the number of rows grows.

On behalf of my entire team, we wish you all a Merry Christmas time and efficient Christmas shopping 😉

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.