When an index is missing,
good performance won’t be kissing
a PostgreSQL user looking for efficiency
but instead feels like a legacy.
To satisfy a DBA’s desire and thirst,
let us load some data first.
pgbench is the tool of the day
but the next listing will explain that anyway:
[hs@fedora ~]$ pgbench -s 100 -i test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 9.96 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 14.58 s (drop tables 0.00 s, create tables 0.07 s, client-side generate 10.03 s, vacuum 1.68 s, primary keys 2.80 s).
Loading millions of rows into PostgreSQL is not hard,
that is not the tricky part.
Data so quickly created,
is usually not hated.
10 seconds for 10 million rows,
the DBA in charge bows.
Performance will be good
ensuring all users’ good mood.
To celebrate the success
let the database run again and confess.
This time we are looking for fast reads,
an important thing when aiming for high speeds:
[hs@fedora ~]$ pgbench -S -c 10 -T 30 -j 10 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 30 s
number of transactions actually processed: 3054408
latency average = 0.098 ms
initial connection time = 13.872 ms
tps = 101859.582811 (without initial connection time)
pgbench is the name of the game.
My slow box, 101 thousand transactions, insane.
30 seconds to execute the test,
10 concurrent connections that did not rest.
Even 10 threads for the client code,
running in super quick mode.
However, only indexes kept us happy and fine,
bringing good performance, almost divine.
Without indexing life is harsh to the one that suffers,
just like misconfigured shared_buffers.
Our holy lord, the superior B-tree -
without it, bad performance we see.
A single missing index for a test,
the entire database feels like it is at rest.
Do you want to know why?
Let us give it a try:
test=# \d+
                                    List of relations
 Schema |       Name       | Type  | Owner | Persistence | Access method |  Size   | Description
--------+------------------+-------+-------+-------------+---------------+---------+-------------
 public | pgbench_accounts | table | hs    | permanent   | heap          | 1281 MB |
 public | pgbench_branches | table | hs    | permanent   | heap          | 40 kB   |
 public | pgbench_history  | table | hs    | permanent   | heap          | 0 bytes |
 public | pgbench_tellers  | table | hs    | permanent   | heap          | 80 kB   |
(4 rows)

test=# \d pgbench_accounts
            Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Killing an index, only one …
All the performance will be gone:
test=# ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
ALTER TABLE
test=# \d pgbench_accounts
            Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Let me run the test again.
Good performance? Back then?
Good lord? What happened to my data?
Will speed be back later?
[hs@fedora ~]$ pgbench -S -c 10 -T 30 -j 10 test
pgbench (14beta2)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 30 s
number of transactions actually processed: 259
latency average = 1189.653 ms
initial connection time = 5.727 ms
tps = 8.405815 (without initial connection time)
8 transactions per second will reveal
that bad performance is hard to conceal.
A single index is broken,
all end users have been walking.
Calling support hotlines like crazy,
No more chances for DBAs to be lazy.
The conclusion of the day.
Make sure no important indexes go away.
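In plain prose, for the record: putting the index back is all it takes to restore the speed. A minimal sketch, assuming the pgbench schema shown above (the pg_stat_user_tables query is just a generic sanity check I like to run, not part of the original test):

-- recreate the primary key that was dropped above
ALTER TABLE pgbench_accounts
    ADD CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid);

-- generic sanity check: tables that are scanned sequentially far more
-- often than via an index are candidates for missing indexes
SELECT relname, seq_scan, idx_scan
FROM   pg_stat_user_tables
ORDER  BY seq_scan DESC;

Rerunning the same pgbench command afterwards should bring the numbers back to their old glory.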
It is a really important affair
to index data with care.
Don’t forget a single thing
and performance will be king.
I hope this little poem makes people happy.
Make sure your databases are not crappy.
Creating auto-increment columns in PostgreSQL is easy. Simply use the pseudo data types serial and serial8, and PostgreSQL will automatically take care of your auto-increment columns. However, once in a while problems can still occur. Let us take a look and see.
To understand the underlying problem, one has to understand how auto increment and sequences work in PostgreSQL:
test=# CREATE TABLE t_data (
           id      serial PRIMARY KEY,
           payload text
       );
CREATE TABLE
test=# INSERT INTO t_data (payload) VALUES ('foo') RETURNING *;
 id | payload
----+---------
  1 | foo
(1 row)

INSERT 0 1
test=# INSERT INTO t_data (payload) VALUES ('bar') RETURNING *;
 id | payload
----+---------
  2 | bar
(1 row)

INSERT 0 1
In this case, everything is fine. But what happens if a manual value is added? Note that PostgreSQL does allow this. Other databases, such as Informix, explicitly prohibit it. However, in PostgreSQL an autoincrement value is really just a sequence which adds a default value. Nothing stops you from breaking things:
test=# INSERT INTO t_data VALUES (3, 'bang') RETURNING *;
 id | payload
----+---------
  3 | bang
(1 row)

INSERT 0 1
test=# INSERT INTO t_data (payload) VALUES ('boom') RETURNING *;
ERROR:  duplicate key value violates unique constraint "t_data_pkey"
DETAIL:  Key (id)=(3) already exists.
The trouble is that the manual insertion does not increment the sequence. Thus, the second INSERT
statement is going to fail because the value is already there. This often happens when people import data into the database.
So the main question is: How can we fix these sequences? The answer is: Use pg_sequence_fixer.
The idea behind pg_sequence_fixer is to have a tool that can easily fix these things with one simple stored procedure call. It automatically looks for all sequences associated with a column, finds the max value, and pumps it up to what is needed.
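For reference, if only a single table is affected, the same fix can be done by hand - this is essentially what the tool automates. A minimal sketch, assuming the t_data table from the example above:

-- find the sequence backing t_data.id and bump it to the current maximum
SELECT setval(pg_get_serial_sequence('t_data', 'id'),
              (SELECT max(id) FROM t_data));

pg_sequence_fixer does this for every sequence it finds, and adds the safety margin you ask for.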
First clone the repository which can be found here: https://github.com/cybertec-postgresql/pg_sequence_fixer
Then simply install the extension:
iMac:pg_sequence_fixer hs$ make install
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/share/postgresql/extension'
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/share/postgresql/extension'
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/Cellar/postgresql/13.1/share/doc/postgresql/extension'
/usr/bin/install -c -m 644 .//pg_sequence_fixer.control '/usr/local/share/postgresql/extension/'
/usr/bin/install -c -m 644 .//pg_sequence_fixer--*.sql '/usr/local/share/postgresql/extension/'
/usr/bin/install -c -m 644 .//README.md '/usr/local/Cellar/postgresql/13.1/share/doc/postgresql/extension/'
The extension is now available and one can enable it:
test=# CREATE EXTENSION pg_sequence_fixer;
CREATE EXTENSION
Finally, you can run the procedure:
test=# SELECT pg_sequence_fixer(1000, false);
NOTICE:  setting sequence for t_data to 1003
 pg_sequence_fixer
-------------------

(1 row)
The first parameter (= 1000) means that we want to add a safety margin of 1000 to the max value determined by the system. The second parameter controls whether those tables are locked during the operation. Usually, locks are not desirable, but sometimes they are necessary in case you cannot take down the application whose sequences you are fixing.
Note that for each sequence fixed, the tool will tell PostgreSQL to issue one log entry (= NOTICE).
Voilà, your database is sound and healthy again.
Fixing sequences is an important task which you can handle easily using pg_sequence_fixer. We also want to point out that this is just the first incarnation of this tool, so if you are willing to contribute, feel free to submit changes.
If you want to learn more about PostgreSQL right now, consider checking out my post about parallel CREATE INDEX in PostgreSQL.
Our team is proud to introduce a new major pg_timetable v4 release!
This time a huge amount of work has been done and the pg_timetable v4 release is backward-incompatible with previous versions. However, job migration is straightforward and shouldn't stop anyone from updating.
We've added detailed Read the Docs documentation and will enhance it further. Let us know what areas should be highlighted or what topics we should add.
Now, let's check out some of the great new features! You can review the official release page for a complete changelog with links to issues, commits, and pull requests.
I want to remind you that pg_timetable is a community project. So please don't hesitate to ask questions, report bugs, star the pg_timetable project, and tell the world about it.
The previous release is still considered stable. There are no confirmed bugs or critical performance issues, and it is absolutely safe to stick with it.
However, it won't be backpatched if new bugs are discovered. In addition, new features won't be implemented for the v3 branch.
To increase performance, we worked on several fronts.
First of all, we reworked the database schema and simplified it as much as possible. This way, the new pg_timetable executes fewer system queries, leaving more bandwidth and connections to the real workers.
Secondly, we rewrote the cron-related functions in pure SQL instead of PL/pgSQL. This not only increases performance but also frees us from tedious debugging.
And the third pillar is the comprehensive logging system. See details below in the separate section.
As functionality grows, it becomes error-prone and tedious to handle configuration using only command-line parameters and environment variables.
Common sense suggests using configuration files in such cases. We've added a detailed, self-explanatory config.example.yaml file for you.
Of course, you are free to use a custom set of options, or not to use a configuration file at all. For example, it seems like a good idea to provide general options in the configuration file and connection options through environment variables, for the sake of security.
One of the most frequent tasks for database schedulers is to import data from external sources into the database, e.g. sensors, logs, official statistics, etc. The CSV format is the de facto standard for this kind of data. PostgreSQL can use the COPY command to import CSV files. But to use this command, the file must be placed in the server's file system, so that the PostgreSQL process can access it. That is not always possible in production environments, especially in the cloud era.
To overcome this limitation, for previous versions of pg_timetable we suggested using psql's \copy command, which performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
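For reference, such a client-side copy might look roughly like this in a psql session - the table and file names are only placeholders borrowed from the example further down:

timetable=> \copy location FROM 'download/orte_ansi.txt' WITH DELIMITER '|' NULL '<NULL>'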
From now on, you don't need any additional programs to perform a client copy. One can use the new built-in CopyFromFile task. You may find the full example in the manual. Below you will notice that the task accepts JSON parameters in the form of an object, e.g.
{
    "sql": "COPY location FROM STDIN WITH DELIMITER '|' NULL '<NULL>'",
    "filename": "download/orte_ansi.txt"
}
sql is the correct COPY .. FROM STDIN command. One can use any of the standard clauses controlling the file specification, e.g. file encoding, conditions for insert, escaping, quoting, etc.
filename is the correct path (absolute, or relative to pg_timetable) of the source CSV file.
This task works perfectly with all formats supported by PostgreSQL: text, csv and binary.
One of the key architectural decisions made for pg_timetable was the database-driven concept, meaning scheduler sessions are fully controlled by the database and every piece of output is stored back in the database for further analysis.
That said, people were using output redirection if they wanted to have logs stored in files, e.g.
pg_timetable -c loader -u scheduler -d timetable > pgtt.log |
The drawback of this approach is that the information is stored as plain text, making it hard to process by machine. In the new version, we have introduced the ability to duplicate logs to the file system using the --log-file command-line parameter. Moreover, the user can choose the exact format of the underlying file by specifying --log-file-format: json or text.
Another bottleneck we were trying to fix was database logging. Heavy clients were producing a lot of information to be stored by separate INSERT statements, causing intensive use of connections. We have come up with an outstanding log hook implementation for the famous logrus package.
The key benefit is that log entries are written in batches using the COPY machinery, instead of one INSERT per entry.
I personally highly encourage you to use our log hook in your projects. Let us know if you think we should make it a standalone project.
pg_timetable has been storing the output of program tasks from the very beginning. It's useful for debugging and back-in-time incident analysis. We never thought this feature could be applied to the built-in SQL tasks until Nikolay Samokhvalov proposed it. So now one can have an SQL task with a SELECT statement that determines which output will be saved upon execution, e.g.
my_database=> SELECT timetable.add_job('log-sessions', '* * * * *', 'SELECT * FROM timetable.active_session');
 add_job
---------
       1
(1 row)

...

my_database=> SELECT chain_id, finished, output FROM timetable.execution_log;
 chain_id |           finished            |  output
----------+-------------------------------+----------
        1 | 2021-08-12 15:40:46.064795+02 | SELECT 2
(1 row)
The output of the SQL task is the tag of the command executed and the number of rows returned or affected. We are not storing the result set since commands can be arbitrarily complicated or rather huge. Users should take care of this and explicitly store anything important, e.g.
my_database=> CREATE TABLE history_session (
                  ts timestamptz DEFAULT now(),
                  LIKE timetable.active_session);
CREATE TABLE
my_database=> SELECT timetable.add_job('dump-sessions', '* * * * *',
                  'INSERT INTO history_session SELECT now(), * FROM timetable.active_session');
 add_job
---------
       3
(1 row)

...

my_database=> SELECT * FROM history_session;
              ts               | client_pid | client_name | server_pid
-------------------------------+------------+-------------+------------
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      15676
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      16436
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      14892
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      25480
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      21876
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      11852
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      10052
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      27396
(8 rows)

my_database=> SELECT chain_id, finished, output FROM timetable.execution_log;
 chain_id |           finished            |   output
----------+-------------------------------+------------
        1 | 2021-08-12 15:40:46.064795+02 | SELECT 2
        1 | 2021-08-12 15:52:02.380711+02 | SELECT 4
        3 | 2021-08-12 15:52:02.382296+02 | INSERT 0 4
        1 | 2021-08-12 15:52:07.34872+02  | SELECT 3
        3 | 2021-08-12 15:52:07.448914+02 | INSERT 0 4
(5 rows)
We want our scheduler to be as flexible as possible. That's why we have introduced a category of options for resource management:

--cron-workers: number of parallel workers for scheduled chains (default: 16);
--interval-workers: number of parallel workers for interval chains (default: 16);
--chain-timeout: abort any chain that takes more than the specified number of milliseconds;
--task-timeout: abort any task within a chain that takes more than the specified number of milliseconds.

This was the first in a series of posts dedicated to the new pg_timetable v4 features. Stay tuned for the coolest features to be highlighted.
You can find previous publications following the tag pg_timetable.
Stay safe, healthy, and wealthy!
Be happy! Peace! Love! ❤
by Kaarel Moppel
When I recently stumbled upon an article comparing some main properties, and also the read-write performance, of two very popular pieces of caching software often used to speed up applications heavy on database queries, I immediately felt an itch - how would it actually look for PostgreSQL vs. Redis vs. Memcached on the performance side? That is, what if one just skipped the cache and hit the database directly?
Especially after the first look, I wasn’t actually too impressed with the numbers presented for Redis and Memcache. It could have also been due to the fact that the actual test data, extracted from a linked paper, was from 2016 - that’s surely ages in “internet years”. Either way, I was curious to get some kind of a comparison point and quickly threw together a small benchmarking script in Python and let my workstation get to work.
Why Python you might ask, if the original test used Java? Well, Java is just one of my least favorite languages for database projects and I somehow have the impression that Python is very popular both for the Web and also ad-hoc database scripting, so probably a good fit here - might be wrong of course on that.
Some characteristics of my test setup:
CREATE UNLOGGED TABLE kv_test(key text, value int);
CREATE INDEX ON kv_test (key);

-- pseudo-code from the Python script
for $ROWS in [1000, 10000, 100000, 1000000]:
    truncate kv_test
    generate $ROWS random values
    for $i := 1 .. $ROWS:
        insert into kv_test ($rand[i], $rand[i])
    vacuum analyze the table
    for $i := 1 .. $ROWS:
        select * from kv_test where key = $rand[i]
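If you want to see for yourself what a single key lookup does on such a table, something along these lines can be run manually (the key literal is just a placeholder for one of the generated values):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM kv_test WHERE key = 'some_key_from_the_test_set';

With the index in place, this should show an index scan touching only a handful of buffers, which is where the sub-millisecond read times come from.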
Represented in a similar way to the original numbers - the calculated time to write the key-value pairs, in milliseconds, per number of records:

Database            |  1,000 | 10,000 | 100,000 | 1,000,000
--------------------+--------+--------+---------+-----------
Redis (v3.0.7)      |     34 |    214 |   1,666 |    14,638
Memcached (v1.4.14) |     23 |    100 |     276 |     2,813
PostgreSQL (v13.3)  |   29.6 |    304 |   2,888 |    31,230
The calculated time to read the key-value pairs (ms), per number of records:

Database            |  1,000 | 10,000 | 100,000 | 1,000,000
--------------------+--------+--------+---------+-----------
Redis (v3.0.7)      |      8 |      6 |       8 |         8
Memcached (v1.4.14) |      9 |     14 |      14 |        30
PostgreSQL (v13.3)  |  0.026 |  0.028 |   0.027 |     0.029
In short - the numbers looked surprisingly/suspiciously good for Postgres in the reading test! I can't imagine how the original test managed to get such high single-digit millisecond results for random key reads across the whole dataset. In my test, I only saw 1 ms+ worst cases for the biggest row count. This data can, by the way, also be looked up in the "pg_stat_statements" snapshots table called "results".
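If you want to pull similar numbers straight from the extension, a query along these lines works on PostgreSQL 13 (the timing columns were named differently in older versions):

SELECT query, calls, mean_exec_time, max_exec_time
FROM   pg_stat_statements
WHERE  query ILIKE '%kv_test%'
ORDER  BY calls DESC;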
So sadly, I still cannot possibly get too ecstatic as there was a lot of information missing on the details of exactly how the original tests were performed, so it might have been a bit of an “apples to oranges” situation still in the end, I’m afraid. The average key readout times for Redis/Memcached seemed just way too slow in comparison to Postgres. I suspect they used a remote machine still for the cache, although the paper didn’t mention it and talked about a single Core i7 node.
But what about the writing speed of key-value data? Well, Postgres doesn't really compete on higher row counts. This was also more or less expected, though! Why? A full-blown relational database engine like Postgres goes to great lengths to ensure we cannot insert invalid data violating some constraints, plus there is the WAL writing (minimal for unlogged tables, but still) and the on-disk-format overhead - internal columns, alignment, some index bloat on page splits, etc. That all amplifies writes a lot! So basically, I think the results still aren't bad in the end - only ca. 2x slower than Redis for the 100k and 1M row counts. And the main idea of caches is that they're only useful if we read from them much more than we write/update anyway!
But however the comparison with the other DBs turns out in absolute numbers, it was good to see that the stability of Postgres response times under growing datasets was very, very good - even beating Memcached, which deteriorated 2x at 1M rows! This all probably shows that the algorithms selected for PostgreSQL are mathematically sound and well implemented!
What can definitely be said is that Postgres performance is at least good enough for a "caching use case" with smaller row counts. So with your next project, it might be worth asking: do you really need another external component?
PostgreSQL is one of the best OLTP databases (OLTP = online transaction processing) in the world. However, it can do more than just OLTP. PostgreSQL offers many additional features relevant to a more OLAP-style workload. One of those features is called “GROUPING SETS”.
Before we dive into the details, I've compiled some sample data which you can easily load into your SQL database:
CREATE TABLE t_sales (
    country       text,
    product_name  text,
    year          int,
    amount_sold   numeric
);

INSERT INTO t_sales VALUES
    ('Argentina', 'Shoes', 2020, 12),
    ('Argentina', 'Shoes', 2021, 14),
    ('Argentina', 'Hats',  2020, 54),
    ('Argentina', 'Hats',  2021, 57),
    ('Germany',   'Shoes', 2020, 34),
    ('Germany',   'Shoes', 2021, 29),
    ('Germany',   'Hats',  2020, 19),
    ('Germany',   'Hats',  2021, 22),
    ('USA',       'Shoes', 2020, 99),
    ('USA',       'Shoes', 2021, 103),
    ('USA',       'Hats',  2020, 81),
    ('USA',       'Hats',  2021, 90)
;
Note that everything you are going to see in this blog is pretty SQL-standard compliant, so you can expect most of the stuff to work in other professional SQL databases as well.
Let's get started with a simple aggregation:
test=# SELECT country, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1;
  country  | sum
-----------+-----
 USA       | 373
 Germany   | 104
 Argentina | 137
(3 rows)
There's not much to say here, apart from the fact that we will get one sum for each group. However, there's a bit of a philosophical discussion going on: “GROUP BY 1” basically means “GROUP BY country”, because country is the first column in the SELECT clause. Therefore “GROUP BY country” and “GROUP BY 1” are the same thing:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 USA       | Hats         | 171
 USA       | Shoes        | 202
(6 rows)
Of course, this works with more than one column as well. However, I want to point out something else. Consider the following example:
test=# SELECT CASE WHEN country = 'USA'
                   THEN 'USA'
                   ELSE 'non-US'
              END,
              sum(amount_sold)
       FROM   t_sales
       GROUP BY 1;
  case  | sum
--------+-----
 USA    | 373
 non-US | 241
(2 rows)
Most people group by a column. In some cases, it can make sense to group by an expression. In my case, we are forming groups on the fly (= one group for the US and one for non-US sales). This feature is often underappreciated. However, it is useful in many real-world scenarios. Keep in mind that all the things you are going to see also work with expressions, meaning more flexible grouping is possible.
GROUP BY will turn every distinct entry in a column into a group. Sometimes you might want to do more than one grouping at once. Why is that necessary? Suppose you are processing a 10 TB table. Clearly, reading this data is usually the limiting factor in terms of performance. So reading the data once and producing more than one result from it is appealing. That's exactly what you can do with GROUP BY GROUPING SETS. Suppose we want to produce two results at once: one aggregated by country and one aggregated by product_name.
Here's how it works:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY GROUPING SETS ((1), (2))
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina |              | 137
 Germany   |              | 104
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
(5 rows)
In this case, PostgreSQL simply appends the results. The first three lines represent “GROUP BY country”. The next two lines contain the result of “GROUP BY product_name”. Logically, it's the equivalent of the following query:
test=# SELECT NULL AS country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       UNION ALL
       SELECT country, NULL, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina |              | 137
 Germany   |              | 104
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
(5 rows)
However, the GROUPING SETS version is way more efficient, because it only has to read the data once.
When creating reports, you will often need the “bottom line” which sums up what has been shown in the table. The way to do that in SQL is to use “GROUP BY ROLLUP”:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY ROLLUP (1, 2)
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina |              | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   |              | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       |              | 373
           |              | 614
(10 rows)
PostgreSQL will inject a couple of rows into the result. As you can see, “Argentina” returns 3 and not just 2 rows. The “product_name = NULL” entry was added by ROLLUP. It contains the sum of all Argentinian sales (111 + 26 = 137). Additional rows are injected for both other countries. Finally, a row is added for the overall sales worldwide.
Often those NULL entries are not what people want to see, thus it can make sense to replace them with some other kind of entry. The way to do that is to use a subselect which checks for the NULL entry and does the replacement. Here's how it works:
test=# SELECT CASE WHEN country IS NULL
                   THEN 'TOTAL'
                   ELSE country
              END,
              CASE WHEN product_name IS NULL
                   THEN 'TOTAL'
                   ELSE product_name
              END,
              sum
       FROM (
              SELECT country, product_name, sum(amount_sold)
              FROM   t_sales
              GROUP BY ROLLUP (1, 2)
              ORDER BY 1, 2
            ) AS x;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina | TOTAL        | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   | TOTAL        | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       | TOTAL        | 373
 TOTAL     | TOTAL        | 614
(10 rows)
As you can see, all NULL entries have been replaced with “TOTAL”, which in many cases is the more desirable way to display this data.
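A side note not covered by the example above: if the underlying columns can themselves contain NULL values, checking for NULL becomes ambiguous. The standard GROUPING() function tells you reliably whether a NULL was produced by the rollup. A minimal sketch of that alternative:

SELECT CASE WHEN GROUPING(country) = 1      THEN 'TOTAL' ELSE country      END AS country,
       CASE WHEN GROUPING(product_name) = 1 THEN 'TOTAL' ELSE product_name END AS product_name,
       sum(amount_sold)
FROM   t_sales
GROUP  BY ROLLUP (country, product_name)
ORDER  BY 1, 2;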
ROLLUP is useful if you want to add the “bottom line”. However, you often want to see all combinations of countries and products. GROUP BY CUBE will do exactly that:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY CUBE (1, 2)
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina |              | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   |              | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
           |              | 614
(12 rows)
In this case, we've got all the combinations. Technically, it's the same as: GROUP BY country + GROUP BY product_name + GROUP BY country, product_name + GROUP BY (). We could do that using more than one statement, but doing it all at once is easier - and a lot more efficient.
Again, NULL values have been added to indicate various aggregation levels. For a tutorial, check out my YouTube video about GROUP BY CUBE.
Grouping sets don’t just simply rewrite the query to turn it into a UNION ALL - there is actually specific code in the database engine to perform those aggregations.
What you will see is a “MixedAggregate” which is capable of aggregating at various levels at once. Here is an example:
test=# explain SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY CUBE (1, 2)
       ORDER BY 1, 2;
                         QUERY PLAN
-----------------------------------------------------------
 Sort  (cost=64.15..65.65 rows=601 width=96)
   Sort Key: country, product_name
   ->  MixedAggregate  (cost=0.00..36.41 rows=601 width=96)
         Hash Key: country, product_name
         Hash Key: country
         Hash Key: product_name
         Group Key: ()
         ->  Seq Scan on t_sales ...
(8 rows)
Looking at the MixedAggregate also reveals which aggregations are performed as part of the grouping set.
In general, grouping sets are a really cool feature which is often unknown or overlooked. We highly recommend making use of this awesome stuff to speed up your aggregations. It's particularly useful if you are dealing with a large data set.
If you want to know more about PostgreSQL and SQL in general, you might also like my post about “string encoding using SQL".