When an index is missing,
good performance won’t be kissing
a PostgreSQL user looking for efficiency
but instead feels like a legacy.
To satisfy a DBA’s desire and thirst,
let us load some data first.
pgbench is the tool of the day
but the next listing will explain that anyway:
[hs@fedora ~]$ pgbench -s 100 -i test
dropping old tables...
NOTICE:  table "pgbench_accounts" does not exist, skipping
NOTICE:  table "pgbench_branches" does not exist, skipping
NOTICE:  table "pgbench_history" does not exist, skipping
NOTICE:  table "pgbench_tellers" does not exist, skipping
creating tables...
generating data (client-side)...
10000000 of 10000000 tuples (100%) done (elapsed 9.96 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done in 14.58 s (drop tables 0.00 s, create tables 0.07 s, client-side generate 10.03 s, vacuum 1.68 s, primary keys 2.80 s).
Loading millions of rows into PostgreSQL is not hard,
that is not the tricky part.
Data so quickly created,
is usually not hated.
10 seconds for 10 million rows,
the DBA in charge bows.
Performance will be good
ensuring all users’ good mood.
To celebrate the success
let the database run again and confess.
This time we are looking for fast reads,
an important thing when aiming for high speeds:
[hs@fedora ~]$ pgbench -S -c 10 -T 30 -j 10 test
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 30 s
number of transactions actually processed: 3054408
latency average = 0.098 ms
initial connection time = 13.872 ms
tps = 101859.582811 (without initial connection time)
pgbench is the name of the game.
My slow box, 101 thousand transactions, insane.
30 seconds to execute the test,
10 concurrent connections that did not rest.
Even 10 threads for the client code,
running in super quick mode.
However, only indexes kept us happy and fine,
bringing good performance, almost divine.
Without indexing life is harsh to the one that suffers,
just like misconfigured shared_buffers.
Our holy lord, the superior B-tree -
without it, bad performance we see.
A single missing index for a test,
the entire database feels like it is at rest.
Do you want to know why?
Let us give it a try:
test=# \d+
                                    List of relations
 Schema |       Name       | Type  | Owner | Persistence | Access method |  Size   | Description
--------+------------------+-------+-------+-------------+---------------+---------+-------------
 public | pgbench_accounts | table | hs    | permanent   | heap          | 1281 MB |
 public | pgbench_branches | table | hs    | permanent   | heap          | 40 kB   |
 public | pgbench_history  | table | hs    | permanent   | heap          | 0 bytes |
 public | pgbench_tellers  | table | hs    | permanent   | heap          | 80 kB   |
(4 rows)

test=# \d pgbench_accounts
            Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Indexes:
    "pgbench_accounts_pkey" PRIMARY KEY, btree (aid)
Killing an index, only one …
All the performance will be gone:
test=# ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;
ALTER TABLE
test=# \d pgbench_accounts
            Table "public.pgbench_accounts"
  Column  |     Type      | Collation | Nullable | Default
----------+---------------+-----------+----------+---------
 aid      | integer       |           | not null |
 bid      | integer       |           |          |
 abalance | integer       |           |          |
 filler   | character(84) |           |          |
Let me run the test again.
Good performance? Back then?
Good lord? What happened to my data?
Will speed be back later?
[hs@fedora ~]$ pgbench -S -c 10 -T 30 -j 10 test
pgbench (14beta2)
starting vacuum...end.
transaction type: <builtin: select only>
scaling factor: 100
query mode: simple
number of clients: 10
number of threads: 10
duration: 30 s
number of transactions actually processed: 259
latency average = 1189.653 ms
initial connection time = 5.727 ms
tps = 8.405815 (without initial connection time)
8 transactions per second will reveal
that bad performance is hard to conceal.
A single index is broken,
all end users have been walking.
Calling support hotlines like crazy,
No more chances for DBAs to be lazy.
The conclusion of the day.
Make sure no important indexes go away.
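In plain prose, for the record: putting the index back is all it takes to restore the speed. A minimal sketch, assuming the pgbench schema shown above (the pg_stat_user_tables query is just a generic sanity check I like to run, not part of the original test):

-- recreate the primary key that was dropped above
ALTER TABLE pgbench_accounts
    ADD CONSTRAINT pgbench_accounts_pkey PRIMARY KEY (aid);

-- generic sanity check: tables that are scanned sequentially far more
-- often than via an index are candidates for missing indexes
SELECT relname, seq_scan, idx_scan
FROM   pg_stat_user_tables
ORDER  BY seq_scan DESC;

Rerunning the same pgbench command afterwards should bring the numbers back to their old glory.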
It is a really important affair
to index data with care.
Don’t forget a single thing
and performance will be king.
I hope this little poem makes people happy.
Make sure your databases are not crappy.
Creating auto-increment columns in PostgreSQL is easy. Simply use the pseudo data types serial and serial8, and PostgreSQL will automatically take care of your auto-increment columns. However, once in a while problems can still occur. Let us take a look and see.
To understand the underlying problem, one has to understand how auto increment and sequences work in PostgreSQL:
test=# CREATE TABLE t_data (
           id      serial PRIMARY KEY,
           payload text
       );
CREATE TABLE
test=# INSERT INTO t_data (payload) VALUES ('foo') RETURNING *;
 id | payload
----+---------
  1 | foo
(1 row)

INSERT 0 1
test=# INSERT INTO t_data (payload) VALUES ('bar') RETURNING *;
 id | payload
----+---------
  2 | bar
(1 row)

INSERT 0 1
In this case, everything is fine. But what happens if a manual value is added? Note that PostgreSQL does allow this. Other databases, such as Informix, explicitly prohibit it. However, in PostgreSQL an autoincrement value is really just a sequence which adds a default value. Nothing stops you from breaking things:
test=# INSERT INTO t_data VALUES (3, 'bang') RETURNING *;
 id | payload
----+---------
  3 | bang
(1 row)

INSERT 0 1
test=# INSERT INTO t_data (payload) VALUES ('boom') RETURNING *;
ERROR:  duplicate key value violates unique constraint "t_data_pkey"
DETAIL:  Key (id)=(3) already exists.
The trouble is that the manual insertion does not increment the sequence. Thus, the second INSERT
statement is going to fail because the value is already there. This often happens when people import data into the database.
So the main question is: How can we fix these sequences? The answer is: Use pg_sequence_fixer.
The idea behind pg_sequence_fixer is to have a tool that can easily fix these things with one simple stored procedure call. It automatically looks for all sequences associated with a column, finds the max value, and pumps it up to what is needed.
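For reference, if only a single table is affected, the same fix can be done by hand - this is essentially what the tool automates. A minimal sketch, assuming the t_data table from the example above:

-- find the sequence backing t_data.id and bump it to the current maximum
SELECT setval(pg_get_serial_sequence('t_data', 'id'),
              (SELECT max(id) FROM t_data));

pg_sequence_fixer does this for every sequence it finds, and adds the safety margin you ask for.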
First clone the repository which can be found here: https://github.com/cybertec-postgresql/pg_sequence_fixer
Then simply install the extension:
iMac:pg_sequence_fixer hs$ make install
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/share/postgresql/extension'
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/share/postgresql/extension'
/bin/sh /usr/local/lib/postgresql/pgxs/src/makefiles/../../config/install-sh -c -d '/usr/local/Cellar/postgresql/13.1/share/doc/postgresql/extension'
/usr/bin/install -c -m 644 .//pg_sequence_fixer.control '/usr/local/share/postgresql/extension/'
/usr/bin/install -c -m 644 .//pg_sequence_fixer--*.sql '/usr/local/share/postgresql/extension/'
/usr/bin/install -c -m 644 .//README.md '/usr/local/Cellar/postgresql/13.1/share/doc/postgresql/extension/'
The extension is now available and one can enable it:
test=# CREATE EXTENSION pg_sequence_fixer;
CREATE EXTENSION
Finally, you can run the procedure:
test=# SELECT pg_sequence_fixer(1000, false);
NOTICE:  setting sequence for t_data to 1003
 pg_sequence_fixer
-------------------

(1 row)
The first parameter (= 1000) means that we want to add a safety margin of 1000 to the max value determined by the system. The second parameter controls whether those tables are locked during the operation. Usually, locks are not desirable, but sometimes they are necessary in case you cannot take down the application whose sequences you are fixing.
Note that for each sequence fixed, the tool will tell PostgreSQL to issue one log entry (= NOTICE).
Voilà, your database is sound and healthy again.
Fixing sequences is an important task which you can handle easily using pg_sequence_fixer. We also want to point out that this is just the first incarnation of this tool, so if you are willing to contribute, feel free to submit changes.
If you want to learn more about PostgreSQL right now, consider checking out my post about parallel CREATE INDEX in PostgreSQL.
Our team is proud to introduce a new major pg_timetable v4 release!
This time a huge amount of work has been done and the pg_timetable v4 release is backward-incompatible with previous versions. However, job migration is straightforward and shouldn't stop anyone from updating.
We've added detailed Read the Docs documentation and will enhance it further. Let us know what areas should be highlighted or what topics we should add.
Now, let's check out some of the great new features! You can review the official release page for a complete changelog with links to issues, commits, and pull requests.
I want to remind you that pg_timetable is a community project. So please don't hesitate to ask questions, report bugs, star the pg_timetable project, and tell the world about it.
The previous release is still considered stable. There are no confirmed bugs or critical performance issues, and it is absolutely safe to stick with it.
However, it won't be backpatched if new bugs are discovered. In addition, new features won't be implemented for the v3 branch.
To increase performance, we worked on several fronts.
First of all, we reworked the database schema and simplified it as much as possible. This way, the new pg_timetable executes fewer system queries, leaving more bandwidth and connections to the real workers.
Secondly, we rewrote the cron-related functions in pure SQL instead of PL/pgSQL. This not only increases performance but also frees us from tedious debugging.
And the third pillar is the comprehensive logging system. See details below in the separate section.
As functionality grows, it becomes error-prone and tedious to handle configuration using only command-line parameters and environment variables.
Common sense suggests using configuration files in such cases. We've added a detailed, self-explanatory config.example.yaml file for you.
Of course, you are free to use a custom set of options, or not to use a configuration file at all. For example, it seems like a good idea to provide general options in the configuration file and connection options through environment variables, for the sake of security.
One of the most frequent tasks for database schedulers is to import data from external sources into the database, e.g. sensors, logs, official statistics, etc. The CSV format is the de facto standard for this kind of data. PostgreSQL can use the COPY command to import CSV files. But to use this command, the file must be placed in the server's file system, so that the PostgreSQL process can access it. That is not always possible in production environments, especially in the cloud era.
To overcome this limitation, for previous versions of pg_timetable we suggested using psql's \copy command, which performs a frontend (client) copy. This is an operation that runs an SQL COPY command, but instead of the server reading or writing the specified file, psql reads or writes the file and routes the data between the server and the local file system. This means that file accessibility and privileges are those of the local user, not the server, and no SQL superuser privileges are required.
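For reference, such a client-side copy might look roughly like this in a psql session - the table and file names are only placeholders borrowed from the example further down:

timetable=> \copy location FROM 'download/orte_ansi.txt' WITH DELIMITER '|' NULL '<NULL>'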
From now on, you don't need any additional programs to perform a client copy. One can use the new built-in CopyFromFile task. You may find the full example in the manual. Below you will notice that the task accepts JSON parameters in the form of an object, e.g.
{
    "sql": "COPY location FROM STDIN WITH DELIMITER '|' NULL '<NULL>'",
    "filename": "download/orte_ansi.txt"
}
sql is the correct COPY .. FROM STDIN command. One can use any of the standard clauses controlling the file specification, e.g. file encoding, conditions for insert, escaping, quoting, etc.
filename is the correct path (absolute, or relative to pg_timetable) of the source CSV file.
This task works perfectly with all formats supported by PostgreSQL: text, csv and binary.
One of the key architectural decisions made for pg_timetable was the database-driven concept, meaning scheduler sessions are fully controlled by the database and every piece of output is stored back in the database for further analysis.
That said, people were using output redirection if they wanted to have logs stored in files, e.g.
pg_timetable -c loader -u scheduler -d timetable > pgtt.log |
The drawback of this approach is that the information is stored as plain text, making it hard to process by machine. In the new version, we have introduced the ability to duplicate logs to the file system using the --log-file command-line parameter. Moreover, the user can choose the exact format of the underlying file by specifying --log-file-format: json or text.
Another bottleneck we were trying to fix was database logging. Heavy clients were producing a lot of information to be stored by separate INSERT statements, causing intensive use of connections. We have come up with an outstanding log hook implementation for the famous logrus package.
The key benefit is that log entries are written in batches using the COPY machinery, instead of one INSERT per entry.
I personally highly encourage you to use our log hook in your projects. Let us know if you think we should make it a standalone project.
pg_timetable has been storing the output of program tasks from the very beginning. It's useful for debugging and back-in-time incident analysis. We never thought this feature could be applied to the built-in SQL tasks until Nikolay Samokhvalov proposed it. So now one can have an SQL task with a SELECT statement that determines which output will be saved upon execution, e.g.
my_database=> SELECT timetable.add_job('log-sessions', '* * * * *', 'SELECT * FROM timetable.active_session');
 add_job
---------
       1
(1 row)

...

my_database=> SELECT chain_id, finished, output FROM timetable.execution_log;
 chain_id |           finished            |  output
----------+-------------------------------+----------
        1 | 2021-08-12 15:40:46.064795+02 | SELECT 2
(1 row)
The output of the SQL task is the tag of the command executed and the number of rows returned or affected. We are not storing the result set since commands can be arbitrarily complicated or rather huge. Users should take care of this and explicitly store anything important, e.g.
my_database=> CREATE TABLE history_session (
                  ts timestamptz DEFAULT now(),
                  LIKE timetable.active_session);
CREATE TABLE
my_database=> SELECT timetable.add_job('dump-sessions', '* * * * *',
                  'INSERT INTO history_session SELECT now(), * FROM timetable.active_session');
 add_job
---------
       3
(1 row)

...

my_database=> SELECT * FROM history_session;
              ts               | client_pid | client_name | server_pid
-------------------------------+------------+-------------+------------
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      15676
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      16436
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      14892
 2021-08-12 15:52:02.378981+02 |      10440 | loader-foo  |      25480
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      21876
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      11852
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      10052
 2021-08-12 15:53:02.446911+02 |      19664 | writer-boo  |      27396
(8 rows)

my_database=> SELECT chain_id, finished, output FROM timetable.execution_log;
 chain_id |           finished            |   output
----------+-------------------------------+------------
        1 | 2021-08-12 15:40:46.064795+02 | SELECT 2
        1 | 2021-08-12 15:52:02.380711+02 | SELECT 4
        3 | 2021-08-12 15:52:02.382296+02 | INSERT 0 4
        1 | 2021-08-12 15:52:07.34872+02  | SELECT 3
        3 | 2021-08-12 15:52:07.448914+02 | INSERT 0 4
(5 rows)
We want our scheduler to be as flexible as possible. That's why we have introduced a category of options for resource management:

--cron-workers: number of parallel workers for scheduled chains (default: 16);
--interval-workers: number of parallel workers for interval chains (default: 16);
--chain-timeout: abort any chain that takes more than the specified number of milliseconds;
--task-timeout: abort any task within a chain that takes more than the specified number of milliseconds.

This was the first in a series of posts dedicated to the new pg_timetable v4 features. Stay tuned for the coolest features to be highlighted.
You can find previous publications following the tag pg_timetable.
Stay safe, healthy, and wealthy!
Be happy! Peace! Love! ❤
by Kaarel Moppel
When I recently stumbled upon an article comparing some main properties, and also the read-write performance, of two very popular pieces of caching software often used to speed up applications heavy on database queries, I immediately felt an itch - how would it actually look for PostgreSQL vs. Redis vs. Memcached on the performance side? That is, what if one just skipped the cache and hit the database directly?
Especially after the first look, I wasn’t actually too impressed with the numbers presented for Redis and Memcache. It could have also been due to the fact that the actual test data, extracted from a linked paper, was from 2016 - that’s surely ages in “internet years”. Either way, I was curious to get some kind of a comparison point and quickly threw together a small benchmarking script in Python and let my workstation get to work.
Why Python you might ask, if the original test used Java? Well, Java is just one of my least favorite languages for database projects and I somehow have the impression that Python is very popular both for the Web and also ad-hoc database scripting, so probably a good fit here - might be wrong of course on that.
Some characteristics of my test setup:
CREATE UNLOGGED TABLE kv_test(key text, value int);
CREATE INDEX ON kv_test (key);

-- pseudo-code from the Python script
for $ROWS in [1000, 10000, 100000, 1000000]:
    truncate kv_test
    generate $ROWS random values
    for $i := 1 .. $ROWS:
        insert into kv_test ($rand[i], $rand[i])
    vacuum analyze the table
    for $i := 1 .. $ROWS:
        select * from kv_test where key = $rand[i]
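If you want to see for yourself what a single key lookup does on such a table, something along these lines can be run manually (the key literal is just a placeholder for one of the generated values):

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM kv_test WHERE key = 'some_key_from_the_test_set';

With the index in place, this should show an index scan touching only a handful of buffers, which is where the sub-millisecond read times come from.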
Represented in a similar way to the original numbers - the calculated time to write the key-value pairs, in milliseconds, per number of records:

Database            |  1,000 | 10,000 | 100,000 | 1,000,000
--------------------+--------+--------+---------+-----------
Redis (v3.0.7)      |     34 |    214 |   1,666 |    14,638
Memcached (v1.4.14) |     23 |    100 |     276 |     2,813
PostgreSQL (v13.3)  |   29.6 |    304 |   2,888 |    31,230
The calculated time to read the key-value pairs (ms), per number of records:

Database            |  1,000 | 10,000 | 100,000 | 1,000,000
--------------------+--------+--------+---------+-----------
Redis (v3.0.7)      |      8 |      6 |       8 |         8
Memcached (v1.4.14) |      9 |     14 |      14 |        30
PostgreSQL (v13.3)  |  0.026 |  0.028 |   0.027 |     0.029
In short - the numbers looked surprisingly/suspiciously good for Postgres in the reading test! I can't imagine how the original test managed to get such high single-digit millisecond results for random key reads across the whole dataset. In my test, I only saw 1 ms+ worst cases for the biggest row count. This data can, by the way, also be looked up in the "pg_stat_statements" snapshots table called "results".
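If you want to pull similar numbers straight from the extension, a query along these lines works on PostgreSQL 13 (the timing columns were named differently in older versions):

SELECT query, calls, mean_exec_time, max_exec_time
FROM   pg_stat_statements
WHERE  query ILIKE '%kv_test%'
ORDER  BY calls DESC;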
So sadly, I still cannot possibly get too ecstatic as there was a lot of information missing on the details of exactly how the original tests were performed, so it might have been a bit of an “apples to oranges” situation still in the end, I’m afraid. The average key readout times for Redis/Memcached seemed just way too slow in comparison to Postgres. I suspect they used a remote machine still for the cache, although the paper didn’t mention it and talked about a single Core i7 node.
But what about the writing speed of key-value data? Well, Postgres doesn't really compete on higher row counts. This was also more or less expected, though! Why? A full-blown relational database engine like Postgres goes to great lengths to ensure we cannot insert invalid data violating some constraints, plus there is the WAL writing (minimal for unlogged tables, but still) and the on-disk-format overhead - internal columns, alignment, some index bloat on page splits, etc. That all amplifies writes a lot! So basically, I think the results still aren't bad in the end - only ca. 2x slower than Redis for the 100k and 1M row counts. And the main idea of caches is that they're only useful if we read from them much more than we write/update anyway!
But however the comparison with the other DBs turns out in absolute numbers, it was good to see that the stability of Postgres response times under growing datasets was very, very good - even beating Memcached, which deteriorated 2x at 1M rows! This all probably shows that the algorithms selected for PostgreSQL are mathematically sound and well implemented!
What can definitely be said is that Postgres performance is at least good enough for a "caching use case" with smaller row counts. So with your next project, it might be worth asking: do you really need another external component?
PostgreSQL is one of the best OLTP databases (OLTP = online transaction processing) in the world. However, it can do more than just OLTP. PostgreSQL offers many additional features relevant to a more OLAP-style workload. One of those features is called “GROUPING SETS”.
Before we dive into the details, I've compiled some sample data which you can easily load into your SQL database:
CREATE TABLE t_sales (
    country       text,
    product_name  text,
    year          int,
    amount_sold   numeric
);

INSERT INTO t_sales VALUES
    ('Argentina', 'Shoes', 2020, 12),
    ('Argentina', 'Shoes', 2021, 14),
    ('Argentina', 'Hats',  2020, 54),
    ('Argentina', 'Hats',  2021, 57),
    ('Germany',   'Shoes', 2020, 34),
    ('Germany',   'Shoes', 2021, 29),
    ('Germany',   'Hats',  2020, 19),
    ('Germany',   'Hats',  2021, 22),
    ('USA',       'Shoes', 2020, 99),
    ('USA',       'Shoes', 2021, 103),
    ('USA',       'Hats',  2020, 81),
    ('USA',       'Hats',  2021, 90)
;
Note that everything you are going to see in this blog is pretty SQL-standard compliant, so you can expect most of the stuff to work in other professional SQL databases as well.
Let's get started with a simple aggregation:
test=# SELECT country, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1;
  country  | sum
-----------+-----
 USA       | 373
 Germany   | 104
 Argentina | 137
(3 rows)
There's not much to say here, apart from the fact that we will get one sum for each group. However, there's a bit of a philosophical discussion going on: “GROUP BY 1” basically means “GROUP BY country”, because country is the first column in the SELECT clause. Therefore “GROUP BY country” and “GROUP BY 1” are the same thing:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 USA       | Hats         | 171
 USA       | Shoes        | 202
(6 rows)
Of course, this works with more than one column as well. However, I want to point out something else. Consider the following example:
test=# SELECT CASE WHEN country = 'USA'
                   THEN 'USA'
                   ELSE 'non-US'
              END,
              sum(amount_sold)
       FROM   t_sales
       GROUP BY 1;
  case  | sum
--------+-----
 USA    | 373
 non-US | 241
(2 rows)
Most people group by a column. In some cases, it can make sense to group by an expression. In my case, we are forming groups on the fly (= one group for the US and one for non-US sales). This feature is often underappreciated. However, it is useful in many real-world scenarios. Keep in mind that all the things you are going to see also work with expressions, meaning more flexible grouping is possible.
GROUP BY will turn every distinct entry in a column into a group. Sometimes you might want to do more than one grouping at once. Why is that necessary? Suppose you are processing a 10 TB table. Clearly, reading this data is usually the limiting factor in terms of performance. So reading the data once and producing more than one result from it is appealing. That's exactly what you can do with GROUP BY GROUPING SETS. Suppose we want to produce two results at once: one aggregated by country and one aggregated by product_name.
Here's how it works:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY GROUPING SETS ((1), (2))
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina |              | 137
 Germany   |              | 104
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
(5 rows)
In this case, PostgreSQL simply appends the results. The first three lines represent “GROUP BY country”. The next two lines contain the result of “GROUP BY product_name”. Logically, it's the equivalent of the following query:
test=# SELECT NULL AS country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       UNION ALL
       SELECT country, NULL, sum(amount_sold)
       FROM   t_sales
       GROUP BY 1, 2
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina |              | 137
 Germany   |              | 104
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
(5 rows)
However, the GROUPING SETS version is way more efficient, because it only has to read the data once.
When creating reports, you will often need the “bottom line” which sums up what has been shown in the table. The way to do that in SQL is to use “GROUP BY ROLLUP”:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY ROLLUP (1, 2)
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina |              | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   |              | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       |              | 373
           |              | 614
(10 rows)
PostgreSQL will inject a couple of rows into the result. As you can see, “Argentina” returns 3 and not just 2 rows. The “product_name = NULL” entry was added by ROLLUP. It contains the sum of all Argentinian sales (111 + 26 = 137). Additional rows are injected for both other countries. Finally, a row is added for the overall sales worldwide.
Often those NULL entries are not what people want to see, thus it can make sense to replace them with some other kind of entry. The way to do that is to use a subselect which checks for the NULL entry and does the replacement. Here's how it works:
test=# SELECT CASE WHEN country IS NULL
                   THEN 'TOTAL'
                   ELSE country
              END,
              CASE WHEN product_name IS NULL
                   THEN 'TOTAL'
                   ELSE product_name
              END,
              sum
       FROM (
              SELECT country, product_name, sum(amount_sold)
              FROM   t_sales
              GROUP BY ROLLUP (1, 2)
              ORDER BY 1, 2
            ) AS x;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina | TOTAL        | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   | TOTAL        | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       | TOTAL        | 373
 TOTAL     | TOTAL        | 614
(10 rows)
As you can see, all NULL entries have been replaced with “TOTAL”, which in many cases is the more desirable way to display this data.
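A side note not covered by the example above: if the underlying columns can themselves contain NULL values, checking for NULL becomes ambiguous. The standard GROUPING() function tells you reliably whether a NULL was produced by the rollup. A minimal sketch of that alternative:

SELECT CASE WHEN GROUPING(country) = 1      THEN 'TOTAL' ELSE country      END AS country,
       CASE WHEN GROUPING(product_name) = 1 THEN 'TOTAL' ELSE product_name END AS product_name,
       sum(amount_sold)
FROM   t_sales
GROUP  BY ROLLUP (country, product_name)
ORDER  BY 1, 2;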
ROLLUP is useful if you want to add the “bottom line”. However, you often want to see all combinations of countries and products. GROUP BY CUBE will do exactly that:
test=# SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY CUBE (1, 2)
       ORDER BY 1, 2;
  country  | product_name | sum
-----------+--------------+-----
 Argentina | Hats         | 111
 Argentina | Shoes        |  26
 Argentina |              | 137
 Germany   | Hats         |  41
 Germany   | Shoes        |  63
 Germany   |              | 104
 USA       | Hats         | 171
 USA       | Shoes        | 202
 USA       |              | 373
           | Hats         | 323
           | Shoes        | 291
           |              | 614
(12 rows)
In this case, we've got all the combinations. Technically, it's the same as: GROUP BY country + GROUP BY product_name + GROUP BY country, product_name + GROUP BY (). We could do that using more than one statement, but doing it all at once is easier - and a lot more efficient.
Again, NULL values have been added to indicate various aggregation levels. For a tutorial, check out my YouTube video about GROUP BY CUBE.
Grouping sets don’t just simply rewrite the query to turn it into a UNION ALL - there is actually specific code in the database engine to perform those aggregations.
What you will see is a “MixedAggregate” which is capable of aggregating at various levels at once. Here is an example:
test=# explain SELECT country, product_name, sum(amount_sold)
       FROM   t_sales
       GROUP BY CUBE (1, 2)
       ORDER BY 1, 2;
                         QUERY PLAN
-----------------------------------------------------------
 Sort  (cost=64.15..65.65 rows=601 width=96)
   Sort Key: country, product_name
   ->  MixedAggregate  (cost=0.00..36.41 rows=601 width=96)
         Hash Key: country, product_name
         Hash Key: country
         Hash Key: product_name
         Group Key: ()
         ->  Seq Scan on t_sales ...
(8 rows)
Looking at the MixedAggregate also reveals which aggregations are performed as part of the grouping set.
In general, grouping sets are a really cool feature which is often unknown or overlooked. We highly recommend making use of this awesome stuff to speed up your aggregations. It's particularly useful if you are dealing with a large data set.
If you want to know more about PostgreSQL and SQL in general, you might also like my post about “string encoding using SQL".