CYBERTEC Logo

PostgreSQL grouping sets: ROLLUP & CUBE

08.2021 / Category: / Tags: | |

PostgreSQL is one of the best OLTP databases (OLTP = online transaction processing) in the world. However, it can do more than just OLTP. PostgreSQL offers many additional features relevant to a more OLAP-style workload. One of those features is called “GROUPING SETS”. 

Before we dive into the details, I've compiled some sample data which you can easily load into your SQL database: 

Note that everything you are going to see in this blog is pretty SQL-standard compliant, so you can expect most of the stuff to work in other professional SQL databases as well. 

Let's get started with a simple aggregation:

There's not much to say here, apart from the fact that we will get one sum for each group. However, there's a bit of a philosophical discussion going on. “GROUP BY 1” basically means “GROUP BY country” which is the equivalent of the first column in the SELECT clause. Therefore “GROUP BY country” and “GROUP BY 1” are the same thing:

Of course, this works with more than one column as well. However, I want to point out something else. Consider the following example: 

Most people group by a column. In some cases, it can make sense to group by an expression. In my case, we are forming groups on the fly (= one group for the US and one for non-US sales). This feature is often underappreciated. However, it is useful in many real-world scenarios. Keep in mind that all the things you are going to see also work with expressions, meaning more flexible grouping is possible. 

GROUPING SETS: The basic building blocks

GROUP BY will turn every distinct entry in a column into a group. Sometimes you might want to do more grouping at once. Why is that necessary? Suppose you are processing a 10 TB table. Clearly, reading this data is usually the limiting factor in terms of performance. So reading the data once and producing more results at once is appealing. That's exactly what you can do with GROUP BY GROUP SETS. Suppose we want to produce two results at once:

  • GROUP BY country
  • GROUP BY product_name

Here's how it works: 

In this case, PostgreSQL simply appends the results. The first three lines represent “GROUP BY country”. The next two lines contain the result of “GROUP BY product_name”. Logically, it's the equivalent of the following query:

However, the GROUPING SETS version is ways more efficient because it only has to read the data once. 

ROLLUP: Adding the “bottom line”

When creating reports, you will often need the “bottom line” which sums up what has been shown in the table. The way to do that in SQL is to use “GROUP BY ROLLUP”:

PostgreSQL will inject a couple of rows into the result. As you can see, “Argentina” returns 3 and not just 2 rows. The “product_name = NULL” entry was added by ROLLUP. It contains the sum of all argentinian sales (116 + 27 = 137). Additional rows are injected for both other countries. Finally, a row is added for the overall sales worldwide. 

Often those NULL entries are not what people want to see, thus it can make sense to replace them with some other kind of entry. The way to do that is to use a subselect which checks for the NULL entry and does the replacement.  Here's how it works:

As you can see, all NULL entries have been replaced with “TOTAL”, which in many cases is the more desirable way to display this data.

CUBE: Creating data cubes in PostgreSQL efficiently

ROLLUP is useful if you want to add the “bottom line”. However, you often want to see all combinations of countries and products. GROUP BY CUBE will do exactly that:

In this case, we've got all the combinations. Technically, it's the same as: GROUP BY country + GROUP BY product_name + GROUP BY country_product_name + GROUP BY (). We could do that using more than just one statement, but doing it at once is easier - and a lot more efficient. 

Again, NULL values have been added to indicate various aggregation levels. For a tutorial, check out my YouTube video about GROUP BY CUBE.

Grouping sets: Execution plans

Grouping sets don’t just simply rewrite the query to turn it into a UNION ALL - there is actually specific code in the database engine to perform those aggregations. 

What you will see is a “MixedAggregate” which is capable of aggregating at various levels at once. Here is an example: 

Looking at the MixedAggregate also reveals which aggregations are performed as part of the grouping set. 

Finally ...

In general, grouping sets are a really cool feature which is often unknown or overlooked. We highly recommend making use of this awesome stuff to speed up your aggregations. It's particularly useful if you are dealing with a large data set. 

If you want to know more about PostgreSQL and SQL in general, you might also like my post about “string encoding using SQL".

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram