SQL trickery: Hypothetical aggregates

06.2020

Category: How To

“If we had this data what would it mean?” - these kinds of questions can be answered using plain SQL. The technique you will need in PostgreSQL is a “hypothetical aggregate” which is of course part of the ANSI SQL standard. This post will show what an hypothetical aggregate is good for and how it works.

Table of Contents

Advanced GROUP BY statements

Just like min, max, sum, count, and so on a hypothetical aggregate is used in conjunction with a GROUP BY clause. The GROUP BY will split the data into small portions and the aggregation will be applied on each of those groups. To show how this works I have created 10 rows and split them into “even” and “off” numbers. The way to do it is “modulo 2”. Here is an example:

test=# SELECT x % 2 AS grp, array_agg(x) 
       FROM   generate_series(1, 10) AS x GROUP BY x % 2;
 grp | array_agg
-----+--------------
   1 | {1,3,5,7,9}
   0 | {2,4,6,8,10}
(2 rows)

1

2

3

4

5

6

7

test=# SELECT x % 2 AS grp, array_agg(x)

FROM generate_series(1, 10) AS x GROUP BY x % 2;

grp | array_agg

-----+--------------

1 | {1,3,5,7,9}

0 | {2,4,6,8,10}

(2 rows)

As you can see PostgreSQL has stuffed all the data into two arrays. There is nothing fancy about it. We can use any other aggregate such as sum, avg, min, max, etc. However, I used array_agg to show you which values are taken into consideration to calculate the aggregate here.

The main question now is: What is a hypothetical aggregate? Suppose we take the list of data in each group sort the values. Then we assume a hypothetical value (“as if it was there”) and see at which position it would end up. Here is an example:

test=# SELECT x % 2 AS grp, array_agg(x), 
              rank(3.5) WITHIN GROUP (ORDER BY x) 
       FROM   generate_series(1, 10) AS x 
       GROUP BY x % 2;
 grp |  array_agg   | rank
-----+--------------+------
   0 | {10,2,4,6,8} | 2
   1 |  {9,7,3,1,5} | 3
(2 rows)

1

2

3

4

5

6

7

8

9

test=# SELECT x % 2 AS grp, array_agg(x),

rank(3.5) WITHIN GROUP (ORDER BY x)

FROM generate_series(1, 10) AS x

GROUP BY x % 2;

grp | array_agg | rank

-----+--------------+------

0 | {10,2,4,6,8} | 2

1 | {9,7,3,1,5} | 3

(2 rows)

If we take even numbers (2, 4, 6, 8, 10) in sorted 3.5 would be the 2nd entry. If we take odd numbers and sort them 3.5 would be the third value in the list. This is exactly what an hypothetical aggregate does.

Practical use cases of hypothetical aggregates

Why would anybody want to use hypothetical aggregates in PostgreSQL? Here is an example: Suppose there is a sports event, and somebody is on the racing track. You want to know: If this racer reaches the finish line in 54 minutes, is he going to win? Finish as 10th? Or maybe last? A hypothetical aggregate will tell you that before the final score is inserted.

There are many more uses cases, but ranking is by far the most common and most widespread one.

Finally...

If you want to know more about advanced SQL check out what we have to say about other fancy SQL stuff. Here is a post about speeding up aggregations by changing column orders.

One response to “SQL trickery: Hypothetical aggregates”

Emil Marcetta says:

June 18, 2020 at 8:28 pm

And we assume the order of array_agg elements has changed because of the quicksort/binary search?

GroupAggregate (cost=62.33..75.83 rows=200 width=44) (actual time=0.060..0.070 rows=2 loops=1) Group Key: ((x % 2)) -> Sort (cost=62.33..64.83 rows=1000 width=8) (actual time=0.022..0.024 rows=10 loops=1) Sort Key: ((x % 2)) Sort Method: quicksort Memory: 25kB -> Function Scan on generate_series x (cost=0.00..12.50 rows=1000 width=8) (actual time=0.010..0.012 rows=10 loops=1) Planning Time: 0.081 ms Execution Time: 0.122 ms

Reply

SQL trickery: Hypothetical aggregates

Advanced GROUP BY statements

Practical use cases of hypothetical aggregates

Finally...

One response to “SQL trickery: Hypothetical aggregates”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts