“If we had this data what would it mean?” – these kinds of questions can be answered using plain SQL. The technique you will need in PostgreSQL is a “hypothetical aggregate” which is of course part of the ANSI SQL standard. This post will show what an hypothetical aggregate is good for and how it works.

hypothetical aggregates in postgresql

Advanced GROUP BY statements

Just like min, max, sum, count, and so on a hypothetical aggregate is used in conjunction with a GROUP BY clause. The GROUP BY will split the data into small portions and the aggregation will be applied on each of those groups. To show how this works I have created 10 rows and split them into “even” and “off” numbers. The way to do it is “modulo 2”. Here is an example:

test=# SELECT x % 2 AS grp, array_agg(x) 
       FROM   generate_series(1, 10) AS x GROUP BY x % 2;
 grp | array_agg
   1 | {1,3,5,7,9}
   0 | {2,4,6,8,10}
(2 rows)

As you can see PostgreSQL has stuffed all the data into two arrays. There is nothing fancy about it. We can use any other aggregate such as sum, avg, min, max, etc. However, I used array_agg to show you which values are taken into consideration to calculate the aggregate here.

The main question now is: What is a hypothetical aggregate? Suppose we take the list of data in each group sort the values. Then we assume a hypothetical value (“as if it was there”) and see at which position it would end up. Here is an example:

test=# SELECT x % 2 AS grp, array_agg(x), 
              rank(3.5) WITHIN GROUP (ORDER BY x) 
       FROM   generate_series(1, 10) AS x 
       GROUP BY x % 2;
 grp |  array_agg   | rank
   0 | {10,2,4,6,8} | 2
   1 |  {9,7,3,1,5} | 3
(2 rows)

If we take even numbers (2, 4, 6, 8, 10) in sorted 3.5 would be the 2nd entry. If we take odd numbers and sort them 3.5 would be the third value in the list. This is exactly what an hypothetical aggregate does.

Practical use cases of hypothetical aggregates

Why would anybody want to use hypothetical aggregates in PostgreSQL? Here is an example: Suppose there is a sports event, and somebody is on the racing track. You want to know: If this racer reaches the finish line in 54 minute, is he going to win? Finish as 10th? Or maybe last? A hypothetical aggregate will tell you already before the final score is inserted.

There are many more uses cases, but ranking is by far the most common and most widespread one.

Finally …

If you want to know more about advanced SQL check out what we have to say about other fancy SQL stuff. Here is a post about speeding up aggregations by changing column orders.