# SQL trickery: Hypothetical aggregates

06.2020 / Category: / Tags:

“If we had this data what would it mean?” - these kinds of questions can be answered using plain SQL. The technique you will need in PostgreSQL is a “hypothetical aggregate” which is of course part of the ANSI SQL standard. This post will show what an hypothetical aggregate is good for and how it works.

Just like min, max, sum, count, and so on a hypothetical aggregate is used in conjunction with a GROUP BY clause. The GROUP BY will split the data into small portions and the aggregation will be applied on each of those groups. To show how this works I have created 10 rows and split them into “even” and “off” numbers. The way to do it is “modulo 2”. Here is an example:

As you can see PostgreSQL has stuffed all the data into two arrays. There is nothing fancy about it. We can use any other aggregate such as sum, avg, min, max, etc. However, I used array_agg to show you which values are taken into consideration to calculate the aggregate here.

The main question now is: What is a hypothetical aggregate? Suppose we take the list of data in each group sort the values. Then we assume a hypothetical value (“as if it was there”) and see at which position it would end up. Here is an example:

If we take even numbers (2, 4, 6, 8, 10) in sorted 3.5 would be the 2nd entry. If we take odd numbers and sort them 3.5 would be the third value in the list. This is exactly what an hypothetical aggregate does.

### Practical use cases of hypothetical aggregates

Why would anybody want to use hypothetical aggregates in PostgreSQL? Here is an example: Suppose there is a sports event, and somebody is on the racing track. You want to know: If this racer reaches the finish line in 54 minutes, is he going to win? Finish as 10th? Or maybe last? A hypothetical aggregate will tell you that before the final score is inserted.

There are many more uses cases, but ranking is by far the most common and most widespread one.

### Finally …

If you want to know more about advanced SQL check out what we have to say about other fancy SQL stuff. Here is a post about speeding up aggregations by changing column orders.

Article Rating
Subscribe
Notify of
1 Comment
Inline Feedbacks
Emil Marcetta
4 years ago

And we assume the order of array_agg elements has changed because of the quicksort/binary search?

```GroupAggregate (cost=62.33..75.83 rows=200 width=44) (actual time=0.060..0.070 rows=2 loops=1) Group Key: ((x % 2)) -> Sort (cost=62.33..64.83 rows=1000 width=8) (actual time=0.022..0.024 rows=10 loops=1) Sort Key: ((x % 2)) Sort Method: quicksort Memory: 25kB -> Function Scan on generate_series x (cost=0.00..12.50 rows=1000 width=8) (actual time=0.010..0.012 rows=10 loops=1) Planning Time: 0.081 ms Execution Time: 0.122 ms```

CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Support Platform
Get the newest PostgreSQL Info & Tools