CYBERTEC Logo

PERFORMANCE TUNING: MAX AND GROUP BY

04.2024 / Category: / Tags:

These days everybody is talking about time series, time series analysis and alike for performance tuning. Analyzing time series data in PostgreSQL can provide valuable insights, help in making informed decisions and understanding data more deeply. By utilizing PostgreSQL’s powerful features, we can efficiently query all types of measurement data to track trends, patterns, and anomalies over time. However, often there is a tiny little requirement, people are struggling to understand. Consider the following data:

What we see here is a classical simple table containing time series data in PostgreSQL. For a couple of days we store 1 measurement value for each of those 100 sensors. Overall this provides us with roughly 43 million rows of data.

Common time series questions

Here is one of the most typical questions somebody might have given this data set:

  • Find the highest value for each sensor

In this post we will discuss how this can be done in the most efficient way possible.

Find the highest value for each group

This is most likely the most common use case we have seen. For each sensor in our series we want to see the highest value in the data set. Sounds easy? Well, it is … Let us take a look at the query:

What happens here is that PostgreSQL will read ALL the data and run aggregation. To speed things up we see a parallel query using multiple CPU cores at a time. However, the core problem is: We got to read all the data. Of course that leads to long runtimes which keep growing the more data we have. 

Let us try to deploy and index and see if it changes anything:

Running the query again will not lead to a better execution plan. The way to address this problem is to simulate something that is generally known as “skip scan”. What does it mean? For each incarnation of sensor values we find the maximum. The system can find the maximum value for a single sensor quite quickly.

Wow, we can find a single value in a fraction of a millisecond. Unfortunately we have to apply some trickery to do this for more than just a single sensor. There are two ways to emulate a skip scan:

  • Use a table to store a list of sensor ideas and use a LATERAL join
  • Run a recursive query

The following example shows how we can use a lateral join to solve the problem:

What is a LATERAL? Let us step back a bit and see SQL from a more philosophical point of view:

In code this means:

One can see a FROM clause as some kind of loop. Now: If we want to find the max for a list of values what we really need is a loop around that loop. That is exactly what LATERAL does for us: For each value returned from the generate_series call (= generates the list of sensors) we run the LATERAL part of the query. The problem is: We are generating the list of sensors which makes the query a bit too “static” so it is better to fetch the sensor list from a second relation (normalization).

However, we can also achieve the same thing using a simple recursion:

The core idea behind the query is as follows: We find the smallest sensor_id in the table which is really fast. Then we look for the next smallest value larger than the one we found before. We keep doing this in the recursion until we have quickly compiled a unique list of sensor_id. The recursion is indeed the fastest way to find a list of DISTINCT sensor_id without having to read the entire table. Essentially this is what it means to simulate a “skip scan” as mentioned before. Finally we can use this list of unique sensor IDS (“x”) to find the maximum value for each of them which is of course a fast index lookup.

Finally…

If you want to know more about time series in general you might want to check out my post about simple pattern matching and string encoding in PostgreSQL which is available on our website as well.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on TwitterFacebook, or LinkedIn.

5 1 vote
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments
CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram