SQL trickery: Configuring window functions

Generating simple data sets

Before we get started, I want to introduce my favorite set-returning function, generate_series, which makes it easy to generate sample data:

test=# SELECT * FROM generate_series(1, 10) AS x;
  x
----
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
(10 rows)
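
As a small sketch beyond the basic call above: generate_series also accepts an optional step, and it works on timestamps with an interval step. The dates used here are arbitrary and only serve as an illustration.

```sql
-- Every second number from 1 to 10 (the third argument is the step)
SELECT * FROM generate_series(1, 10, 2) AS x;
-- returns 1, 3, 5, 7, 9

-- One row per day for an arbitrarily chosen week
SELECT d::date
FROM generate_series('2024-01-01'::timestamp,
                     '2024-01-07'::timestamp,
                     '1 day'::interval) AS d;
```

This is handy whenever a test needs a gap-free list of numbers or days to join against.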

All we do here is generate a list of numbers from 1 to 10 and print it on the screen.

Let us play around with window functions a bit now. There are two cases to keep in mind. If the OVER clause is empty, the window function sees the entire data set. If the OVER clause contains ORDER BY, the window function by default sees only the rows from the start of the sorted list up to the current row (including any rows that sort equal to the current row). The following listing contains an example:

test=# SELECT *,
array_agg(x) OVER (),
array_agg(x) OVER (ORDER BY x)
FROM generate_series(1, 10) AS x;
  x |        array_agg       |        array_agg
----+------------------------+------------------------
  1 | {1,2,3,4,5,6,7,8,9,10} | {1}
  2 | {1,2,3,4,5,6,7,8,9,10} | {1,2}
  3 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3}
  4 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4}
  5 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5}
  6 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6}
  7 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7}
  8 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8}
  9 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9}
 10 | {1,2,3,4,5,6,7,8,9,10} | {1,2,3,4,5,6,7,8,9,10}
(10 rows)

As you can see, the last column keeps accumulating more values.
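
The accumulation comes from the default frame that PostgreSQL applies when ORDER BY is present: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With RANGE, rows that sort equal to the current row (its peers) are part of the frame. The following sketch uses a hand-made list with a duplicate value (the VALUES list and the column aliases are made up for illustration) to show how this differs from an explicit ROWS frame:

```sql
SELECT x,
       -- default frame: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       array_agg(x) OVER (ORDER BY x) AS range_default,
       -- explicit ROWS frame: stops exactly at the current row
       array_agg(x) OVER (ORDER BY x
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND CURRENT ROW) AS rows_frame
FROM (VALUES (1), (2), (2), (3)) AS t(x);

--  x | range_default | rows_frame
-- ---+---------------+------------
--  1 | {1}           | {1}
--  2 | {1,2,2}       | {1,2}
--  2 | {1,2,2}       | {1,2,2}
--  3 | {1,2,2,3}     | {1,2,2,3}
```

With unique values, as in the generate_series example, both frames produce the same result.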

PostgreSQL: ROWS BETWEEN … PRECEDING … AND … FOLLOWING

Often it is necessary to limit the set of data (the window) used by the window function. ROWS BETWEEN … PRECEDING … AND … FOLLOWING allows you to do exactly that. The following example shows how this works:

test=# SELECT *, array_agg(x) OVER (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) FROM generate_series(1, 10) AS x;
  x | array_agg
----+-----------
  1 | {1,2}
  2 | {1,2,3}
  3 | {2,3,4}
  4 | {3,4,5}
  5 | {4,5,6}
  6 | {5,6,7}
  7 | {6,7,8}
  8 | {7,8,9}
  9 | {8,9,10}
 10 | {9,10}
(10 rows)

As you can see, the data fed to array_agg is now restricted to the current row plus one row on either side. However, this restriction is static: the constants are hardwired into the query. In some cases, you might need more flexibility.
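
Before moving on: a static frame like this is not specific to array_agg. As a minimal sketch, the same frame can feed avg and count to compute a three-row moving average. The window name w and the column aliases are chosen purely for illustration.

```sql
SELECT x,
       avg(x)   OVER w AS moving_avg,     -- average of the rows in the frame
       count(*) OVER w AS rows_in_frame   -- 2 at the edges, 3 elsewhere
FROM generate_series(1, 10) AS x
WINDOW w AS (ORDER BY x ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING);
```

Naming the window once in the WINDOW clause avoids repeating the frame definition for every function that uses it.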

Configuring window functions and analytics

More often than not, configuration has to be determined on the fly. The beauty is that in PostgreSQL you can use a subselect as part of the OVER-clause, which gives you a lot of flexibility.

Before we move on to a demo, we need to create a configuration table:

test=# CREATE TABLE t_config (key text, val int);
CREATE TABLE
test=# INSERT INTO t_config VALUES ('before', 1), ('after', 2);
INSERT 0 2
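
A frame offset has to be a single, non-null, non-negative value, so a slightly stricter table definition can catch bad configuration early. The following is only a sketch of that idea; the table name t_config_strict is hypothetical and not used anywhere else in this post.

```sql
-- Hypothetical stricter variant of the configuration table
CREATE TABLE t_config_strict (
    key text PRIMARY KEY,               -- at most one row per key
    val int  NOT NULL CHECK (val >= 0)  -- offsets must not be NULL or negative
);

INSERT INTO t_config_strict VALUES ('before', 1), ('after', 2);
```

The primary key ensures that a lookup by key can never return more than one row, which would otherwise make a scalar subselect fail.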

To keep things simple, I have created just two entries. The following SELECT statement uses those configuration parameters to do its magic. Here is how it works:

test=# SELECT *,
              array_agg(x) OVER (ORDER BY x
                                 ROWS BETWEEN (SELECT val
                                               FROM t_config
                                               WHERE key = 'before') PRECEDING
                                          AND (SELECT val
                                               FROM t_config
                                               WHERE key = 'after') FOLLOWING)
FROM generate_series(1, 10) AS x;

  x |  array_agg
----+------------
  1 | {1,2,3}
  2 | {1,2,3,4}
  3 | {2,3,4,5}
  4 | {3,4,5,6}
  5 | {4,5,6,7}
  6 | {5,6,7,8}
  7 | {6,7,8,9}
  8 | {7,8,9,10}
  9 | {8,9,10}
 10 | {9,10}
(10 rows)

As you can see, the query performs as expected and can be configured dynamically.
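
A subselect is not the only way to avoid hardwired constants. As a sketch of an alternative, the offsets can be passed as parameters of a prepared statement (the statement name win is made up for this example). Keep in mind that the offset expression must not reference columns of the current row, so the frame size cannot vary from row to row this way.

```sql
-- $1 = number of preceding rows, $2 = number of following rows
PREPARE win(int, int) AS
    SELECT x,
           array_agg(x) OVER (ORDER BY x
                              ROWS BETWEEN $1 PRECEDING
                                       AND $2 FOLLOWING)
    FROM generate_series(1, 10) AS x;

-- Same frame as the t_config example: 1 row before, 2 rows after
EXECUTE win(1, 2);

DEALLOCATE win;
```

The configuration table is the better fit when the settings should live in the database; parameters are convenient when the application already knows the values.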

PARTITION BY and expressions

Another important note: PARTITION BY can take not only a column, but also an expression, to split the data set. Many people are not aware of this feature, which is actually quite useful. Here is an example:

test=# SELECT *, array_agg(x) OVER (PARTITION BY x % 2) FROM generate_series(1, 10) AS x;
  x |  array_agg
----+--------------
 10 | {10,2,4,6,8}
  2 | {10,2,4,6,8}
  4 | {10,2,4,6,8}
  6 | {10,2,4,6,8}
  8 | {10,2,4,6,8}
  9 | {9,7,3,1,5}
  7 | {9,7,3,1,5}
  3 | {9,7,3,1,5}
  1 | {9,7,3,1,5}
  5 | {9,7,3,1,5}
(10 rows)

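In this case, we had no problem splitting the data into odd and even numbers. Note that without an ORDER BY, the order of the rows within each partition is not guaranteed, which is why the arrays are not sorted. What I want to point out here is that PostgreSQL offers a lot of flexibility. I encourage you to test it out for yourself.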
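
If a predictable order is needed, one possible approach is to add ORDER BY inside the OVER clause and widen the frame to the whole partition, as in the sketch below. The parity column and the aliases are only there for illustration.

```sql
SELECT x,
       CASE WHEN x % 2 = 0 THEN 'even' ELSE 'odd' END AS parity,
       array_agg(x) OVER (PARTITION BY x % 2
                          ORDER BY x
                          -- use the whole partition, not just the rows so far
                          ROWS BETWEEN UNBOUNDED PRECEDING
                                   AND UNBOUNDED FOLLOWING) AS members
FROM generate_series(1, 10) AS x
ORDER BY x % 2, x;
-- even rows get {2,4,6,8,10}, odd rows get {1,3,5,7,9}
```

Without the explicit frame, the ORDER BY would switch the window to the accumulating behavior shown at the beginning of this post.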

Finally...

Window functions are super important if you need to relate the rows in a result set to each other. You order them, you partition them, and then you define a window from which you can compute additional result columns.

Sometimes, you want to find out more about a time series. One task we have seen quite often recently is counting how often somebody was active for a certain amount of time. “Detecting continuous periods of activity” will show you how to calculate these things easily in PostgreSQL.

Hans-Jürgen Schönig
