CYBERTEC Logo

Window function: Why first_value and last_value are not bugs

04.2018 / Category: / Tags: |

SQL is a must, if you want to be a Data Analyst or a Data Scientist. However, every once in a while people wonder why a result is the way it is. While on the road in Berlin (Germany) the other day, I found a fairly interesting window function scenario which is pretty counter-intuitive to most people, and which might be worth sharing.

Window function: What PostgreSQL does and why

PostgreSQL has provided window functions and analytics for quite some time now and this vital feature has been widely adopted by users, who are using PostgreSQL or SQL in general for more than just trivial queries. A modern database is just so much more than a simple data store and window functions are therefore certainly something to look into.

So, what is this post all about? Here is an example:

What we want, is the first value in our data set. The ORDER BY clause will ensure that data is fed to first_value in the right order. The result is therefore not surprising.

The same applies if we add DESC to our ORDER BY. The result is totally obvious:

Window function - last_value: Unexpected results

However, what if we use last_value. Here is an example:

What you can see here is that both columns will return the SAME data – regardless of the different sort order provided by the ORDER BY clause. That comes as a surprise to most people. Actually most people would accept one column to contain only “5” and the other column to contain only “1”.

Why is that not the case? Here is the answer:

Let us take a look at which values last_value will actually see: array_agg will simply put them all into an array so that we can expect things in detail. As you can see, the last value in the array is identical in both cases, which means that both columns will produce exactly the identical output.

0 0 votes
Article Rating
Subscribe
Notify of
guest
4 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
taitmann
taitmann
5 years ago

This ia good tutorial. Thanks. But i have a different situation as follows;
I want to select and get Aquila with 20 as the last value under the marks colum and omit the null rows.
How do i archive this? Please help.

My table looks like this.
.....................................................................
Student --------------Mark
.....................................................................

1. John ----------------25

2. Peter ---------------30

3. Jack -------------- Null

4. Aquila --------------20

5. Judy ----------------null

6. Kessy --------------null

Colin 't Hart
6 years ago

You should mention the frame_clause, and how it defaults to RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The reason that this is the default is to facilitate running totals and the like, which are very common uses for windowing functions.

Making the default RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING would have been far less useful and made more common cases require far more verbose SQL.

Chad Capra
Chad Capra
5 years ago
Reply to  Colin 't Hart

Hans, this is a great article and what you explain is "spot on," which is great! However, Colin's response is EXACTLY the "reason" behind why postgres behaves this way. I wish the default was "UNBOUND PRECEDING AND UNBOUNDED FOLLOWING" because having windows functions "choose" default filtering is actually more confusing, IMO.

The postgres team made a decision based on what their experience has been instead of sticking to the facts that a window function is like a "join" to table itself and in other "JOINs," the table/view is not automatically filtered. I cannot think of the exact scenario, but I feel like this would be similar to filtering NULLs in a LEFT JOIN (because it is a "common" thing to do).

No disrespect to the postgres team as they have made an amazing product that I will continue to use for a long time, but this decision to implement a pattern that by default excludes rows from a table/view is certainly confusing because other patterns do not seem to go down this path.

In any case, this article lead me to the explanation of my exact problem (i.e. attempting to find the latest appointment date for a client using windows functions).

For that, I applaud you both!!

Dmitry Dorofeev
Dmitry Dorofeev
6 years ago

Funny enough:

SELECT *,
last_value(x) OVER (ORDER BY x DESC),
last_value(x) OVER (ORDER BY x)
FROM generate_series(1, 5) AS x;

would result with:

x | last_value | last_value
---+------------+------------
1 | 1 | 1
2 | 2 | 2
3 | 3 | 3
4 | 4 | 4
5 | 5 | 5
(5 rows)

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    4
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram