Timeseries are an increasingly important topic – not just in PostgreSQL. Recently I gave a presentation @AGIT in Salzburg about timeseries and I demonstrated some super simple examples. The presentation was well received, so I decided to share this stuff in the form of a blog PostgreSQL, so that more people can learn about windowing functions and SQL in general. A link to the video is available at the end of the post so that you can listen to the original material in German.

Loading timeseries data the easy way

To show how data can be loaded, I compiled a simple dataset, which can be found on my website. Here is how it works:

test=# CREATE TABLE t_oil 
	region 		text, 
	country 	text, 
	year 		int, 
	production 	int, 
	consumption 	int
test=# COPY t_oil FROM 
	PROGRAM 'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt';
COPY 644

The cool thing is that if you happen to be a superuser, you can easily load the data from the web directly. COPY FROM PROGRAM allows you to execute code on the server and pipe it directly to PostgreSQL, which is super simple. But keep in mind: This only works if you are a PostgreSQL superuser (for security reasons).

lag: The backbone of timeseries analysis

If you are dealing with timeseries, calculating the difference to the previous period is really important and is needed in many cases. Fortunately SQL allows you to do that pretty easily. Here is how it works:

test=# SELECT year, production, 
		lag(production, 1) OVER (ORDER BY year) 
	FROM 	t_oil 
	WHERE 	country = 'USA' 
 year | production |  lag  
 1965 |       9014 |      
 1966 |       9579 |  9014
 1967 |      10219 |  9579
 1968 |      10600 | 10219
 1969 |      10828 | 10600
(5 rows)

The lag functions takes two parameters: The first column defines the column, which should be used in this case. The second parameter is optional. If you skip it, the expression will be equivalent to lag(production, 1). In my example the lag column will be off by one. However, you can use any integer number to move data up or down given the order defined in the OVER clause.

What we got so far is the value of the previous period. Let us calculate the difference next:

test=# SELECT year, production, 
		production - lag(production, 1) OVER (ORDER BY year) AS diff 
	FROM 	t_oil 
	WHERE 	country = 'USA' 
 year | production | diff 
 1965 |       9014 |     
 1966 |       9579 |  565
 1967 |      10219 |  640
 1968 |      10600 |  381
 1969 |      10828 |  228
(5 rows)

That was easy. All we have to do is to take the current row and subtract the previous row.
Windowing functions are way more powerful than shown here but maybe this example is good to getting you started in the first place.

Calculating correlations using PostgreSQL

Sometimes you might want to calculate the correlation between columns. PostgreSQL offers the “corr” function to do exactly that. The following listing shows a simple example:

test=# SELECT country, corr(production, consumption) 
	FROM 	t_oil 
       country        |        corr        
 Mexico               |  0.962790640608018
 Canada               |  0.932931452462893
 Qatar                |  0.925552359601189
 United Arab Emirates |  0.882953285119214
 Saudi Arabien        |  0.642815458284221

As you can see the correlation in Mexico and Canada are highest.

Checking for continuous activity

In the past we presented other examples related to timeseries and analysis in general. One of the most interesting posts is found here.

If you want to see the entire short presentation in German consider checking out the following video.