Timeseries are an increasingly important topic – not just in PostgreSQL. Recently I gave a presentation @AGIT in Salzburg about timeseries and I demonstrated some super simple examples. The presentation was well received, so I decided to share this stuff in the form of a blog, so that more people can learn about window functions and SQL in general. A link to the video is available at the end of the post so that you can listen to the original material in German.

Loading timeseries data the easy way

To show how data can be loaded, I compiled a basic dataset which can be found on my website. Here is how it works:

test=# CREATE TABLE t_oil 
	region 		text, 
	country 	text, 
	year 		int, 
	production 	int, 
	consumption 	int
test=# COPY t_oil FROM 
	PROGRAM 'curl https://www.cybertec-postgresql.com/secret/oil_ext.txt';
COPY 644

The cool thing is that if you happen to be a superuser, you can easily load the data from the web directly. COPY FROM PROGRAM allows you to execute code on the server and pipe it directly to PostgreSQL, which is super simple. Keep in mind: that only works if you are a PostgreSQL superuser (for security reasons).

lag: The backbone of timeseries analysis

If you are dealing with timeseries, calculating the difference to the previous period is really important. Fortunately, SQL allows you to do that pretty easily. Here is how it works:

test=# SELECT year, production, 
		lag(production, 1) OVER (ORDER BY year) 
	FROM 	t_oil 
	WHERE 	country = 'USA' 
 year | production |  lag  
 1965 |       9014 |      
 1966 |       9579 |  9014
 1967 |      10219 |  9579
 1968 |      10600 | 10219
 1969 |      10828 | 10600
(5 rows)

The lag functions takes two parameters: The first column defines the column, which should be used in this case. The second parameter is optional. If you skip it, the expression will be equivalent to lag(production, 1). In my example, the lag column will be off by one. However, you can use any integer number to move data up or down, given the order defined in the OVER clause.

What we have so far is the value of the previous period. Let us calculate the difference next:

test=# SELECT year, production, 
		production - lag(production, 1) OVER (ORDER BY year) AS diff 
	FROM 	t_oil 
	WHERE 	country = 'USA' 
 year | production | diff 
 1965 |       9014 |     
 1966 |       9579 |  565
 1967 |      10219 |  640
 1968 |      10600 |  381
 1969 |      10828 |  228
(5 rows)

That was easy. All we have to do is to take the current row and subtract the previous row.
Window functions are far more powerful than shown here, but maybe this example will help to get you started in the first place.

Calculating correlations using PostgreSQL

You may want to calculate the correlation between columns. PostgreSQL offers the “corr” function to do exactly that. The following listing shows a simple example:

test=# SELECT country, corr(production, consumption) 
	FROM 	t_oil 
       country        |        corr        
 Mexico               |  0.962790640608018
 Canada               |  0.932931452462893
 Qatar                |  0.925552359601189
 United Arab Emirates |  0.882953285119214
 Saudi Arabien        |  0.642815458284221

As you can see, the correlation in Mexico and Canada are highest.

Checking for continuous activity

In the past we presented other examples related to timeseries and analysis in general. One of the most interesting posts is found here.

If you want to see the entire short presentation in German consider checking out the following video.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.