Understanding Multiranges

Range types have been around in PostgreSQL for quite some time and are successfully used by developers to store various kinds of intervals with upper and lower bounds. However, in PostgreSQL 14 a major new feature has been added to the database which makes this feature even more powerful: multiranges. To help you understand multiranges, I have compiled a brief introduction outlining the basic concepts and most important features.

Getting started with range types

Before we dive into multiranges, I want to quickly introduce basic range types so that you can compare what was previously available with what is available now. Here’s a simple example:

test=# SELECT int4range(10, 20);
 int4range
-----------
   [10,20)
(1 row)

The important thing to note here is that ranges can be formed on the fly. In this case I’ve created an int4 range. The ranges starts at 10 (which is included) and ends at 20 (which is not included anymore). Basically, every numeric data type that can be sorted can be turned into a range.

What’s important to see is that PostgreSQL will ensure that the range is indeed valid which means that the upper bound has to be higher than the lower bound and so on:

test=# SELECT int4range(432, 20);
ERROR: range lower bound must be less than or equal to range upper bound

What is the purpose of a range type in the first place?

Well, in addition to validation, PostgreSQL provides a rich set of operators. One of the basic operations you’ll often need is a way to check whether a value is inside a range or not. Here’s how it works:

test=# SELECT 17 <@ int4range(10, 20); 
 ?column?
----------
 t
(1 row)

You can see from the above that 17 is indeed between 10 and 20 in the table. It’s a simple example, but this is also very useful in more complex cases.

It’s also possible to use ranges as data types which can be part of a table:

test=# CREATE TABLE t_range (id serial, t tstzrange);
CREATE TABLE
test=# INSERT INTO t_range (t)
          VALUES ('["2022-01-01", "2022-12-31"]') RETURNING *;
 id | t
----+-----------------------------------------------------
  1 | ["2022-01-01 00:00:00-05","2022-12-31 00:00:00-05"]
(1 row)
INSERT 0 1

In this case, I’ve used the tstzrange (“timestamp with time zone range”), and I’ve successfully inserted a value.

The beauty here is that the operators are available are consistent and work the same way for all range types. This makes them relatively easy to use.

Multiranges: Range types of steroids

So what are multiranges? The idea is simple: A multirange is a compilation of ranges. Instead of storing just two values, you can store as many pairs as you want. Let ‘s take a look at a practical example:

test=# SELECT int4multirange(int4range(10, 20), int4range(40, 50));
   int4multirange
-------------------
 {[10,20),[40,50)}
(1 row)

In this case, the multirange has been formed on the fly using two ranges. However, you can stuff in as many value pairs as you want:

test=# SELECT int4multirange(
                  int4range(10, 20),
                  int4range(40, 50),
                  int4range(56, 62), 
                  int4range(80, 90)
);
           int4multirange
-----------------------------------
 {[10,20),[40,50),[56,62),[80,90)}
(1 row)

PostgreSQL has a really nice features which can be seen when ranges are formed. Consider the following example:

test=# SELECT int4multirange(
                  int4range(10, 20),
                  int4range(12, 25),
                  int4range(38, 42)
);
  int4multirange
-------------------
 {[10,25),[38,42)}
(1 row)

As you can see, multiple ranges are folded together to form a set of non-overlapping parts. This happens automatically and greatly reduces complexity.

A multirange can also be used as a data type and stored inside a table just like any other value:

test=# CREATE TABLE t_demo (id int, r int4multirange);
CREATE TABLE
test=# INSERT INTO t_demo
          SELECT 1, int4multirange(
                       int4range(10, 20),
                       int4range(12, 25),
                       int4range(38, 42))
       RETURNING *;
 id |        r
----+-------------------
  1 | {[10,25),[38,42)}
(1 row)
INSERT 0 1

The way you query this type of column is pretty straightforward and works as follows:


test=# SELECT * FROM t_demo WHERE 17 <@ r;
 id |        r
----+-------------------
  1 | {[10,25),[38,42)}

You can simply apply the operator on the column, and you’re good to do. In our example, one row is returned which is exactly what we expect.

Handling “infinity” inside ranges

So far, you’ve seen that ranges have a beginning and a clearly defined end. However, there is more.

In PostgreSQL, a range is aware of the concept of “infinity”.

Here is an example:

test=# SELECT
           numrange(NULL, 10);
 numrange
----------
    (,10)
(1 row)

This range starts with -INFINITY and ends at 10 (not included).

Again, you’re able to fold single ranges into a bigger block:

test=# SELECT numrange(NULL, 10), numrange(8, NULL);
 numrange | numrange
----------+----------
    (,10) | [8,)
(1 row)

test=# SELECT nummultirange(
                numrange(NULL, 10),
                numrange(8, NULL)
);
 nummultirange
---------------
 {(,)}
(1 row)

The example shows a range than spans all numbers (= – INFINITY all the way to +INFINITY). The following query is proof of this fact:

test=# SELECT 165.4 <@ nummultirange(
                          numrange(NULL, 10),
                          numrange(8, NULL)
                       );
 ?column?
---------- 
 t
(1 row)

The result is true: 165.4 is indeed within the range of valid numbers.

Performing basic calculations

That’s not all: you can perform basic operations using these ranges. Now just imagine what it would take to do such calculations by hand in some application. It would be slow, error-prone and in general, pretty cumbersome:


test=# SELECT nummultirange(numrange(1, 20)) - nummultirange(numrange(4, 6));
    ?column?
----------------
 {[1,4),[6,20)}
(1 row)

You can deduct ranges from each other and form a multirange, which represents two ranges and a gap in between. The next example shows how intersections can be calculated:

test=# SELECT nummultirange(numrange(1, 20)) * nummultirange(numrange(4, 6));
 ?column?
----------
  {[4,6)}
(1 row)

There are many more of these kinds of operations you can use to work with ranges. A complete list of all functions and operators is available in the PostgreSQL documentation.

Multiranges: aggregating ranges

The final topic I want to present is how to aggregate ranges into bigger blocks. Often, data is available in a standard form, and you want to do something fancy with it. Let ‘s create a table:


test=# CREATE TABLE t_holiday (name text, h_start date, h_end date);
CREATE TABLE
test=# INSERT INTO t_holiday
         VALUES ('hans', '2020-01-10', '2020-01-15');
INSERT 0 1
test=# INSERT INTO t_holiday
         VALUES ('hans', '2020-03-04', '2020-03-12');
INSERT 0 1
test=# INSERT INTO t_holiday
         VALUES ('joe', '2020-01-04', '2020-03-02');
INSERT 0 1

I’ve created a table and added 3 rows. Note that data is stored in the traditional way (“from / to”).

What you can do now is to form ranges on the fly and create a multirange using the range_agg function:

test=# SELECT range_agg(daterange(h_start, h_end)) FROM t_holiday;
                       range_agg
---------------------------------------------------
 {[2020-01-04,2020-03-02),[2020-03-04,2020-03-12)}
(1 row)

What’s important here is that you can do more than create a multirange – you can also unnest such a data type and convert it into single entries:

test=# SELECT x, lower(x), upper(x)
       FROM   unnest((SELECT range_agg(daterange(h_start, h_end))
                      FROM t_holiday)) AS x;
            x            |    lower   | upper
-------------------------+------------+------------
 [2020-01-04,2020-03-02) | 2020-01-04 | 2020-03-02
 [2020-03-04,2020-03-12) | 2020-03-04 | 2020-03-12
(2 rows)

Let’s think about this query: Basically, aggregation and unnesting allow you to answer questions such as: “How many continuous periods of activity did we see?” That can be really useful if you are, for example, running data warehouses.

Finally…

In general, multiranges are a really valuable extension on top of what PostgreSQL already has to offer. This new feature in  PostgreSQL v14 allows us to offload many things to the database which would have otherwise been painful to do and painful to implement on your own.

If you want to learn more about PostgreSQL data types, take a look at Laurenz Albe’s recent blog on Query Parameter Data Types and Performance, and if you are interested in data type abstraction check out my blog about CREATE DOMAIN.