A “materialized view” is a database object which stores the result of a precalculated database query and makes it easy to refresh this result as needed. Materialized views are an integral feature of pretty much all advanced database systems. Naturally, PostgreSQL also provides support for materialized views, and offers the end-user a powerful tool to handle more time-consuming requests.

The main questions are now: What are the pitfalls, and how can you make use of materialized views in the first place? Let’s dive in and find out.

Creating a materialized view

Before we can actually take a look at materialized views and figure out how they work, we have to import some sample data which we can use as the basis for our calculations:

demo=# CREATE TABLE t_demo (grp int, data numeric);
CREATE TABLE
demo=# INSERT INTO t_demo SELECT 1, random()
FROM generate_series(1, 5000000);
INSERT 0 5000000
demo=# INSERT INTO t_demo SELECT 2, random()
FROM generate_series(1, 5000000);
INSERT 0 5000000

We have created 10 million rows organized in 2 groups. To create a materialized view in PostgreSQL, we can make use of the following syntax specification:

demo=# \h CREATE MATERIALIZED VIEW
Command:   CREATE MATERIALIZED VIEW
Description: define a new materialized view
Syntax:
CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] table_name
     [ (column_name [, ...] ) ]
     [ USING method ]
     [ WITH ( storage_parameter [= value] [, ... ] ) ]
     [ TABLESPACE tablespace_name ]
     AS query
     [ WITH [ NO ] DATA ]

Basically, a materialized view has a name, some parameters, and is based on a query. Here is an example:

demo=# CREATE MATERIALIZED VIEW mat_view AS
           SELECT   grp, avg(data), count(*)
           FROM     t_demo
           GROUP BY 1;
SELECT 2

What is really important to note here is the size of the materialized view compared to the underlying table:

demo=# \d+
List of relations
Schema | Name     | Type              | Owner | Persistence | Size   | Description
-------+----------+-------------------+-------+-------------+--------+-------------
public | mat_view | materialized view | hs    | permanent   | 16 kB  |
public | t_demo   | table             | hs    | permanent   | 423 MB |
(2 rows)

423 MB vs. 16 KB is a real difference. If you want to query the materialized view, you can do it just like you would query a normal PostgreSQL table:

demo=# SELECT * FROM mat_view;
 grp |       	avg       	|  count  
-----+--------------------------+---------
   1 | 0.500091604165533141153  | 5000000
   2 | 0.499893365960709086730  | 5000000
(2 rows)

demo=# explain SELECT * FROM mat_view;
                      QUERY PLAN                     	 
-------------------------------------------------------------
 Seq Scan on mat_view  (cost=0.00..21.30 rows=1130 width=44)
(1 row)

Looking at the execution plan (explain) shows that there is no difference between a normal table and a materialized view. We can simply query the data.
At this point, the only suboptimal thing is the query optimizer’s estimate (= 1130 rows). However, that can easily be fixed. The process is the same as for any other table:

demo=# ANALYZE;
ANALYZE
demo=# explain SELECT * FROM mat_view;
                      QUERY PLAN 
---------------------------------------------------------
  Seq Scan on mat_view (cost=0.00..1.02 rows=2 width=26)
 (1 row)

ANALYZE has recreated the new optimizer statistics and fixed the estimate nicely. If you want to know more about ANALYZE in general, we recommend our posts about PostgreSQL optimizer statistics and autovacuum. Autovacuum is also able to recreate optimizer statistics.

CREATE MATERIALIZED VIEW … USING …

However, there is more. PostgreSQL offers some additional important features which are relevant in this field. One of those features is the USING clause. What is the purpose of this one? At the moment, PostgreSQL supports only one storage format (heap). However, in the future, we hope to see more storage formats such as zheap or zedstore.

The idea of USING is to have the ability to select the best storage format for the given requirement. At the moment, heap is the default format, and that’s perfectly fine.

CREATE MATERIALIZED VIEW … WITH …

The WITH-clause is the same as in CREATE TABLE. It allows you to define storage parameters such as autovacuum behavior, FILLFACTOR and so on. Here’s an example:

demo=# CREATE MATERIALIZED VIEW mat_view_2 
           WITH (autovacuum_enabled = false) AS 
           SELECT   grp, avg(data), count(*) 
           FROM     t_demo 
           GROUP BY 1;
           SELECT 2

In general, a materialized view is stored just like a table, so simply follow the same recommendations as you would follow when creating a standard table.

CREATE MATERIALIZED VIEW … TABLESPACE …

The same is true for the TABLESPACE keyword. In the default setup, the materialized view will be stored in the default tablespace which is $PGDATA:

demo=# SHOW data_directory;
     data_directory 
------------------
   /Users/hs//db13
(1 row)

SHOW is an easy way to figure out where that is. If you want to find out more about tablespaces, consider checking out our blog post about tablespaces in PostgreSQL.

CREATE MATERIALIZED VIEW … DATA vs NO DATA …

By default, a materialized view is created in a way that it contains the result. However, if the object is created using the NO DATA option, it is empty. Only the definition is created.

One has to be aware of the fact that in this case, the view cannot be queried until it has been refreshed to contain data. We have not seen the NO DATA option being used too frequently in database support. It’s still worth mentioning that this option does exist.

Modifying a materialized view in PostgreSQL

Sometimes a materialized view has to be modified later on. A classical example would be to rename a column of the view without recreating it. Let’s take a look at the definition of the view:

demo=# \d mat_view
     Materialized view "public.mat_view"
 Column |  Type   | Collation | Nullable | Default
--------+---------+-----------+----------+---------
 grp    | integer |           |          |
 avg    | numeric |           |          |
 count  | bigint  |           |          |

Modifying a materialized view is done using the ALTER MATERIALIZED VIEW command which is a powerful tool to facilitate all kinds of changes. The syntax is pretty similar to what ALTER TABLE can do for you:

demo=# \h ALTER MATERIALIZED VIEW
Command: ALTER MATERIALIZED VIEW
Description: change the definition of a materialized view
Syntax:
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
     action [, ... ]
ALTER MATERIALIZED VIEW name
     DEPENDS ON EXTENSION extension_name
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
     RENAME [ COLUMN ] column_name TO new_column_name
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
     RENAME TO new_name
ALTER MATERIALIZED VIEW [ IF EXISTS ] name
     SET SCHEMA new_schema
ALTER MATERIALIZED VIEW ALL IN TABLESPACE name
     [ OWNED BY role_name [, ... ] ]
     SET TABLESPACE new_tablespace [ NOWAIT ]

where action is one of:

     ALTER [ COLUMN ] column_name SET STATISTICS integer
     ALTER [ COLUMN ] column_name SET
          ( attribute_option = value [, ... ] )
     ALTER [ COLUMN ] column_name RESET ( attribute_option [, ... ] )
     ALTER [ COLUMN ] column_name SET STORAGE
          { PLAIN | EXTERNAL | EXTENDED | MAIN }
     CLUSTER ON index_name
     SET WITHOUT CLUSTER
     SET ( storage_parameter [= value] [, ... ] )
     RESET ( storage_parameter [, ... ] )
     OWNER TO { new_owner | CURRENT_USER | SESSION_USER }

What is noteworthy here is that you can also do things such as CLUSTER (= index organize), SET STATISTICS (= adjust size of histogram for the PostgreSQL optimizer) and so on. The process is the same as for a normal table.

Let’s try it out and simply rename a column:

demo=# ALTER MATERIALIZED VIEW mat_view 
     RENAME COLUMN avg TO average;
ALTER MATERIALIZED VIEW
demo=# \d mat_view
     Materialized view "public.mat_view"
 Column  |  Type   | Collation | Nullable | Default
---------+---------+-----------+----------+---------
 grp 	 | integer |           |      	  |
 average | numeric |           |      	  |
 count   | bigint  |           |       	  |

Before we move on to refreshing, I want to focus your attention on indexing:

demo=# CREATE INDEX idx_average ON mat_view (average);
CREATE INDEX

It is a good idea to index a materialized view to ensure fast querying. Again, the same rules as for a table exist. Simply index entries which are diverse and offer good selectivity when the materialized view is read.

Refresh materialized view

A materialized view contains a snapshot of the query result. It is not updated periodically, unless the user forces PostgreSQL to do so. In Oracle, materialized view support is a bit more sophisticated. Hopefully, PostgreSQL will soon catch up in this area a bit.

However, at the moment,a materialized view in PostgreSQL has to be refreshed manually using the REFRESH MATERIALIZED VIEW command:

demo=# \h REFRESH MATERIALIZED VIEW
Command: 	REFRESH MATERIALIZED VIEW
Description: replace the contents of a materialized view
Syntax:
REFRESH MATERIALIZED VIEW [ CONCURRENTLY ] name
	[ WITH [ NO ] DATA ]

Running the refresh is simple:

demo=# REFRESH MATERIALIZED VIEW mat_view;
REFRESH MATERIALIZED VIEW

PostgreSQL knows the query which has to be re-executed to refresh the content of the materialized view. There is just one downside: PostgreSQL has to lock the object for the time of the refresh, which means that nobody can access it while it is refreshed.

Refreshing materialized views concurrently

Sometimes it is simply not acceptable to tolerate a locked materialized view. In that case, concurrent updates are needed. However, there is a problem:

demo=# REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view;
ERROR:  cannot refresh materialized view "public.mat_view" concurrently
HINT:  Create a unique index with no WHERE clause on one or more columns of the materialized view.

The materialized view needs a unique column to support concurrent refreshing. That means you have to determine a “primary key” and create a unique index:

demo=# CREATE UNIQUE INDEX idx_grp ON mat_view (grp);
CREATE INDEX
demo=# REFRESH MATERIALIZED VIEW CONCURRENTLY mat_view;
REFRESH MATERIALIZED VIEW

What is important to note here is that you need to be a bit careful, and make sure that table bloat does not take over. Technically, it is a DELETE / INSERT, which is prone to cause table bloat. Make sure a proper VACUUMpolicy is in place.

Finally …

Materialized views are an important feature in most databases, including PostgreSQL. They can help to speed up large calculations – or at least to cache them.

If you want to make sure that your materialized views are up to date and if you want to read more about PostgreSQL right now, check out our blog about pg_timetable which shows you how to schedule jobs in PostgreSQL. Why is pg_timetable so useful? Our scheduler makes sure that identical jobs cannot overlap, and that they simply don’t execute again, in case the same job is already running. In case of long jobs, this is super important – especially if you want to use materialized views.