Edgar Allan Poe on view dependencies
© Laurenz Albe 2018

 

We all know that in PostgreSQL we cannot drop an object if there are view dependencies on it:

CREATE TABLE t (id integer PRIMARY KEY);

CREATE VIEW v AS SELECT * FROM t;

DROP TABLE t;
ERROR:  cannot drop table t because other objects depend on it
DETAIL:  view v depends on table t
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

ALTER TABLE t DROP id;
ERROR:  cannot drop column id of table t because other objects depend on it
DETAIL:  view v depends on column id of table t
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

Some people like it because it keeps the database consistent; some people hate it because it makes schema modifications more difficult. But that’s the way it is.

In this article I want to explore the mechanics behind view dependencies and show you how to track what views depend on a certain PostgreSQL object.

Why would I need that?

Imagine you want to modify a table, e.g. change a column’s data type from integer to bigint because you realize you will need to store bigger numbers.
However, you cannot do that if there are views that use the column. You first have to drop those views, then change the column and then run all the CREATE VIEW statements to create the views again.

As the example shows, editing tables can be quite a challenge if there is a deep hierarchy of views, because you have to create the views in the correct order. You cannot create a view unless all the objects it requires are present.

Best practices with views

Before I show you how to untangle the mess, I’d like to tell you what mistakes to avoid when you are using views in your database design (excuse my switching to teacher mode; I guess holding courses has that effect on you).

Views are good for two things:

  • They allow you to have a recurring SQL query or expression in one place for easy reuse.
  • They can be used as an interface to abstract from the actual table definitions, so that you can reorganize the tables without having to modify the interface.

Neither of these applications require  you to “stack” views, that is, define views over views.

There are two patterns of using views that tend to be problematic, and they both stem from the mistaken idea that a view works exactly like a table, just because it looks like one:

  • Defining many layers of views so that your final queries look deceptively simple.
    However, when you try to unravel the views, for example by looking at the execution plan, the query turns out to be so complicated that it is almost impossible to understand what is really going on and how to improve it.
  • Defining a denormalized “world view” which is just a join of all your database tables and using that for all of your queries.
    People who do that tend to be surprised when certain WHERE conditions work well, but others take impossibly long.

Never forget that a view is just a “crystallized” SQL statement and gets replaced by its definition when the query is executed.

How are views stored in PostgreSQL?

A view in PostgreSQL is not that different from a table: it is a “relation”, that is “something with columns”.
All such objects are stored in the catalog table pg_class.

As the documentation states, a view is almost the same as a table, with a few exceptions:

  • it has no data file (because it holds no data)
  • its relkind is “v” rather than “r
  • it has an ON SELECT rule called “_RETURN

This “query rewrite rule” contains the definition of the view and is stored in the ev_action column of the pg_rewrite catalog table.

Note that the view definition is not stored as a string, but in the form of a “query parse tree”. Views are parsed when they are created, which has several consequences:

  • Object names are resolved during CREATE VIEW, so the current setting of search_path applies.
  • Objects are referred to by their internal immutable “object ID” rather than by their name. Consequently, it is no problem to rename an object or column used in a view definition.
  • PostgreSQL knows exactly which objects are used in the view definition, so it can add dependencies on them.

Note that the way PostgreSQL handles views quite different from the way PostgreSQL handles functions: function bodies are stored as strings and not parsed when they are created. Consequently, PostgreSQL cannot know on which objects a given function depends.

How are the dependencies stored?

All dependencies (except those on “shared objects”) are stored in the catalog table pg_depend:

  • classid stores the object ID of the catalog table containing the dependent object
  • objid stores the ID of the dependent object
  • objsubid stores the column number if the dependency is for a column
  • refclassid, refobjid and refobjsubid are like the three columns above, but describe the object referenced by the dependency
  • deptype describes the kind of dependency

It is important to notice that there is no direct dependency of a view on the objects it uses: the dependent object is actually the view’s rewrite rule. That adds another layer of indirection.

A simple example

In the following, I’ll use this schema to test my queries:

CREATE TABLE t1 (
   id integer PRIMARY KEY,
   val text NOT NULL
);

INSERT INTO t1 VALUES
   (1, 'one'),
   (2, 'two'),
   (3, 'three');

CREATE FUNCTION f() RETURNS text
   LANGUAGE sql AS 'SELECT ''suffix''';

CREATE VIEW v1 AS
SELECT max(id) AS id
FROM t1;

CREATE VIEW v2 AS
SELECT t1.val
FROM t1 JOIN v1 USING (id);

CREATE VIEW v3 AS
SELECT val || f()
FROM t1;

I have thrown in a function, just to show that a view can depend on objects other than tables.

In the following I will concentrate on tables and columns, but the queries will work for functions too, if you replace the catalog pg_class that contains tables with the catalog pg_proc that contains functions.

Finding direct view dependencies on a table

To find out which views directly depend on table t1, you would query like this:

SELECT v.oid::regclass AS view
FROM pg_depend AS d      -- objects that depend on the table
   JOIN pg_rewrite AS r  -- rules depending on the table
      ON r.oid = d.objid
   JOIN pg_class AS v    -- views for the rules
      ON v.oid = r.ev_class
WHERE v.relkind = 'v'    -- only interested in views
  -- dependency must be a rule depending on a relation
  AND d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass
  AND d.deptype = 'n'    -- normal dependency
  AND d.refobjid = 't1'::regclass;

 view 
------
 v2
 v1
 v3
 v2
(4 rows)

To find views with direct dependencies on the function f, simply replace “d.refclassid = 'pg_class'::regclass” with “d.refclassid = 'pg_proc'::regclass” and “refobjid = 't1'::regclass” with “refobjid = 'f'::regproc”.

Actually, the views will usually not depend on the table itself, but on the columns of the table (the exception is if a so-called “whole-row reference” is used in the view). That is why the view v2 shows up twice in the above list. You can remove those duplicates using DISTINCT.

Finding direct dependencies on a table column

We can modify the above query slightly to find those views that depend on a certain table column, which can be useful if you are planning to drop a column (adding a column to the base table is never a problem).

The following query finds the views that depend on the column val of table t1:

SELECT v.oid::regclass AS view
FROM pg_attribute AS a   -- columns for the table
   JOIN pg_depend AS d   -- objects that depend on the column
      ON d.refobjsubid = a.attnum AND d.refobjid = a.attrelid
   JOIN pg_rewrite AS r  -- rules depending on the column
      ON r.oid = d.objid
   JOIN pg_class AS v    -- views for the rules
      ON v.oid = r.ev_class
WHERE v.relkind = 'v'    -- only interested in views
  -- dependency must be a rule depending on a relation
  AND d.classid = 'pg_rewrite'::regclass
  AND d.refclassid = 'pg_class'::regclass 
  AND d.deptype = 'n'    -- normal dependency
  AND a.attrelid = 't1'::regclass
  AND a.attname = 'val';

 view 
------
 v3
 v2
(2 rows)

Recursively finding all dependent views

Now if you haven’t heeded the advice I gave above and you went ahead and defined a complicated hierarchy of views, it doesn’t stop with direct dependencies.
Rather, you need to recursively go through the whole hierarchy.

For example, let’s assume that you want to DROP and re-create the table t1 from our example and you need the CREATE VIEW statements to re-create the views once you are done (dropping them won’t be a problem if you use DROP TABLE t1 CASCADE).

Then you need to use the above queries in a recursive “common table expression” (CTE). The CTE is for tracking recursive view dependencies and can be reused for all such requirements; the only difference will be in the main query.

WITH RECURSIVE views AS (
   -- get the directly depending views
   SELECT v.oid::regclass AS view,
          v.relkind = 'm' AS is_materialized,
          1 AS level
   FROM pg_depend AS d
      JOIN pg_rewrite AS r
         ON r.oid = d.objid
      JOIN pg_class AS v
         ON v.oid = r.ev_class
   WHERE v.relkind IN ('v', 'm')
     AND d.classid = 'pg_rewrite'::regclass
     AND d.refclassid = 'pg_class'::regclass
     AND d.deptype = 'n'
     AND d.refobjid = 't1'::regclass
UNION
   -- add the views that depend on these
   SELECT v.oid::regclass,
          v.relkind = 'm',
          views.level + 1
   FROM views
      JOIN pg_depend AS d
         ON d.refobjid = views.view
      JOIN pg_rewrite AS r  
         ON r.oid = d.objid
      JOIN pg_class AS v
         ON v.oid = r.ev_class
   WHERE v.relkind IN ('v', 'm')
     AND d.classid = 'pg_rewrite'::regclass
     AND d.refclassid = 'pg_class'::regclass
     AND d.deptype = 'n'
     AND v.oid <> views.view  -- avoid loop
)
SELECT format('CREATE%s VIEW %s AS%s',
              CASE WHEN is_materialized
                   THEN ' MATERIALIZED'
                   ELSE ''
              END,
              view,
              pg_get_viewdef(view))
FROM views
GROUP BY view, is_materialized
ORDER BY max(level);

                  format                   
-------------------------------------------
 CREATE VIEW v3 AS SELECT (t1.val || f()) +
    FROM t1;
 CREATE VIEW v1 AS SELECT max(t1.id) AS id+
    FROM t1;
 CREATE VIEW v2 AS SELECT t1.val          +
    FROM (t1                              +
      JOIN v1 USING (id));
(3 rows)

We need the GROUP BY because a view may depend on an object in more than one ways: in our example, v2 depends on t1 twice: once directly, and once indirectly via v1.

Have questions? Need PostgreSQL support? You can reach us here.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.