In SQL, the concept of foreign keys is an important one that can be found in all professional databases used in the industry. The core idea is to prevent your PostgreSQL database from storing inconsistent data by enforcing constraints ensuring the correctness of your tables (at least as far as relations between objects are concerned). Referential integrity is therefore one of the most important concepts ever invented in IT.

However, foreign keys will introduce some issues which you have to take care of when writing applications. If there are no foreign keys, you can insert data into any table in any order. PostgreSQL does not care. However, if a foreign key is in place, order starts to matter (at least in a typical scenario but more on that later).

Foreign keys and order

To show the importance of order, we have to create a data model first:

CREATE TABLE t_currency
(
	id		int,
	shortcut	char (3),
	PRIMARY KEY (id)
);

CREATE TABLE t_location
(
	id		int,
	location_name	text,
	PRIMARY KEY (id)
);

CREATE TABLE t_product
(
	id		int,
	name		text,
	currency_id	int	REFERENCES t_currency (id),
	PRIMARY KEY (id)
);

CREATE TABLE t_product_desc
(
	id		int,
	product_id	int	REFERENCES t_product (id),
	description	text,
	PRIMARY KEY (id)
);

CREATE TABLE t_product_stock
(
	product_id	int	REFERENCES t_product (id),
	location_id	int	REFERENCES t_location (id),
	amount		numeric	CHECK (amount >= 0)
);

We want to store currencies, products, as well as product descriptions. Basically it is a very simple data model. Let us see if we happen to insert into the product table:

test=# INSERT INTO t_product VALUES (1, 'PostgreSQL consulting', 1);
ERROR:  insert or update on table "t_product" violates foreign key constraint "t_product_currency_id_fkey"
DETAIL:  Key (currency_id)=(1) is not present in table "t_currency".
test=# INSERT INTO t_product VALUES (1, 'PostgreSQL consulting', NULL);
INSERT 0 1

Logically the first INSERT is going to fail because currency number 1 does not exist yet. If we want to INSERT, we have to use a NULL value (= unknown currency). In order words: We have to fill the currency table first, then insert locations, and so on. The order does matter in the default case.

Determining the correct insertion order for foreign keys

If you have to start using an existing data model, it can be a bit hard to wrap your head around this stuff. Populating an empty data model can be a bit tricky. So why not write a query telling us the order in which we are supposed to insert data?

Well, here is that magic query…

WITH RECURSIVE fkeys AS (
   /* source and target tables for all foreign keys */
   SELECT conrelid AS source,
          confrelid AS target
   FROM pg_constraint
   WHERE contype = 'f'
),
tables AS (
      (   /* all tables ... */
          SELECT oid AS table_name,
                 1 AS level,
                 ARRAY[oid] AS trail,
                 FALSE AS circular
          FROM pg_class
          WHERE relkind = 'r'
            AND NOT relnamespace::regnamespace::text LIKE ANY
                    (ARRAY['pg_catalog', 'information_schema', 'pg_temp_%'])
       EXCEPT
          /* ... except the ones that have a foreign key */
          SELECT source,
                 1,
                 ARRAY[ source ],
                 FALSE
          FROM fkeys
      )
   UNION ALL
      /* all tables with a foreign key pointing a table in the working set */
      SELECT fkeys.source,
             tables.level + 1,
             tables.trail || fkeys.source,
             tables.trail @> ARRAY[fkeys.source]
      FROM fkeys
         JOIN tables ON tables.table_name = fkeys.target
      /*
       * Stop when a table appears in the trail the third time.
       * This way, we get the table once with "circular = TRUE".
       */
      WHERE cardinality(array_positions(tables.trail, fkeys.source)) < 2
),
ordered_tables AS (
   /* get the highest level per table */
   SELECT DISTINCT ON (table_name)
          table_name,
          level,
          circular
   FROM tables
   ORDER BY table_name, level DESC
)
SELECT table_name::regclass,
       level
FROM ordered_tables
WHERE NOT circular
ORDER BY level, table_name;

The query is not trivial to read, but I have done my best to document it a bit. Basically, the PostgreSQL system tables have all the information we need to determine the correct order. Here is the output:


   table_name    | level
-----------------+-------
 t_currency      |     1
 t_location      |     1
 t_product       |     2
 t_product_desc  |     3
 t_product_stock |     3
(5 rows)

As you can see, the query has correctly given us the tables in the desired order. First, we have to insert into all tables at level one and so on. If we stick to this order, referential integrity will always be ensured (assuming the data is correct).

Making use of “initially deferred” constraints

In some cases, the insertion order can be a nasty thing to deal with. What if we had the means to tell PostgreSQL to ignore the order and check integrity on commit instead? This is exactly what “initially deferred” does. Here is how it works:

BEGIN;

CREATE TABLE t_currency
(
	id            int,
	shortcut      char (3),
	PRIMARY KEY (id)
);

CREATE TABLE t_product
(
	id            int,
	name          text,
	currency_id   int	REFERENCES t_currency (id)
				INITIALLY DEFERRED,
	PRIMARY KEY (id)
);

INSERT INTO t_product VALUES (1, 'PostgreSQL support', 1);
INSERT INTO t_currency VALUES (1, 'EUR');

COMMIT;

In this case, we can modify data in any order we want. As long as integrity is guaranteed to be intact at the end of the transaction, PostgreSQL is perfectly fine. PostgreSQL will postpone the constraint check and take some burden off the developer.

Finally …

If you want to learn more about advanced SQL, you might want to take a look at my blog about some more advanced windowing functions (with ties). So put on your tie and read to learn more.