A foreign key is no wrong key!
© Laurenz Albe 2018

 

Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.

A fact that is often ignored is that foreign keys need proper indexing to perform well.

This article will explain that and show you how to search for missing indexes.

Index at the target of a foreign key

In the following, I will call the table on which the foreign key constraint is defined the source table and the referenced table the target table.

The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.

This is required so that there is always a well-defined row to which the foreign key points. The index also comes handy if you want to find the row in the target table that matches a row in the source table.

Index at the source of a foreign key

In contrast to the above, PostgreSQL requires no index at the source of a foreign key.

However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:

1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join.

This is well known and pretty obvious.

2. You delete rows or update key columns in the target table.

Then PostgreSQL has to check if the foreign key constraint is still satisfied. It does so by searching if there are rows in the source table that would become orphaned by the data modification. Without an index, this requires a sequential scan of the source table.

An example

Let’s build a source and a target table:

-- to make the plans look simpler
SET max_parallel_workers_per_gather = 0;
-- to speed up CREATE INDEX
SET maintenance_work_mem = '512MB';

CREATE TABLE target (
   t_id integer NOT NULL,
   t_name text NOT NULL
);
INSERT INTO target (t_id, t_name)
   SELECT i, 'target ' || i
   FROM generate_series(1, 500001) AS i;

ALTER TABLE target
   ADD PRIMARY KEY (t_id);

CREATE INDEX ON target (t_name);

/* set hint bits and collect statistics */
VACUUM (ANALYZE) target;

CREATE TABLE source (
   s_id integer NOT NULL,
   t_id integer NOT NULL,
   s_name text NOT NULL
);
INSERT INTO source (s_id, t_id, s_name)
   SELECT i, (i - 1) % 500000 + 1, 'source ' || i
   FROM generate_series(1, 1000000) AS i;

ALTER TABLE source
   ADD PRIMARY KEY (s_id);

ALTER TABLE source
   ADD FOREIGN KEY (t_id) REFERENCES target;

/* set hint bits and collect statistics */
VACUUM (ANALYZE) source;

Query time without an index

Looking up source rows via the link to target and deleting rows from target are unreasonably slow:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Hash Join  (cost=8.45..19003.47 rows=2 width=13)
            (actual time=0.150..360.920 rows=2 loops=1)
   Hash Cond: (source.t_id = target.t_id)
   ->  Seq Scan on source
                    (cost=0.00..16370.00 rows=1000000 width=17)
              (actual time=0.011..155.964 rows=1000000 loops=1)
   ->  Hash  (cost=8.44..8.44 rows=1 width=4)
             (actual time=0.111..0.111 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.105..0.107 rows=1 loops=1)
               Index Cond: (t_name = 'target 42'::text)
 Planning time: 0.701 ms
 Execution time: 360.982 ms
(9 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                          QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.178..0.178 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.107..0.109 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.165 ms
 Trigger for constraint source_t_id_fkey: time=153.804 calls=1
 Execution time: 154.030 ms
(6 rows)

Query time with an index

After creating the appropriate index:

CREATE INDEX source_t_id_idx ON source (t_id);

the queries are as fast as they should be:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Nested Loop  (cost=0.85..19.89 rows=2 width=13)
              (actual time=0.068..0.076 rows=2 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.048..0.049 rows=1 loops=1)
         Index Cond: (t_name = 'target 42'::text)
   ->  Index Scan using source_t_id_idx on source
                             (cost=0.42..11.43 rows=2 width=17)
                      (actual time=0.013..0.019 rows=2 loops=1)
         Index Cond: (t_id = target.t_id)
 Planning time: 1.238 ms
 Execution time: 0.147 ms
(7 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                           QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.135..0.135 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.091..0.094 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.151 ms
 Trigger for constraint source_t_id_fkey: time=0.557 calls=1
 Execution time: 0.751 ms
(6 rows)

How to check for missing indexes?

The following query will list all foreign key constraints in the database that do not have an index on the source columns:

SELECT c.conrelid::regclass AS "table",
       /* list of key column names in order */
       string_agg(a.attname, ',' ORDER BY x.n) AS columns,
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(c.conrelid)
       ) AS size,
       c.conname AS constraint,
       c.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint c
   /* enumerated key column numbers per foreign key */
   CROSS JOIN LATERAL
      unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
   /* name for each key column */
   JOIN pg_catalog.pg_attribute a
      ON a.attnum = x.attnum
         AND a.attrelid = c.conrelid
WHERE NOT EXISTS
        /* is there a matching index for the constraint? */
        (SELECT 1 FROM pg_catalog.pg_index i
         WHERE i.indrelid = c.conrelid
           /* the first index columns must be the same as the
              key columns, but order doesn't matter */
           AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
               @> c.conkey)
  AND c.contype = 'f'
GROUP BY c.conrelid, c.conname, c.confrelid
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;

The result is ordered by the size of the table, so that the tables where the missing index hurts most are listed on top.

Should I create indexes for all foreign keys?

If the source table is small, you don’t need the index, because then a sequential scan is probably cheaper than an index scan anyway.

Also, if you know that you never need the index for a join and you will never delete a row or update a key column in the target table, the index is unnecessary.

There is one simple way to proceed: create all missing indexes, wait a couple of days and then get rid of the indexes that were never used.