Foreign Key Indexing and Performance in PostgreSQL

A foreign key is no wrong key! — © Laurenz Albe 2018

Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.

A fact that is often ignored is that foreign keys need proper indexing to perform well.

This article will explain that and show you how to search for missing indexes.

Index at the target of a foreign key

In the following, I will call the table on which the foreign key constraint is defined the source table, and the referenced table the target table.

The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.

This is required so that there is always a well-defined row to which the foreign key points. The index also comes in handy if you want to find the row in the target table that matches a row in the source table.
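
To illustrate the requirement, here is a minimal sketch with two hypothetical tables, parent_demo and child_demo, which are not part of the example later in this article. PostgreSQL should reject the foreign key as long as the referenced column has neither a primary key nor a unique constraint:

```sql
/* hypothetical demo tables, not used elsewhere in this article */
CREATE TABLE parent_demo (
   id integer NOT NULL
);

/* this should fail, because parent_demo.id has neither a
   primary key nor a unique constraint */
CREATE TABLE child_demo (
   parent_id integer REFERENCES parent_demo (id)
);

/* with a primary key (and hence a unique index) it works */
ALTER TABLE parent_demo
   ADD PRIMARY KEY (id);

CREATE TABLE child_demo (
   parent_id integer REFERENCES parent_demo (id)
);
```

The first CREATE TABLE child_demo is expected to end with an error saying that there is no unique constraint matching the given keys for the referenced table.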

Index at the source of a foreign key

In contrast to the above, PostgreSQL requires no index at the source of a foreign key.

However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:

1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join.

This is well known and pretty obvious.

2. You delete rows or update key columns in the target table.

Then PostgreSQL has to check whether the foreign key constraint is still satisfied. It does so by searching for rows in the source table that would become orphaned by the data modification. Without an index, this requires a sequential scan of the source table.

An example

Let's build a source and a target table:

-- to make the plans look simpler
SET max_parallel_workers_per_gather = 0;
-- to speed up CREATE INDEX
SET maintenance_work_mem = '512MB';

CREATE TABLE target (
   t_id integer NOT NULL,
   t_name text NOT NULL
);
INSERT INTO target (t_id, t_name)
   SELECT i, 'target ' || i
   FROM generate_series(1, 500001) AS i;

ALTER TABLE target
   ADD PRIMARY KEY (t_id);

CREATE INDEX ON target (t_name);

/* set hint bits and collect statistics */
VACUUM (ANALYZE) target;

CREATE TABLE source (
   s_id integer NOT NULL,
   t_id integer NOT NULL,
   s_name text NOT NULL
);
INSERT INTO source (s_id, t_id, s_name)
   SELECT i, (i - 1) % 500000 + 1, 'source ' || i
   FROM generate_series(1, 1000000) AS i;

ALTER TABLE source
   ADD PRIMARY KEY (s_id);

ALTER TABLE source
   ADD FOREIGN KEY (t_id) REFERENCES target;

/* set hint bits and collect statistics */
VACUUM (ANALYZE) source;
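
Since PostgreSQL does not create an index at the source automatically, source should now have only its primary key index. One way to check this, assuming the tables were created exactly as above and no other tables with these names exist in the database, is the pg_indexes view:

```sql
/* list all indexes on the two example tables */
SELECT tablename, indexname, indexdef
FROM pg_catalog.pg_indexes
WHERE tablename IN ('source', 'target')
ORDER BY tablename, indexname;
```

With the default index names, the result should contain the primary key indexes of both tables and target_t_name_idx, but no index on source.t_id.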

Query time without an index

Looking up source rows via the link to target and deleting rows from target are both unreasonably slow:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Hash Join  (cost=8.45..19003.47 rows=2 width=13)
            (actual time=0.150..360.920 rows=2 loops=1)
   Hash Cond: (source.t_id = target.t_id)
   ->  Seq Scan on source
                    (cost=0.00..16370.00 rows=1000000 width=17)
              (actual time=0.011..155.964 rows=1000000 loops=1)
   ->  Hash  (cost=8.44..8.44 rows=1 width=4)
             (actual time=0.111..0.111 rows=1 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.105..0.107 rows=1 loops=1)
               Index Cond: (t_name = 'target 42'::text)
 Planning time: 0.701 ms
 Execution time: 360.982 ms
(9 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                          QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.178..0.178 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.107..0.109 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.165 ms
 Trigger for constraint source_t_id_fkey: time=153.804 calls=1
 Execution time: 154.030 ms
(6 rows)
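
Almost all of the execution time of the DELETE is spent in the trigger for source_t_id_fkey. The first statement below is a hand-written approximation of the lookup that this trigger performs, not the exact query PostgreSQL runs internally; it is only meant to show why the check is slow without an index. Note also that EXPLAIN (ANALYZE) really executes the DELETE, so the second part wraps it in a transaction that is rolled back, which keeps the row around for repeated experiments:

```sql
/* approximation of the foreign key check: are there rows in
   source that reference the target row with t_id = 500001? */
EXPLAIN
SELECT 1
FROM ONLY source AS x
WHERE x.t_id = 500001
FOR KEY SHARE OF x;

/* EXPLAIN (ANALYZE) executes the statement, so roll it back
   to be able to repeat the measurement */
BEGIN;

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

ROLLBACK;
```

As long as there is no index on source.t_id, the first plan should contain a sequential scan of source.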

Query time with an index

After creating the appropriate index:

CREATE INDEX source_t_id_idx ON source (t_id);

the queries are as fast as they should be:

EXPLAIN (ANALYZE)
SELECT source.s_name
FROM source
   JOIN target USING (t_id)
WHERE target.t_name = 'target 42';

                          QUERY PLAN
----------------------------------------------------------------
 Nested Loop  (cost=0.85..19.89 rows=2 width=13)
              (actual time=0.068..0.076 rows=2 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=4)
                      (actual time=0.048..0.049 rows=1 loops=1)
         Index Cond: (t_name = 'target 42'::text)
   ->  Index Scan using source_t_id_idx on source
                             (cost=0.42..11.43 rows=2 width=17)
                      (actual time=0.013..0.019 rows=2 loops=1)
         Index Cond: (t_id = target.t_id)
 Planning time: 1.238 ms
 Execution time: 0.147 ms
(7 rows)

EXPLAIN (ANALYZE)
DELETE FROM target
WHERE target.t_name = 'target 500001';

                           QUERY PLAN
----------------------------------------------------------------
 Delete on target  (cost=0.42..8.44 rows=1 width=6)
                   (actual time=0.135..0.135 rows=0 loops=1)
   ->  Index Scan using target_t_name_idx on target
                               (cost=0.42..8.44 rows=1 width=6)
                      (actual time=0.091..0.094 rows=1 loops=1)
         Index Cond: (t_name = 'target 500001'::text)
 Planning time: 0.151 ms
 Trigger for constraint source_t_id_fkey: time=0.557 calls=1
 Execution time: 0.751 ms
(6 rows)
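
On a busy production system, a plain CREATE INDEX blocks data modifications on the table while the index is being built. A possible alternative, sketched here under the assumption that a longer build time is acceptable, is to build the index concurrently. This would be used instead of the plain CREATE INDEX shown above, not in addition to it:

```sql
/* builds the index without blocking INSERT, UPDATE and DELETE
   on source; cannot be run inside a transaction block */
CREATE INDEX CONCURRENTLY source_t_id_idx ON source (t_id);

/* if a concurrent build fails, it leaves an invalid index
   behind; this lists such indexes so that they can be dropped */
SELECT indexrelid::regclass AS invalid_index
FROM pg_catalog.pg_index
WHERE NOT indisvalid;
```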

How to check for missing indexes?

The following query will list all foreign key constraints in the database that do not have an index on the source columns:

SELECT c.conrelid::regclass AS "table",
       /* list of key column names in order */
       string_agg(a.attname, ',' ORDER BY x.n) AS columns,
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(c.conrelid)
       ) AS size,
       c.conname AS constraint,
       c.confrelid::regclass AS referenced_table
FROM pg_catalog.pg_constraint c
   /* enumerated key column numbers per foreign key */
   CROSS JOIN LATERAL
      unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
   /* name for each key column */
   JOIN pg_catalog.pg_attribute a
      ON a.attnum = x.attnum
         AND a.attrelid = c.conrelid
WHERE NOT EXISTS
        /* is there a matching index for the constraint? */
        (SELECT 1 FROM pg_catalog.pg_index i
         WHERE i.indrelid = c.conrelid
           /* it must not be a partial index */
           AND i.indpred IS NULL
           /* the first index columns must be the same as the
              key columns, but order doesn't matter */
           AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
               OPERATOR(pg_catalog.@>) c.conkey)
  AND c.contype = 'f'
GROUP BY c.conrelid, c.conname, c.confrelid
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;

The result is ordered by the size of the table, so that the tables where a missing index hurts most are listed first.
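
If the list is long, it may be convenient to generate the CREATE INDEX statements rather than write them by hand. The following is only a sketch of that idea: it reuses the conditions of the query above and changes the select list. The output column name create_statement is my choice, and the generated statements should be reviewed before they are run, for example because two foreign keys on the same columns would produce the same statement twice:

```sql
SELECT format('CREATE INDEX ON %s (%s);',
              c.conrelid::regclass,
              /* quoted key column names in order */
              string_agg(quote_ident(a.attname), ', ' ORDER BY x.n)
       ) AS create_statement
FROM pg_catalog.pg_constraint c
   CROSS JOIN LATERAL
      unnest(c.conkey) WITH ORDINALITY AS x(attnum, n)
   JOIN pg_catalog.pg_attribute a
      ON a.attnum = x.attnum
         AND a.attrelid = c.conrelid
WHERE NOT EXISTS
        /* same test for a matching index as above */
        (SELECT 1 FROM pg_catalog.pg_index i
         WHERE i.indrelid = c.conrelid
           AND i.indpred IS NULL
           AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1]
               OPERATOR(pg_catalog.@>) c.conkey)
  AND c.contype = 'f'
GROUP BY c.conrelid, c.conname
ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC;
```

In psql, ending the query with \gexec instead of the semicolon would execute each generated statement directly.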

Should I create indexes for all foreign keys?

If the source table is small, you don't need the index, because then a sequential scan is probably cheaper than an index scan anyway.

Also, if you know that you never need the index for a join and you will never delete a row or update a key column in the target table, the index is unnecessary.

There is one simple way to proceed: create all missing indexes, wait a couple of days and then get rid of the indexes that were never used.
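
To find the indexes that were never used, the statistics view pg_stat_user_indexes can help. The following is a sketch that assumes the statistics were not reset during the waiting period and that the workload in that time was representative. Unique indexes are skipped, because they enforce constraints even if they are never scanned:

```sql
SELECT s.schemaname,
       s.relname AS "table",
       s.indexrelname AS "index",
       pg_catalog.pg_size_pretty(
          pg_catalog.pg_relation_size(s.indexrelid)
       ) AS index_size
FROM pg_catalog.pg_stat_user_indexes AS s
   JOIN pg_catalog.pg_index AS i
      ON i.indexrelid = s.indexrelid
/* never used for an index scan since the statistics were reset */
WHERE s.idx_scan = 0
  /* keep indexes that back primary key and unique constraints */
  AND NOT i.indisunique
ORDER BY pg_catalog.pg_relation_size(s.indexrelid) DESC;
```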

8 Comments

stagei

3 years ago

Very good article. I did the same thing on Oracle some years ago, but I am a newbie with regard to Postgres 🙂
Thx

Omri Shani

3 years ago

I think it is better to filter out partial indexes. I did it with WHERE indpred IS NULL inside the EXISTS query on pg_index.

Ivan Kleshnin

4 years ago

Can we update the "missing check" algorithm to not display indexes covered by compound key?
I mean if we have a purchase table:
purchase - "id" (PK) - "accountId" (FK) - "itemId" (FK)
and a compound unique constraint:
"purchase_accountId_courseId_key" UNIQUE CONSTRAINT, btree ("accountId", "itemId")
this index should cover all queries with accountId and with itemId so two separate indexes would only waste DB space.

laurenz

4 years ago

Reply to Ivan Kleshnin

But that would not be reported by my query, because the unique constraint would be in pg_index. Perhaps I misunderstand you, can you give me a complete example?

Nux

4 years ago

Nice article :-).

One note about that query for checking indexes -- it only works in PostgreSQL 9.4 and above, mostly because of WITH ORDINALITY and also the cardinality function.

Rinat Mukhtarov

4 years ago

Sorry, but the query from the section "How to check for missing indexes?" throws an error:

[42725] ERROR: operator is not unique: smallint[] @> smallint[] Hint: Could not choose a best candidate operator. You might need to add explicit type casts. Position: 1000

PostgreSQL 10.5 (Debian 10.5-2.pgdg90+1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18+deb9u1) 6.3.0 20170516, 64-bit

laurenz

4 years ago

Reply to Rinat Mukhtarov

That probably means that you have created additional operators or type casts. The documentation recommends:

The set of casts between built-in types has been carefully crafted and is best not altered.

Przemysław Fusik

4 years ago

Hi,

thanks for great article.

It explains a lot.
