Foreign key constraints are an important tool to keep your database consistent while also documenting relationships between tables.
Table of Contents
A fact that is often ignored is that foreign keys need proper indexing to perform well.
This article will explain that and show you how to search for missing indexes.
In the following, I will call the table, on which the foreign key constraint is defined, the source table
and the referenced table the target table
.
The referenced columns in the target table must have a primary key or unique constraint. Such constraints are implemented with unique indexes in PostgreSQL. Consequently, the target side of a foreign key is automatically indexed.
This is required so that there is always a well-defined row to which the foreign key points. The index also comes handy if you want to find the row in the target table that matches a row in the source table.
In contrast to the above, PostgreSQL requires no index at the source of a foreign key.
However, such an index is quite useful for finding all source rows that reference a target row. The typical cases where you need that are:
1. You perform a join between the two tables where you explicitly search for the source rows referencing one or a few target rows. If there is an index on the columns at the source, PostgreSQL can use an efficient nested loop join
.
This is well known and pretty obvious.
2. You delete rows or update key columns in the target table.
Then PostgreSQL has to check if the foreign key constraint is still satisfied. It does so by searching if there are rows in the source table that would become orphaned
by the data modification. Without an index, this requires a sequential scan of the source table.
Let's build a source and a target table:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
-- to make the plans look simpler SET max_parallel_workers_per_gather = 0; -- to speed up CREATE INDEX SET maintenance_work_mem = '512MB'; CREATE TABLE target ( t_id integer NOT NULL, t_name text NOT NULL ); INSERT INTO target (t_id, t_name) SELECT i, 'target ' || i FROM generate_series(1, 500001) AS i; ALTER TABLE target ADD PRIMARY KEY (t_id); CREATE INDEX ON target (t_name); /* set hint bits and collect statistics */ VACUUM (ANALYZE) target; CREATE TABLE source ( s_id integer NOT NULL, t_id integer NOT NULL, s_name text NOT NULL ); INSERT INTO source (s_id, t_id, s_name) SELECT i, (i - 1) % 500000 + 1, 'source ' || i FROM generate_series(1, 1000000) AS i; ALTER TABLE source ADD PRIMARY KEY (s_id); ALTER TABLE source ADD FOREIGN KEY (t_id) REFERENCES target; /* set hint bits and collect statistics */ VACUUM (ANALYZE) source; |
Looking up source
rows via the link to target
and deleting rows from target
is unreasonably slow:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 |
EXPLAIN (ANALYZE) SELECT source.s_name FROM source JOIN target USING (t_id) WHERE target.t_name = 'target 42'; QUERY PLAN ---------------------------------------------------------------- Hash Join (cost=8.45..19003.47 rows=2 width=13) (actual time=0.150..360.920 rows=2 loops=1) Hash Cond: (source.t_id = target.t_id) -> Seq Scan on source (cost=0.00..16370.00 rows=1000000 width=17) (actual time=0.011..155.964 rows=1000000 loops=1) -> Hash (cost=8.44..8.44 rows=1 width=4) (actual time=0.111..0.111 rows=1 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Index Scan using target_t_name_idx on target (cost=0.42..8.44 rows=1 width=4) (actual time=0.105..0.107 rows=1 loops=1) Index Cond: (t_name = 'target 42'::text) Planning time: 0.701 ms Execution time: 360.982 ms (9 rows) EXPLAIN (ANALYZE) DELETE FROM target WHERE target.t_name = 'target 500001'; QUERY PLAN ---------------------------------------------------------------- Delete on target (cost=0.42..8.44 rows=1 width=6) (actual time=0.178..0.178 rows=0 loops=1) -> Index Scan using target_t_name_idx on target (cost=0.42..8.44 rows=1 width=6) (actual time=0.107..0.109 rows=1 loops=1) Index Cond: (t_name = 'target 500001'::text) Planning time: 0.165 ms Trigger for constraint source_t_id_fkey: time=153.804 calls=1 Execution time: 154.030 ms (6 rows) |
After creating the appropriate index:
1 |
CREATE INDEX source_t_id_idx ON source (t_id); |
the queries are as fast as they should be:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 |
EXPLAIN (ANALYZE) SELECT source.s_name FROM source JOIN target USING (t_id) WHERE target.t_name = 'target 42'; QUERY PLAN ---------------------------------------------------------------- Nested Loop (cost=0.85..19.89 rows=2 width=13) (actual time=0.068..0.076 rows=2 loops=1) -> Index Scan using target_t_name_idx on target (cost=0.42..8.44 rows=1 width=4) (actual time=0.048..0.049 rows=1 loops=1) Index Cond: (t_name = 'target 42'::text) -> Index Scan using source_t_id_idx on source (cost=0.42..11.43 rows=2 width=17) (actual time=0.013..0.019 rows=2 loops=1) Index Cond: (t_id = target.t_id) Planning time: 1.238 ms Execution time: 0.147 ms (7 rows) EXPLAIN (ANALYZE) DELETE FROM target WHERE target.t_name = 'target 500001'; QUERY PLAN ---------------------------------------------------------------- Delete on target (cost=0.42..8.44 rows=1 width=6) (actual time=0.135..0.135 rows=0 loops=1) -> Index Scan using target_t_name_idx on target (cost=0.42..8.44 rows=1 width=6) (actual time=0.091..0.094 rows=1 loops=1) Index Cond: (t_name = 'target 500001'::text) Planning time: 0.151 ms Trigger for constraint source_t_id_fkey: time=0.557 calls=1 Execution time: 0.751 ms (6 rows) |
The following query will list all foreign key constraints in the database that do not have an index on the source columns:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
SELECT c.conrelid::regclass AS "table", /* list of key column names in order */ string_agg(a.attname, ',' ORDER BY x.n) AS columns, pg_catalog.pg_size_pretty( pg_catalog.pg_relation_size(c.conrelid) ) AS size, c.conname AS constraint, c.confrelid::regclass AS referenced_table FROM pg_catalog.pg_constraint c /* enumerated key column numbers per foreign key */ CROSS JOIN LATERAL unnest(c.conkey) WITH ORDINALITY AS x(attnum, n) /* name for each key column */ JOIN pg_catalog.pg_attribute a ON a.attnum = x.attnum AND a.attrelid = c.conrelid WHERE NOT EXISTS /* is there a matching index for the constraint? */ (SELECT 1 FROM pg_catalog.pg_index i WHERE i.indrelid = c.conrelid /* it must not be a partial index */ AND i.indpred IS NULL /* the first index columns must be the same as the key columns, but order doesn't matter */ AND (i.indkey::smallint[])[0:cardinality(c.conkey)-1] OPERATOR(pg_catalog.@>) c.conkey) AND c.contype = 'f' GROUP BY c.conrelid, c.conname, c.confrelid ORDER BY pg_catalog.pg_relation_size(c.conrelid) DESC; |
The result is ordered by the size of the table so that the tables, where the missing index hurts most, are listed on top.
all
foreign keys?If the source table is small, you don't need the index, because then a sequential scan is probably cheaper than an index scan anyway.
Also, if you know that you never need the index for a join and you will never delete a row or update a key column in the target table, the index is unnecessary.
There is one simple way to proceed: create all
missing indexes, wait a couple of days and then get rid of the indexes that were never used.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on X, Facebook, or LinkedIn.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Very good article. Did the same thing on Oracle some years ago but is a newbee in regards to Postgres ๐
Thx
I think it is better to filter out partial indices. I did it with
where indpred is null
inside the exists query on pg_indexCan we update the "missing check" algorithm to not display indexes covered by compound key?
I mean if we have a
purchase
table:purchase
- "id" (PK)
- "accountId" (FK)
- "itemId" (FK)
and a compound unique constraint:
"purchase_accountId_courseId_key" UNIQUE CONSTRAINT, btree ("accountId", "itemId")
this index should cover all queries with
accountId
and withitemId
so two separate indexes would only waste DB space.But that would not be reported by my query, because the unique constraint would be in
pg_index
. Perhaps I misunderstand you, can you give me a complete example?Nice article :-).
One note about that query for checking indexes -- it only works in PG SQL 9.4 and above. Mostly because of
WITH ORDINALITY and also the
cardinality function.
Sorry, but query from section "How to check for missing indexes?" throw an error:
[42725] ERROR: operator is not unique: smallint[] @> smallint[]
Hint: Could not choose a best candidate operator. You might need to add explicit type casts.
Position: 1000
PostgreSQL 10.5 (Debian 10.5-2.pgdg90 1) on x86_64-pc-linux-gnu, compiled by gcc (Debian 6.3.0-18 deb9u1) 6.3.0 20170516, 64-bit
That probably means that you have created additional operators or type casts. The documentation recommends:
Hi,
thanks for great article.
It explains a lot.
: 1: