ANALYZE a PostgreSQL table to collect value distribution statistics, you will find the gathered statistics for each column in the
pg_stats system view. This article will explain the meaning of the
correlation column and its impact on index scans.
Physical vs. logical ordering
Most common PostgreSQL data types have an ordering: they support the operators
Such data types can be used with a B-tree index (the “standard” index type).
The values in a column of such a type provide a logical ordering of the table rows. An index on this column will be sorted according to that ordering.
A PostgreSQL table consists of one or more files of 8KB blocks. The order in which the rows are stored in the file is the physical ordering.
You can examine the physical ordering of the rows by selecting the
ctid system column: it contains the block number and the item number inside the block, which describe the physical location of the table row.
The correlation for a column is a value between -1 and 1. It tells how good the match between logical and physical ordering is.
- If the correlation is 1, the rows are stored in the table file in ascending column order; if it is -1, they are stored in descending order.
- Values between -1 and 1 mean a less perfect match.
- A value of 0 means that there is no connection between the physical and the logical order.
Why should I care?
You will create indexes on your tables for faster access (but not too many!).
The correlation of a column has an impact on the performance of an index scan.
During an index scan, the whole index or part of it are read in index sequential order. For each entry that is found, the corresponding row is fetched from the table (this is skipped in an “index only scan”, but that is a different story).
If the correlation of the indexed column is close to zero, the fetched rows will be from all over the table. This will result in many randomly distributed reads of many different table blocks.
However, if the correlation is close to 1 or -1, the next row fetched during the index scan tends to be in the same or the next table block as the previous row.
High correlation has two advantages:
- Blocks read by the database are cached in shared memory. Consequently, if many of the table rows fetched during the index scan are located in the same table block, only few blocks have to be read from storage.
- The blocks that have to be read from storage are next to each other. This leads to sequential I/O, which on spinning disks is substantially faster than random I/O.
Let’s create two tables with identical content, but different correlation:
CREATE TABLE corr (id, val) AS SELECT i, 'some text ' || i FROM generate_series(1, 100000) AS i; CREATE INDEX corr_idx ON corr (id); VACUUM (ANALYZE) corr; SELECT correlation FROM pg_stats WHERE tablename = 'corr' AND attname = 'id'; correlation ------------- 1 (1 row) CREATE TABLE uncorr AS SELECT * FROM corr ORDER BY random(); CREATE INDEX uncorr_idx ON uncorr (id); VACUUM (ANALYZE) uncorr; SELECT correlation FROM pg_stats WHERE tablename = 'uncorr' AND attname = 'id'; correlation ------------- -0.00522369 (1 row)
We disable bitmap index scans so that we can compare index scans on both tables.
Then we check how index scans perform:
SET enable_bitmapscan = off; EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM corr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using corr_idx on corr (cost=0.29..15.23 rows=297 width=19) (actual time=0.108..0.732 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=6 Planning time: 0.456 ms Execution time: 1.049 ms (5 rows) EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM uncorr WHERE id BETWEEN 1001 AND 1300; QUERY PLAN --------------------------------------------------- Index Scan using uncorr_idx on uncorr (cost=0.29..978.15 rows=298 width=19) (actual time=0.105..2.352 rows=300 loops=1) Index Cond: ((id >= 1001) AND (id <= 1300)) Buffers: shared hit=303 Planning time: 0.548 ms Execution time: 2.736 ms (5 rows)
Now 2.7 milliseconds is not so bad, but that is only because all blocks were already in shared buffers.
If a part of these blocks has to be read from disk, the 303 blocks from the second query will do much worse than the 6 from the first!
In the second query, each result row was found in a different table block. This caused 300 blocks to be touched. The remaining three blocks are index blocks.
The first query touches only three table blocks:
SELECT ctid, id FROM corr WHERE id BETWEEN 1001 AND 1300; ctid | id ---------+------ (6,58) | 1001 (6,59) | 1002 (6,60) | 1003 (6,61) | 1004 (6,62) | 1005 (6,63) | 1006 (6,64) | 1007 ... (8,37) | 1294 (8,38) | 1295 (8,39) | 1296 (8,40) | 1297 (8,41) | 1298 (8,42) | 1299 (8,43) | 1300 (300 rows)
Indeed, all rows are contained in the table blocks 6, 7 and 8!
Correlation and the optimizer
The PostgreSQL optimizer estimates the cost of the possible ways to execute an SQL statement.
With the use of the correlation it can give better estimates of the cost of an index scan, leading to better plan choices.
The PostgreSQL optimizer will prefer index scans if the correlation is close to 1 or -1.
Correlation and BRIN indexes
PostgreSQL 9.5 introduced the BRIN index (block range index).
This index works be storing the minimum and maximum of all values for ranges of table blocks. It is only useful for columns with perfect correlation. Its advantage over the B-tree index is its much smaller size, which makes it an interesting option for large tables.
How to make use of correlation?
If you need to efficiently scan bigger portions of an index, it is good to keep the table in index order.
There are no “index ordered tables” in PostgreSQL.
Still, high correlation for a column can be maintained in two ways:
If the table rows are inserted in logical column order and there are no updates or deletes on the table, the physical ordering will be identical to the logical ordering. Good examples for that are primary key columns generated by sequences or measurements with a timestamp.
Since correlation is always perfect in this case, a BRIN index can be an interesting option.
If you want to remove old data from a table without disrupting the physical ordering, you can use table partitioning.
The SQL statement
CLUSTERcan be used to rewrite a table so that the physical ordering is identical to the logical ordering of an index.
However, subsequent modifications of the table will reduce the correlation again. Because of that, you need to re-cluster the table regularly to maintain high correlation. This is annoying, because
CLUSTERblocks all concurrent access to the table.