I recently helped a customer with a slow query. Eventually, an ANALYZE on a partitioned table was enough to fix the problem. This came as a surprise to the customer, since autovacuum was enabled. So I decided to write an article on how PostgreSQL collects partitioned table statistics and how they affect PostgreSQL's estimates.
I am talking about optimizer statistics as defined in my article about preserving statistics during an upgrade. These are the data that the query optimizer uses to estimate the cost of execution plans. You can view the optimizer statistics for table columns in the system view pg_stats.
In PostgreSQL, a partitioned table is a logical construct, somewhat similar to a view. A partitioned table does not hold any data itself. The data reside in the partitions of the partitioned table. Inserting a row into a partitioned table actually inserts the row into the appropriate partition. In PostgreSQL, partitions are ordinary tables that just have a “side job” as a partition of a partitioned table.
When I talk about partitioned table statistics, I mean the optimizer statistics for the partitioned table itself, as opposed to the optimizer statistics of its partitions.
To demonstrate statistics on a partitioned table, I am going to use a simple example:
CREATE TABLE tab (
   pk integer NOT NULL
) PARTITION BY LIST ((pk % 2));

CREATE UNLOGGED TABLE tab_even
   PARTITION OF tab FOR VALUES IN (0);

CREATE UNLOGGED TABLE tab_odd
   PARTITION OF tab FOR VALUES IN (1);
I used unlogged tables for better performance, because we don't need crash recovery. Each of the partitions will receive half of the rows I INSERT into the table:
INSERT INTO tab
SELECT * FROM generate_series(1, 1000000);
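If you want to verify that the rows really got split evenly, you can group by the partition in which each row is physically stored (the system column tableoid identifies that table); this should report 500000 rows for each of tab_even and tab_odd:

SELECT tableoid::regclass AS partition,
       count(*) AS row_count
FROM tab
GROUP BY tableoid;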
With the default configuration, the autovacuum launcher will sleep for up to a minute before becoming active. Then it will process our new tables and run VACUUM and ANALYZE. So let's wait for two minutes before we take a look at the statistics:
SELECT * FROM pg_stats
WHERE tablename = 'tab_even'
  AND attname = 'pk';

-[ RECORD 1 ]----------+----------------------
schemaname             | laurenz
tablename              | tab_even
attname                | pk
inherited              | f
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {82,10896,21078,...}
correlation            | 1
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
range_length_histogram |
range_empty_frac       |
range_bounds_histogram |

SELECT * FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';

(0 rows)
PostgreSQL has statistics for the partitions, but no partitioned table statistics! We can find the explanation in the documentation:
Partitioned tables do not directly store tuples and consequently are not processed by autovacuum. (Autovacuum does process table partitions just like other tables.) Unfortunately, this means that autovacuum does not run ANALYZE on partitioned tables, and this can cause suboptimal plans for queries that reference partitioned table statistics. You can work around this problem by manually running ANALYZE on partitioned tables when they are first populated, and again whenever the distribution of data in their partitions changes significantly.
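You can see this behavior in the monitoring statistics as well: last_autoanalyze should be set for the two partitions, but stay NULL for the partitioned table itself:

SELECT relname, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname IN ('tab', 'tab_even', 'tab_odd');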
So we see that we don't get statistics on a partitioned table automatically. The question is how important these statistics are. After all, the partitioned table contains no data, and we have statistics on the partitions. For many SQL statements, that is good enough. Here is an example:
EXPLAIN (ANALYZE) SELECT * FROM tab WHERE pk < 1000;

                                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1000.00..12872.14 rows=913 width=4) (actual time=0.558..27.241 rows=999.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Parallel Append  (cost=0.00..11780.84 rows=380 width=4) (actual time=6.016..20.488 rows=333.00 loops=3)
         Buffers: shared hit=4426
         ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5889.47 rows=249 width=4) (actual time=3.035..9.128 rows=166.33 loops=3)
               Filter: (pk < 1000)
               Rows Removed by Filter: 166500
               Buffers: shared hit=2213
         ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5889.47 rows=288 width=4) (actual time=6.741..16.962 rows=250.00 loops=2)
               Filter: (pk < 1000)
               Rows Removed by Filter: 249750
               Buffers: shared hit=2213
 Planning Time: 0.199 ms
 Execution Time: 27.359 ms
As you can see, the estimates are fairly accurate.
However, PostgreSQL uses partitioned table statistics to estimate the result row count of a join. Look at the following example:
EXPLAIN (ANALYZE)
SELECT * FROM generate_series(1, 100) AS gs(id)
   JOIN tab ON gs.id = tab.pk;

                                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=2.25..28178.25 rows=500000 width=8) (actual time=0.164..145.455 rows=100.00 loops=1)
   Hash Cond: (tab.pk = gs.id)
   Buffers: shared hit=4426
   ->  Append  (cost=0.00..19426.00 rows=1000000 width=4) (actual time=0.018..84.210 rows=1000000.00 loops=1)
         Buffers: shared hit=4426
         ->  Seq Scan on tab_even tab_1  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.017..21.568 rows=500000.00 loops=1)
               Buffers: shared hit=2213
         ->  Seq Scan on tab_odd tab_2  (cost=0.00..7213.00 rows=500000 width=4) (actual time=0.019..16.421 rows=500000.00 loops=1)
               Buffers: shared hit=2213
   ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.119..0.120 rows=100.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 12kB
         ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.058..0.071 rows=100.00 loops=1)
 Planning:
   Buffers: shared hit=73
 Planning Time: 0.743 ms
 Execution Time: 145.540 ms
All the row count estimates are correct, except the one for the hash join: it is off by a factor of 5000! Let's ANALYZE the partitioned table and see if that improves the estimate:
ANALYZE tab;

EXPLAIN (ANALYZE)
SELECT * FROM generate_series(1, 100) AS gs(id)
   JOIN tab ON gs.id = tab.pk;

                                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=1002.25..14966.87 rows=100 width=8) (actual time=0.382..57.992 rows=100.00 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=4426
   ->  Hash Join  (cost=2.25..13956.87 rows=42 width=8) (actual time=16.681..52.161 rows=33.33 loops=3)
         Hash Cond: (tab.pk = gs.id)
         Buffers: shared hit=4426
         ->  Parallel Append  (cost=0.00..12391.69 rows=416668 width=4) (actual time=0.008..29.572 rows=333333.33 loops=3)
               Buffers: shared hit=4426
               ->  Parallel Seq Scan on tab_even tab_1  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.004..6.317 rows=166666.67 loops=3)
                     Buffers: shared hit=2213
               ->  Parallel Seq Scan on tab_odd tab_2  (cost=0.00..5154.18 rows=294118 width=4) (actual time=0.007..10.296 rows=250000.00 loops=2)
                     Buffers: shared hit=2213
         ->  Hash  (cost=1.00..1.00 rows=100 width=4) (actual time=0.058..0.058 rows=100.00 loops=3)
               Buckets: 1024  Batches: 1  Memory Usage: 12kB
               ->  Function Scan on generate_series gs  (cost=0.00..1.00 rows=100 width=4) (actual time=0.027..0.036 rows=100.00 loops=3)
 Planning:
   Buffers: shared hit=12
 Planning Time: 0.237 ms
 Execution Time: 58.041 ms
That did the trick! Now we can also see data in pg_stats (note that inherited is t, because these statistics describe the data in all partitions taken together):
SELECT * FROM pg_stats
WHERE tablename = 'tab'
  AND attname = 'pk';

-[ RECORD 1 ]----------+----------------------
schemaname             | laurenz
tablename              | tab
attname                | pk
inherited              | t
null_frac              | 0
avg_width              | 4
n_distinct             | -1
most_common_vals       |
most_common_freqs      |
histogram_bounds       | {29,10972,20818,...}
correlation            | 0.5027541
most_common_elems      |
most_common_elem_freqs |
elem_count_histogram   |
range_length_histogram |
range_empty_frac       |
range_bounds_histogram |
It has become clear that we need to collect partitioned table statistics, at least for certain queries. On the other hand, autovacuum doesn't do the job for us. So we'll have to schedule explicit ANALYZE runs for our partitioned tables. We can use an operating system scheduler like cron, a PostgreSQL extension like pg_timetable or something you build into your application.
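For example, a crontab entry like the following would collect the statistics every night at 2:00 a.m. (the database name mydb is a placeholder, and you have to make sure that the job can authenticate as a user with the required privileges):

# ANALYZE the partitioned table every night at 02:00
0 2 * * * psql -d mydb -c 'ANALYZE tab;'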
How often should you ANALYZE your partitioned tables? That depends on your use case, but I'd expect that once a day or once a week should be good enough in most cases. After all, you normally partition large tables, and it is unlikely that a large fraction of the data changes within a day or a week.
You may wonder why autovacuum cannot do that for us. The answer is probably that nobody has got around to implementing that yet.
For ordinary tables, PostgreSQL uses the monitoring statistics in pg_stat_all_tables.n_mod_since_analyze to determine if a table needs autoanalyze. Since a partitioned table does not contain any data itself, this number is always zero for a partitioned table. There are two possible approaches for autoanalyze of partitioned tables:

- whenever a data modification updates n_mod_since_analyze for a partition, update the counter for the partitioned table as well
- whenever autovacuum checks whether it should ANALYZE a partitioned table, add up n_mod_since_analyze from all its partitions

The first approach would burden all data modifying statements, so I'd say that the second approach is more promising. Perhaps some reader will feel inspired to improve PostgreSQL!
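Until then, you could implement a simple version of the second approach yourself: add up n_mod_since_analyze over all partitions and run an explicit ANALYZE whenever the sum exceeds a threshold you are comfortable with. For our example table tab (and assuming a single level of partitioning), a query like this calculates the sum:

SELECT sum(s.n_mod_since_analyze) AS modifications
FROM pg_inherits AS i
   JOIN pg_stat_all_tables AS s ON s.relid = i.inhrelid
WHERE i.inhparent = 'tab'::regclass;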
We have seen that autovacuum does not collect partitioned table statistics. We have to collect them with an explicit ANALYZE if we want to have good execution plans.
I am indebted to Andrey Lepikhov for guidance on where such statistics actually matter!