CYBERTEC PostgreSQL Logo

Partitioned table statistics

09.2025
Category: 

A misunderstanding about partitioned table statistics: when asked about them, a man answers that they have 5 partitioned tables with 80 partitions each, but wonders why that is important for the optimizer
© Laurenz Albe 2025

I recently helped a customer with a slow query. Eventually, an ANALYZE on a partitioned table was enough to fix the problem. This came as a surprise for the customer, since autovacuum was enabled. So I decided to write an article on how PostgreSQL collects partitioned table statistics and how they affect PostgreSQL's estimates.

What do I mean by “partitioned table statistics”?

I am talking about optimizer statistics as defined in my article about preserving statistics during an upgrade. These are the data that the query optimizer uses to estimate the cost of execution plans. You can view the optimizer statistics for table columns in the system view pg_stats.

In PostgreSQL, a partitioned table is a logical construct, somewhat similar to a view. A partitioned table does not hold any data itself. The data reside in the partitions of the partitioned table. Inserting a row into a partitioned table actually inserts the row into the appropriate partition. In PostgreSQL, partitions are ordinary tables that just have a “side job” as a partition of a partitioned table.

When I talk about partitioned table statistics, I mean the optimizer statistics for the partitioned table itself, as opposed to the optimizer statistics of its partitions.

An example for statistics on a partitioned table

To demonstrate statistics on a partitioned table, I am going to use a simple example:

I used unlogged tables for better performance, because we don't need crash recovery. Each of the partitions will receive half of the rows I INSERT into the table:

Partitioned table statistics and autovacuum

With the default configuration, the autovacuum launcher will sleep for up to a minute before becoming active. Then it will process our new tables and run VACUUM and ANALYZE. So let's wait for two minutes before we take a look at the statistics:

PostgreSQL has statistics for the partitions, but no partitioned table statistics! We can find the explanation in the documentation:

Partitioned tables do not directly store tuples and consequently are not processed by autovacuum. (Autovacuum does process table partitions just like other tables.) Unfortunately, this means that autovacuum does not run ANALYZE on partitioned tables, and this can cause suboptimal plans for queries that reference partitioned table statistics. You can work around this problem by manually running ANALYZE on partitioned tables when they are first populated, and again whenever the distribution of data in their partitions changes significantly.

The importance of statistics on a partitioned table

So we see that we don't get statistics on a partitioned table automatically. The question is how important these statistics are. After all, the partitioned table contains no data, and we have statistics on the partitions. For many SQL statements, that is good enough. Here is an example:

As you can see, the estimates are fairly correct.

However, PostgreSQL uses partitioned table statistics to estimate the result row count of a join. Look at the following example:

All the row count estimates are correct — except the one for the hash join: it is off by a factor of 5000! Let's ANALYZE the partitioned table and see if that improves the estimate:

That did the trick! Now we can also see data in pg_stats:

How to collect partitioned table statistics?

It has become clear that we need to collect partitioned table statistics — at least for certain queries. On the other hand, autovacuum doesn't do the job for us. So we'll have to schedule explicit ANALYZE runs for our partitioned tables. We can use an operating system scheduler like cron, a PostgreSQL extension like pg_timetable or something you build into your application.

How often should you ANALYZE your partitioned tables? That depends on your use case, but I'd expect that once a day or once a week should be good enough in most cases. After all, you normally partition large tables, and it is unlikely that a big fraction of these data change in a day or a week.

Why are partitioned tables excluded from autovacuum?

The answer is probably that nobody has got around to implementing that yet.

For ordinary tables, PostgreSQL uses the monitoring statistics in pg_stat_all_tables.n_mod_since_analyze to determine if a table needs autoanalyze. Since a partitioned table does not contain any data itself, this number is always zero for a partitioned table. There are two possible approaches for autoanalyze of partitioned tables:

  • update the count of the partitioned table whenever you update n_mod_since_analyze for a partition
  • when autovacuum determines whether to ANALYZE a partitioned table, add up n_mod_since_analyze from all partitions

The first approach would burden all data modifying statements, so I'd say that the second approach is more promising. Perhaps some reader will feel inspired to improve PostgreSQL!

Conclusion

We have seen that autovacuum does not collect partitioned table statistics. We have to collect them with an explicit ANALYZE if we want to have good execution plans.

I am indebted to Andrey Lepikhov for guidance where such statistics actually matter!

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2025
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram