Changing histogram sizes

10.2013

Histograms are a common way for a relational database to store statistical information about data. What is this kind of statistical information good for? Well, statistics are the rocket fuel behind performance. If statistics are bad, the optimizer will come up with bad decisions and poor execution plans will be the result. The PostgreSQL optimizer is a damn sophisticated piece of software and it provides end users with good plans in general – still, without statistics even the best SQL optimizer will yield bad performance.

The good news is that when things are a little tight, the end user can have some influence on the way PostgreSQL stores statistics. The ALTER TABLE ... SET STATISTICS command will allow users to define the size of the internal histograms storing statistical information.

Before we dig into those details, we create a simple table and add some data to it:
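A minimal sketch of such a table, assuming a hypothetical table name t_test with a single integer column id (neither name is given in the text):

```sql
-- t_test and id are illustrative names, not taken from the article.
-- generate_series produces the integers 1 .. 1000000, one per row.
CREATE TABLE t_test AS
    SELECT id
    FROM generate_series(1, 1000000) AS id;
```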

This will create a table containing one million numbers. To make sure that the optimizer will know about the content of the table, we can run ANALYZE (a command in charge of creating statistics):
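Assuming the table is called t_test (a hypothetical name), gathering statistics could look like this:

```sql
-- Collect planner statistics for every column of the (assumed) table t_test.
ANALYZE t_test;
```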

How statistics are used

Whenever PostgreSQL has to plan a query, it relies on the statistics we have just created. Let us take a look at a very simple plan:
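A plan of this kind can be requested with EXPLAIN. The statement below is only a sketch: the table name t_test and the column name id are assumptions, and the exact estimates will differ from system to system.

```sql
-- EXPLAIN shows the chosen plan and the estimated row count
-- without actually executing the query.
EXPLAIN
SELECT *
FROM t_test          -- hypothetical table name
WHERE id < 10000;    -- hypothetical column name
```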

We try to find all values lower than 10,000. The planner estimates that we will find roughly 9,100 rows. Overall this is pretty good and totally sufficient to plan the query efficiently. Remember, estimating statistics is not about having a final answer right away – it is about having a reasonably good estimate.

Let us see which statistics PostgreSQL has gathered. To do so, we can take a look at a system view called pg_stats:
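As a sketch, the view can be filtered by table name; t_test is an assumed name here. In psql, switching on expanded display with \x first makes the wide output easier to read.

```sql
-- pg_stats exposes one row per analyzed column, including
-- null_frac, n_distinct, most_common_vals and histogram_bounds.
SELECT *
FROM pg_stats
WHERE tablename = 't_test';   -- hypothetical table name
```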

If you are interested in the content of pg_stats, we recommend taking a look at the PostgreSQL documentation, which outlines nicely how things work in this area.

Changing the size of the histogram

Sometimes it can be quite beneficial to change the size of a column's histogram. Reducing the size of the histogram will make estimates a lot less precise, but it will reduce the time needed to plan a query. If you are dealing with a table with 1 billion rows and all you do is hit the primary key to fetch exactly one row at a time, a narrow histogram is definitely not a big issue.

However, if you are dealing with more complex operations, keeping the default of 100 or raising this value can be quite beneficial.
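The default applies to every column that has no per-column setting and is controlled by the default_statistics_target configuration parameter. Its current value can be checked like this:

```sql
-- Shows the statistics target used for columns without an explicit
-- per-column setting (100 unless the configuration was changed).
SHOW default_statistics_target;
```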

This is how you can change the size of the histogram of a specific column:
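A sketch of the command, again assuming a hypothetical table t_test with a column id. The target of 10 is chosen to match the 10-bar histogram discussed in this section:

```sql
-- Limit the histogram for this one column to 10 buckets.
-- The new target only takes effect at the next ANALYZE.
ALTER TABLE t_test            -- hypothetical table name
    ALTER COLUMN id           -- hypothetical column name
    SET STATISTICS 10;
```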

Once this has been done, we can recreate the stats:
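One possible way to do this, assuming the same illustrative names, is to re-analyze only the affected column:

```sql
-- Rebuild statistics for the single column whose target was changed;
-- a plain "ANALYZE t_test;" would refresh all columns instead.
ANALYZE t_test (id);
```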

Querying the stats table reveals what PostgreSQL knows about the column:
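The query below is a sketch under the same assumptions (table t_test, integer column id). Because histogram_bounds has the pseudo-type anyarray, it is cast via text to int[] before counting the entries; that cast only fits an integer column.

```sql
-- Show the histogram boundaries and count them.
-- N boundaries delimit N - 1 buckets.
SELECT attname,
       histogram_bounds,
       array_length(histogram_bounds::text::int[], 1) AS num_bounds
FROM pg_stats
WHERE tablename = 't_test'    -- hypothetical table name
  AND attname = 'id';         -- hypothetical column name
```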

Note that the histogram consists of just 11 values now (which gives us 10 bars).

The impact of a change

We can toy around with this setting now and see what happens to the plan made by PostgreSQL once the statistics target has been raised to 1,000:
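The experiment can be sketched as follows, with t_test and id as assumed names. Recent PostgreSQL versions (9.4 and later) print the planning time as part of the EXPLAIN ANALYZE output, which makes the cost of a larger histogram visible.

```sql
-- Raise the per-column target and rebuild the statistics.
ALTER TABLE t_test ALTER COLUMN id SET STATISTICS 1000;
ANALYZE t_test;

-- EXPLAIN ANALYZE executes the query and reports estimated versus
-- actual row counts along with planning and execution time.
EXPLAIN ANALYZE
SELECT *
FROM t_test
WHERE id < 10000;
```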

In our example, the estimate is a little more precise than it was initially. However, this comes at a cost: planning time has risen, while the plan is still the same (because there is no other choice for the planner anyway).

By the way, keep in mind that those values should not be changed on a daily basis, and in many cases it is not even necessary to change them at all. Still, it is nice to have the opportunity to do so when it is beneficial.
