CYBERTEC PostgreSQL Logo

effective_cache_size: A practical example

07.2020 / Category: / Tags: |

A lot has been written about effective_cache_size in postgresql.conf and about PostgreSQL performance in general. However, few people know what this famous parameter really does. Let me share some more insights.

What the PostgreSQL optimizer does

The idea behind SQL is actually quite simple: The end user sends a query and the optimizer is supposed to find the best strategy to execute this query. The output of the optimizer is what people call an “execution plan”. The question now is: What makes one execution plan better than some other plan? What is it that makes a strategy greater than some other strategy? In PostgreSQL everything boils down to the concept of “costs”. The planner will assign costs to every operation. At the end of the day the cheapest plan is selected and executed.

The magic is therefore in the way the optimizer handles costs. That is exactly what effective_cache_size is all about.

Understanding effective_cache_size

To achieve good performance it is important to figure out whether to use an index or not. A question often asked is: Why not always use an index? Traversing and index might not be cheap at and using an index does not mean that there is no need to touch the table as well. The optimizer therefore has to decide whether to go for an index or not.

The way costs are estimated depend on various factors: Amount of I/O needed, number of operators called, number of tuples processed, selectivity, and a lot more. However, what is the cost of I/O? Obviously it makes a difference if data is already in cache or if data has to be read from disk. That brings us to the idea behind effective_cache_size which tells the optimizer how much cache to expect in the system. The important part is that “cache” is not only the amount of memory knows about (this part is pretty clear). The system also has to consider the size of the filesystem cache, CPU caches, and so on. effective_cache_size is the sum of all those caching components. What you will learn in this post is how the optimizer uses this kind of information.

effective_cache_size in action

Before we lose ourselves in theoretical explanation it makes sense to dig into a practical example. For the purpose of this blog I have generated two tables:

Mind that both tables contain the same set of data. One table is ordered - the other one is not. Let us set the effective_cache_size to a really small value. The optimizer will assume that there is really not much memory in the system:

Normally PostgreSQL would go for a bitmap index scan, but we want to see what happens in case of an index scan. Therefore we turn bitmap scans off (= making them insanely expensive to the optimizer).

Let us compare the plan with the one we have just seen before:

As you can see, the price of the index scan has gone down. Why is that relevant? We have to see costs as “relative”. The absolute number is not important - it is important how expensive a plan is compared to some other plan. If the price of a sequential scan stays the same and the price of an index scan goes down relative to a seq scan PostgreSQL will favor indexing more often than it otherwise would. This is exactly what effective_cache_size at its core is all about: Making index scans more likely if there is a lot of RAM around.

Limitations and special cases

When people talk about postgresql.conf and effective_cache_size they are often not aware of the fact that the parameter does not always work miracles. The following scenario shows when there is no impact:

The table statistics used by the optimizer contain information about physical “correlation”. If correlation is 1 (= all data is sorted perfectly on disk) effective_cache_size will NOT change anything.

The same is true if the is only one column as shown in the next example:

That comes as a surprise to most users and therefore I considered it worth mentioning.

Tuning effective_cache_size in PostgreSQL

I found it useful to use a simple formula to get a rough estimate for a good setting:

effective_cache_size = RAM * 0.7

Some people have also used 0.8 successfully. Of course this is only true when we are talking about a dedicated database server. Feel free to experiment.

Finally...


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

5 1 vote
Article Rating
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
gunnar
gunnar
1 year ago

if effective_cache_size = RAM * 0.7 is TRUE for a dedicated server. One has to divide it by the amount of parralel postgres instances on a multi-instanze postgres server, correct?

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram