effective_cache_size: A practical example

07.2020 | Category: How To | Tags: administration, under the hood

07.2020

Category: How To

A lot has been written about effective_cache_size in postgresql.conf and about PostgreSQL performance in general. However, few people know what this famous parameter really does. Let me share some more insights.

Table of Contents

What the PostgreSQL optimizer does

The idea behind SQL is actually quite simple: The end user sends a query and the optimizer is supposed to find the best strategy to execute this query. The output of the optimizer is what people call an “execution plan”. The question now is: What makes one execution plan better than some other plan? What is it that makes a strategy greater than some other strategy? In PostgreSQL everything boils down to the concept of “costs”. The planner will assign costs to every operation. At the end of the day the cheapest plan is selected and executed.

The magic is therefore in the way the optimizer handles costs. That is exactly what effective_cache_size is all about.

Understanding effective_cache_size

To achieve good performance it is important to figure out whether to use an index or not. A question often asked is: Why not always use an index? Traversing and index might not be cheap at and using an index does not mean that there is no need to touch the table as well. The optimizer therefore has to decide whether to go for an index or not.

The way costs are estimated depend on various factors: Amount of I/O needed, number of operators called, number of tuples processed, selectivity, and a lot more. However, what is the cost of I/O? Obviously it makes a difference if data is already in cache or if data has to be read from disk. That brings us to the idea behind effective_cache_size which tells the optimizer how much cache to expect in the system. The important part is that “cache” is not only the amount of memory knows about (this part is pretty clear). The system also has to consider the size of the filesystem cache, CPU caches, and so on. effective_cache_size is the sum of all those caching components. What you will learn in this post is how the optimizer uses this kind of information.

effective_cache_size in action

Before we lose ourselves in theoretical explanation it makes sense to dig into a practical example. For the purpose of this blog I have generated two tables:

test=# CREATE TABLE t_random AS SELECT id, random() AS r 
          FROM generate_series(1, 1000000) AS id ORDER BY random();
SELECT 1000000
test=# CREATE TABLE t_ordered AS SELECT id, random() AS r 
          FROM generate_series(1, 1000000) AS id;
SELECT 1000000
test=# CREATE INDEX idx_random ON t_random (id);
CREATE INDEX
test=# CREATE INDEX idx_ordered ON t_ordered (id);
CREATE INDEX
test=# VACUUM ANALYZE ;
VACUUM

test=# CREATE TABLE t_random AS SELECT id, random() AS r

FROM generate_series(1, 1000000) AS id ORDER BY random();

SELECT 1000000

test=# CREATE TABLE t_ordered AS SELECT id, random() AS r

FROM generate_series(1, 1000000) AS id;

SELECT 1000000

test=# CREATE INDEX idx_random ON t_random (id);

CREATE INDEX

test=# CREATE INDEX idx_ordered ON t_ordered (id);

CREATE INDEX

test=# VACUUM ANALYZE ;

VACUUM

Mind that both tables contain the same set of data. One table is ordered - the other one is not. Let us set the effective_cache_size to a really small value. The optimizer will assume that there is really not much memory in the system:

test=# SET effective_cache_size TO '1 MB';
SET
test=# SET enable_bitmapscan TO off;
SET
test=# explain SELECT * FROM t_random WHERE id < 1000;
                                      QUERY PLAN
---------------------------------------------------------------------------------
  Index Scan using idx_random on t_random (cost=0.42..3611.96 rows=909 width=12)
  Index Cond: (id < 1000)
(2 rows)

test=# SET effective_cache_size TO '1 MB';

SET

test=# SET enable_bitmapscan TO off;

SET

test=# explain SELECT * FROM t_random WHERE id < 1000;

QUERY PLAN

---------------------------------------------------------------------------------

Index Scan using idx_random on t_random (cost=0.42..3611.96 rows=909 width=12)

Index Cond: (id < 1000)

(2 rows)

Normally PostgreSQL would go for a bitmap index scan, but we want to see what happens in case of an index scan. Therefore we turn bitmap scans off (= making them insanely expensive to the optimizer).

Let us compare the plan with the one we have just seen before:

test=# SET effective_cache_size TO '1000 GB';
SET
test=# explain SELECT * FROM t_random WHERE id < 1000;
                                    QUERY PLAN
--------------------------------------------------------------------------------
 Index Scan using idx_random on t_random (cost=0.42..3383.99 rows=909 width=12)
   Index Cond: (id < 1000)
(2 rows)

test=# SET effective_cache_size TO '1000 GB';

SET

test=# explain SELECT * FROM t_random WHERE id < 1000;

QUERY PLAN

--------------------------------------------------------------------------------

Index Scan using idx_random on t_random (cost=0.42..3383.99 rows=909 width=12)

Index Cond: (id < 1000)

(2 rows)

As you can see, the price of the index scan has gone down. Why is that relevant? We have to see costs as “relative”. The absolute number is not important - it is important how expensive a plan is compared to some other plan. If the price of a sequential scan stays the same and the price of an index scan goes down relative to a seq scan PostgreSQL will favor indexing more often than it otherwise would. This is exactly what effective_cache_size at its core is all about: Making index scans more likely if there is a lot of RAM around.

Limitations and special cases

When people talk about postgresql.conf and effective_cache_size they are often not aware of the fact that the parameter does not always work miracles. The following scenario shows when there is no impact:

test=# SET effective_cache_size TO '1 MB';
SET
test=# explain SELECT * FROM t_ordered WHERE id < 1000;
                                       QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)
       Index Cond: (id < 1000)
(2 rows)

test=# SET effective_cache_size TO '1000 GB';
SET
test=# explain SELECT * FROM t_ordered WHERE id < 1000;
                                     QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)
       Index Cond: (id < 1000)
(2 rows)

test=# SET effective_cache_size TO '1 MB';

SET

test=# explain SELECT * FROM t_ordered WHERE id < 1000;

QUERY PLAN

----------------------------------------------------------------------------------

Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)

Index Cond: (id < 1000)

(2 rows)

test=# SET effective_cache_size TO '1000 GB';

SET

test=# explain SELECT * FROM t_ordered WHERE id < 1000;

QUERY PLAN

----------------------------------------------------------------------------------

Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)

Index Cond: (id < 1000)

(2 rows)

The table statistics used by the optimizer contain information about physical “correlation”. If correlation is 1 (= all data is sorted perfectly on disk) effective_cache_size will NOT change anything.

The same is true if the is only one column as shown in the next example:

test=# ALTER TABLE t_random DROP COLUMN r;
ALTER TABLE
test=# SET effective_cache_size TO '1 MB';
SET
test=# explain SELECT * FROM t_ordered WHERE id < 1000;
                                       QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)
       Index Cond: (id < 1000)
(2 rows)

test=# SET effective_cache_size TO '1000 GB';
SET
test=# explain SELECT * FROM t_ordered WHERE id < 1000;
                                      QUERY PLAN
----------------------------------------------------------------------------------
 Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)
       Index Cond: (id < 1000)
(2 rows)

test=# ALTER TABLE t_random DROP COLUMN r;

ALTER TABLE

test=# SET effective_cache_size TO '1 MB';

SET

test=# explain SELECT * FROM t_ordered WHERE id < 1000;

QUERY PLAN

----------------------------------------------------------------------------------

Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)

Index Cond: (id < 1000)

(2 rows)

test=# SET effective_cache_size TO '1000 GB';

SET

test=# explain SELECT * FROM t_ordered WHERE id < 1000;

QUERY PLAN

----------------------------------------------------------------------------------

Index Scan using idx_ordered on t_ordered (cost=0.42..39.17 rows=1014 width=12)

Index Cond: (id < 1000)

(2 rows)

That comes as a surprise to most users and therefore I considered it worth mentioning.

Tuning effective_cache_size in PostgreSQL

I found it useful to use a simple formula to get a rough estimate for a good setting:

effective_cache_size = RAM * 0.7

Some people have also used 0.8 successfully. Of course this is only true when we are talking about a dedicated database server. Feel free to experiment.

Finally...

Read more about performance improvements in Laurenz Albe's blog Join Strategies and Performance in PostgreSQL
Learn more about PostgreSQL - I recommend a post written by my friend and workmate Florian Nadler who has written an excellent post about MobilityDB, a PostgreSQL extension optimized for GPS tracking data.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

One response to “effective_cache_size: A practical example”

gunnar says:

October 27, 2022 at 9:22 am

if effective_cache_size = RAM * 0.7 is TRUE for a dedicated server. One has to divide it by the amount of parralel postgres instances on a multi-instanze postgres server, correct?

Reply

effective_cache_size: A practical example

What the PostgreSQL optimizer does

Understanding effective_cache_size

effective_cache_size in action

Limitations and special cases

Tuning effective_cache_size in PostgreSQL

Finally...

One response to “effective_cache_size: A practical example”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts