
Tech preview: How PostgreSQL 12 handles prepared plans

06.2019

PostgreSQL 12 is just around the corner, so we want to present some of the new features we like. One important addition gives users and devops the chance to control the behavior of the PostgreSQL optimizer. Prepared plans are always a major concern (especially for people moving from Oracle), so it makes sense to discuss how plans are handled in PostgreSQL 12.

Firing up a PostgreSQL test database

To start I will create a simple table consisting of just two fields:
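The original listing is not reproduced here; a minimal sketch could look like this (the table name t_sample is made up for illustration, while the columns id and name are the ones referenced later in the post):

CREATE TABLE t_sample (
    id    serial,
    name  text
);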

Then some data is loaded:
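One way to produce the distribution described below (a sketch, assuming the t_sample table from above) is to use generate_series:

-- 1 million rows sharing the same name ...
INSERT INTO t_sample (name)
    SELECT 'hans' FROM generate_series(1, 1000000);

-- ... and just two rows with a rare name
INSERT INTO t_sample (name) VALUES ('paul'), ('paul');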

Note that 1 million names are identical (“hans”) and just two people are called “paul”. The distribution of data is therefore quite special, which has a major impact as you will see later in this post.

To show how plans can change depending on the setting, an index on “name” is defined as shown in the next listing:
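A sketch of such an index (the index name idx_name is just an example):

CREATE INDEX idx_name ON t_sample (name);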

The PostgreSQL query optimizer at work

Let us run a simple query and see what happens:
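The original plan output is not shown here; a sketch of the kind of query meant, searching for the frequent value in the example table from above:

EXPLAIN SELECT * FROM t_sample WHERE name = 'hans';
-- expected result: a parallel sequential scan, as discussed below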

In this case PostgreSQL decided to ignore the index and go for a sequential scan. It has even noticed that the table is quite large and opted for a parallel query. Still, what we see is a sequential scan: all data in the table has to be processed. Why is that? Remember: most people in the table have the same name. It is faster to read the entire table and filter out the few non-matching rows than to read almost the entire index on top of the table. The planner figures (correctly) that running a sequential scan will be faster.

What you can take away from this example is that an index is not used simply because it exists; PostgreSQL uses an index only when it makes sense. If we search for a less frequent value, PostgreSQL will decide to use the index and offer us the optimal plan shown in the next listing:
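Again only a sketch, using the example table from above:

EXPLAIN SELECT * FROM t_sample WHERE name = 'paul';
-- only two rows match, so an index scan is the cheaper choice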

Optimizer statistics: Fueling good performance

If you are looking for good performance, keeping an eye on optimizer statistics is definitely a good idea. The main question now is: what data does the optimizer keep? The pg_stats view contains information about each column:
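A sketch of how to inspect those statistics for our example table (assuming the table name used above):

SELECT attname, null_frac, avg_width, n_distinct,
       most_common_vals, most_common_freqs,
       histogram_bounds, correlation
FROM   pg_stats
WHERE  tablename = 't_sample';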

PostgreSQL keeps track of the percentage of NULL entries in the table (null_frac), the average width of a column, and the estimated number of distinct values (are all values different, or are they all the same?). Then PostgreSQL keeps a list of the most frequent entries as well as their likelihood. The histogram_bounds column contains the statistical distribution of the data. In our example you will only find entries in this field for the "id" column: there are only two distinct names, so keeping a histogram for "name" is basically pointless. The correlation column tells us about the physical order of rows on disk. This field can be pretty important, because it helps the optimizer estimate the amount of I/O.

Preparing plans manually

If you send a query to PostgreSQL it is usually planned when the query is sent. However, if you explicitly want to prepare a query, you can make use of the PREPARE / EXECUTE commands. Here is how it works:
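A sketch of a prepared statement against our example table (the statement name myplan is made up):

PREPARE myplan (text) AS
    SELECT * FROM t_sample WHERE name = $1;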

As you can see, the following query will give us an index scan:
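A sketch, using the prepared statement from above with the rare value:

EXPLAIN EXECUTE myplan('paul');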

If we fall back to the more common value we will again get a parallel sequential scan:
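Again a sketch, this time with the frequent value:

EXPLAIN EXECUTE myplan('hans');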

Why is that the case? In PostgreSQL, life is not so straightforward. Even if we prepare explicitly, we will still get a "fresh plan" for the first few executions before a generic plan is considered. What is a generic plan? A generic plan is a plan that does not depend on the actual parameter values. The idea is to keep the plan and execute it multiple times in the hope that overall performance goes up due to lower planning overhead. Up to PostgreSQL 11 this process has been a bit "obscure" to most people.

Here is how the "obscure" thing works in detail. There are two ways PostgreSQL can execute a prepared statement:

  • It could create a new plan for every execution that considers the current parameter value. That will lead to the best possible plan but having to plan the query for every execution can remove most of the benefit for an OLTP application, which is to avoid having to plan the same statement over and over again.
  • It could create a "generic plan" that does not take the parameter values into account. That will avoid re-planning the statement every time, but it can lead to problems during execution if the best plan depends heavily on the parameter values.

By default, PostgreSQL chooses a "middle road": it will generate a "custom plan" during the first 5 executions of the prepared statement that takes the parameter values into account. From the sixth execution on, it will check if the generic plan would have performed as well (by comparing the estimated execution costs of the custom and the generic plan). If it thinks that the generic plan would have done just as well, the prepared statement will always use the generic plan from that point on.

PostgreSQL 12 introduces a new parameter, plan_cache_mode, which allows users to control this behavior more explicitly. Let us try the same thing again and enforce a generic plan:
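A sketch of how that could look, reusing the prepared statement from above:

SET plan_cache_mode = 'force_generic_plan';

EXPLAIN EXECUTE myplan('paul');
-- the parameter value no longer influences the plan choice: the generic plan is used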

What you see here is that the plan stays constant and PostgreSQL does not attempt replanning. Planning time is cut, BUT that does not necessarily mean you always win: you save the CPU cycles needed to optimize the query, but the plan you are using is not necessarily optimal for your parameters.

plan_cache_mode: Valid parameters

If you want to play around with plan_cache_mode you can try the following values:
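  • auto
  • force_custom_plan
  • force_generic_plan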

"auto", which is the default value, resembles the traditional behavior of letting
PostgreSQL choose whether to use a generic plan or not.
You might ask what good it could be to use a prepared statement with "force_custom_plan".
The main reason is that using prepared statements is the best way to prevent SQL injection
attacks, so it may be worth using them even if you don't save on planning time.
If you want to learn more about PostgreSQL 12, consider checking out our blog post about optimizer support functions.
