CYBERTEC PostgreSQL Logo

Killing performance with PostgreSQL partitioning

07.2023 / Category: , / Tags:

PostgreSQL is a powerful database which supports partitioning. In contrast to Oracle partitioning is part of the PostgreSQL core engine and does not need any additional licensing or extensions. If you migrate from Oracle to PostgreSQL this will be a major advantage.

However, just like any other technology, partitioning is not without risk. While it might help to handle large quantities of data efficiently it can also have downsides which have to be taken into consideration when using this powerful feature.

Querying data without partitions

The most basic example involves a table consisting of two columns:

In this case we have created a table and loaded 10 million rows. The column we want to query is indexed. When looking for a handful of values we will end up with a single index scan:

This is not really a surprise. What is noteworthy here is that the query is executed in a fraction of a millisecond. PostgreSQL can execute thousands of such queries per second per CPU core. On a large server we can easily reach more than 1 million queries per second.

Using PostgreSQL partitioning to store data

Let's deploy the same data using partitions. To prove our point we'll use hash partitions for this purpose:

After deploying the parent table, we can create the partitions. For the sake of simplicity, I have created a set of only 8 partitions which is enough to make a point. The effect you are going to see is even larger if the number of partitions increases:

Once the data structure has been created, we can load the same data we loaded before:

Running the same query as before will result in a far more complex execution plan, which already points to the root cause we're going to discuss:

Wow, what an execution plan. Partitioning has not done us any favors here. The reason is that at this point, we have to scan every single partition to find the data. This causes a significantly higher runtime, since it is simply more work for the database engine.

Conclusion and takeaways

The takeaway is that if the partitioning criteria is not part of the query or if many partitions have to be touched in general, runtime will suffer, which is exactly what happened here. It is also important to note that using too many partitions will significantly increase the time the planner needs to do its job. Having hundreds of partitions can easily lead to a real disaster.

Partitioning has many advantages. However, it has to be used cleverly and not blindly. There is no feature which is always advantageous. The same is true for partitioning and it makes sense to take that into consideration.

For further blogs on the topic of partitioning and performance, see these posts:


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

6 responses to “Killing performance with PostgreSQL partitioning”

  1. The choice of partitioning strategy depends on the use case. For example, if you have a table with 2 billion rows and you mostly query it using the user_id column, then you might want to use hash partitioning. This is because hash partitioning will allow the database to quickly scan the relevant partition when you pass the user_id as a WHERE clause value.

    In your query, you used SELECT * wihtout any where clause, which means that the database will have to scan all of the rows in all the table. This will make the query slow, regardless of the partitioning strategy.

    • I guess you didn't read carefully enough - there is a WHERE condition. The query is not very slow, but quite a bit slower than on the unpartitioned table. That is normal.

  2. I'm kind of surprised that this happens with an equality test. For a range test, sure you can't eliminate any fragment; but the optimiser could (and IMHO should!) apply 454650%8 to the query plan, so for an equality test I would expect fragment elimination.

    Maybe in 16...?

    • The column used in the WHERE condition is not the partitioning column. How should PostgreSQL know in which partition a matching column lies?

Leave a Reply

Your email address will not be published. Required fields are marked *

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram