CYBERTEC Logo

Automatic partition creation in PostgreSQL

02.2022 / Category: / Tags: | | |
options for automatic partition creation
© Laurenz Albe 2022

 

Table partitioning is one of the best-liked features out of the more recent PostgreSQL developments. However, there is no support for automatic partition creation yet. This article shows what you can do to remedy that.

Use cases for automatic partition creation

There are essentially two use cases:

  • Create partitions triggered by time, for example for the next month at the end of the current month.
  • Create partitions on demand if a row is inserted that does not fit in any existing partition.

I will call the first option time-triggered partitioning and the latter on-demand partitioning.

Automatic partition creation for time-triggered partitioning

The lack of PostgreSQL support in this area is an inconvenience at worst. There are several ways to work around the limitation:

Using the operating system scheduler

You can use the operating system scheduler (cron on Unix-like systems or “Task Scheduler” on Windows). The job would connect to the database using psql and run the appropriate CREATE TABLE and ALTER TABLE ... ATTACH PARTITION statements.

Typically, you would want to schedule such a job directly on the database server machine. Sometimes you cannot do that, for example, because you are using a hosted database and have no access to the operating system. In that case, you can schedule the job on a different machine.

Using a job scheduler in the database

PostgreSQL doesn't have a built-in job scheduling system. However, there are extensions like pg_timetable or pg_cron that allow you to schedule database jobs. Installing a third-party extension adds another dependency to your software architecture, and it may not be possible on a hosted database. On the other hand, it will make your job independent from the operating system scheduler and is less likely to be forgotten when you move the database.

Using pg_partman

pg_partman is a PostgreSQL extension that specializes in partition management. It was useful and necessary in the bad old days before v10 introduced “declarative partitioning”: it created triggers and constraints for you. Today, most of this functionality is unnecessary, but the support for automatic partition creation can still come in handy.

Automatic partition creation for on-demand partitioning

This is more difficult, and here the lack of support from PostgreSQL core is more painful.

The naïve approach to try is to have a BEFORE INSERT trigger that creates and attaches a new partition. However, that will cause one of the following “object in use” errors:

or

Essentially, by the time the INSERT has started, you cannot modify the table definition any more.

LISTEN and NOTIFY as a workaround

What we need is asynchronous processing: you have to create the new partition in a different transaction. Since we certainly cannot create the partition before we INSERT, it has to be afterwards. However, the partition has to exist at the time of the INSERT, so we would seem to be at an impasse.

But there is another option: instead of creating the partition in a single step, we can first create a table and then attach it later, as a partition of the partitioned table. The sequence of events could then be like this:

  • in the BEFORE INSERT trigger, create a new table that is not yet a partition (skipped if the table already exists)
  • insert the new row into the prospective partition instead of into the partitioned table
  • at some later time, attach the newly-created table as a partition

The remaining question is how to perform the attachment “at some later time”. There is a convenient feature in PostgreSQL for asynchronous processing: LISTEN and NOTIFY. A session can register for notifications on a channel (often a table name) with LISTEN and will then be notified asynchronously whenever another backend calls NOTIFY for that channel. The sending backend can add a message string to the notification.

We can use this together with a row level BEFORE INSERT trigger as follows:

  1. create a new table that will become a new partition (if it does not yet exist)
  2. insert the new row into that table
  3. if a new table was created in the first step, NOTIFY with the new partition key as the argument
  4. skip the original INSERT, since we already inserted the row in the partition

 

Note that this requires that we have a default partition, otherwise inserting a row that does not match an existing partition will fail.

In a different session, a daemon process waits for notifications and attaches the new table to the partitioned table.

A sample implementation of automatic partition creation on demand

Here is the definition of the partitioned table:

I chose list partitioning to simplify the listener code, although that is not required.

The trigger function

The trigger looks like this:

To understand why the trigger function returns NULL, read my blog post on the topic. The WHEN clause in the trigger definition avoids infinite recursion: without it, inserting a row into a partition would call the trigger function again.

The listener

For the listener, we need to use a PostgreSQL client API that supports asynchronous notification. I chose to use C:

Discussion of performance and robustness

The sample code above should be free from race conditions. However, performance will not be good. Let's compare the speed of 100000 inserts with and without the trigger:

So while the code works, the trigger incurs an overhead of about 32 milliseconds per row, which is quite a lot. The whole operation takes 28 times as long as without the trigger, but that factor is only so high because there are no indexes on the table. It would be possible to reduce the overhead by writing the trigger function in C as well, but that would add still more complexity.

Conclusion

While it is fairly easy to work around the lack of time-triggered partition creation, it is more difficult to create partitions on demand when rows are inserted. We found a solution using triggers, LISTEN and NOTIFY, but the solution was complicated and the performance impact was severe. It would be good to get better support from PostgreSQL core for this!

If you are interested in partitioning, you may want to read our articles about partition management and partitioning live data on foot and with pg_rewrite.

 


In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes
Article Rating
Subscribe
Notify of
guest
14 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Vyacheslav
Vyacheslav
1 year ago

Great read, thanks Laurenz!

Vikram kumar
Vikram kumar
1 year ago

Hello ,
Can we do list partiioning based on a varchar column using pg_partman extension.
I am getting this
SQL Error [P0001]: ERROR: pg_partman only supports partitioning of data types that are integer or date/timestamp.
CONTEXT: PL/pgSQL function create_parent

Any workaround ,please let me know

laurenz
laurenz
1 year ago
Reply to  Vikram kumar

The error message seems pretty clear.
You should ask the developers of the software for workarounds.

Marco Pegoraro
1 year ago

You could try insert and fallback on creating the partition only when it fails. It will avoid trying to create a table each time.

Michael Podyachev
Michael Podyachev
1 year ago

Thanks for the article! We're trying to figure out a simple and robust way to automate range partition maintenance here and I'd be very curious to hear your thoughts.

First, it seems like a good idea to create one "spare" partition ahead of time for the next range of keys, so that when the "current" partition fills up, the rows can start flowing into the next one and it will already have the right constraints. At this point, we need to go create the new "next" partition.
For peace of mind, I'd like that to happen automatically within the database, not through an external cron job which might not be running properly. We'll probably create a "DEFAULT" partition as a fallback but we actually don't want it to ever contain any rows.

So for creating the next partition upon the first INSERT, maybe we can work around the problem of a trigger being unable to modify the definition of the table it's firing on by using a deferred constraint trigger.

Specifically, we define a CONSTRAINT trigger AFTER INSERT on the "next" partition DEFERRABLE INITIALLY DEFERRED FOR EACH ROW.
In the trigger function, it creates the new "next" partition, creates the same trigger on that, and drops itself from the current table. So in effect, the trigger only runs once when the partition becomes "current", and so it should have no performance impact.

Do you see any problems with this approach?
Thank you!

laurenz
laurenz
1 year ago

I didn't try your approach, but I doubt that it will work. Specifically, dropping a trigger on the table being modified should fail for the same reasons as creating a new partition.
Frankly, I doubt that your construction would be more robust and reliable than a cron job.
I also want to warn you not to use a default partition if you want to create new partitions in the future: a conflicting value in the default partition will prevent creating a new partition.

Michael Podyachev
Michael Podyachev
1 year ago
Reply to  laurenz

Thanks a lot for your reply! I didn't realize the DEFAULT partition would do that; but evidently it does and also seems to be triggering more exclusive locks than we'd want.

As for the constraint trigger, I think it works because by the time it fires (on COMMIT), the original INSERT statement has completed and released whatever "schema stability" lock it was holding on its table, and only holds on to the row locks on the inserted tuples.

laurenz
laurenz
1 year ago

You are probably right about the deferred constraint!

cmo
cmo
1 year ago

Could you simply create the partitions ahead of time, so that they are ready and waiting for the data?

laurenz
laurenz
1 year ago
Reply to  cmo

Sure, if you can do that, everything is fine.
My article explores what you can do if it is hard to figure out ahead of time which partitions will be needed.

Luca Ferrari
2 years ago

A clever solution, despite the problem of performance.
Just out of curiosity: is there a reason why you "EXECUTE" a "NOTIFY" instead of using pg_notify?

laurenz
laurenz
2 years ago
Reply to  Luca Ferrari

Simple: I didn't think of it. It should work just the same (and be more elegant).

PgM
PgM
4 months ago

Maybe another solution for on-demand partitioning that could work for some setups could be the following?

- Trigger on the default partition (not the whole partitioned table) that, if it receives a record, creates a new table (if a table for that partition does not exist yet) and puts a copy of that record into that table
- Scheduled job that every e.g. 5 minutes checks if there are records in the default partition and if yes truncates the default partition and attaches the tables that the trigger created

If the workload is not too bursty this might be a solution for some? Only drawback I can think of is the short interruptions that the job causes, but those should be super short? What do you think?

laurenz
laurenz
4 months ago
Reply to  PgM

Sounds complicated, but why not.

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    14
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram