Options for automatic partition creation
© Laurenz Albe 2022

Table partitioning is one of the most popular of the more recent PostgreSQL features. However, there is no support for automatic partition creation yet. This article shows what you can do to remedy that.

Use cases for automatic partition creation

There are essentially two use cases:

  • Create partitions triggered by time, for example for the next month at the end of the current month.
  • Create partitions on demand if a row is inserted that does not fit in any existing partition.

I will call the first option time-triggered partitioning and the second on-demand partitioning.

Automatic partition creation for time-triggered partitioning

The lack of PostgreSQL support in this area is an inconvenience at worst. There are several ways to work around the limitation:

Using the operating system scheduler

You can use the operating system scheduler (cron on Unix-like systems or “Task Scheduler” on Windows). The job would connect to the database using psql and run the appropriate CREATE TABLE and ALTER TABLE ... ATTACH PARTITION statements.

Typically, you would want to schedule such a job directly on the database server machine. Sometimes you cannot do that, for example, because you are using a hosted database and have no access to the operating system. In that case, you can schedule the job on a different machine.
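As a sketch, the statements such a job might run could look like this (the partition name and the monthly range partitioning scheme are assumptions for the example):

```sql
/* create next month's partition; names are hypothetical */
CREATE TABLE tab_2022_06 (LIKE tab INCLUDING ALL);

/* attach it to the partitioned table */
ALTER TABLE tab ATTACH PARTITION tab_2022_06
   FOR VALUES FROM ('2022-06-01') TO ('2022-07-01');
```

Creating the table first and attaching it in a second step has the advantage that, from PostgreSQL v12 on, ATTACH PARTITION only requires a SHARE UPDATE EXCLUSIVE lock on the partitioned table, so it does not block concurrent data modifications.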

Using a job scheduler in the database

PostgreSQL doesn’t have a built-in job scheduling system. However, there are extensions like pg_timetable or pg_cron that allow you to schedule database jobs. Installing a third-party extension adds another dependency to your software architecture, and it may not be possible on a hosted database. On the other hand, it will make your job independent from the operating system scheduler and is less likely to be forgotten when you move the database.

Using pg_partman

pg_partman is a PostgreSQL extension that specializes in partition management. It was useful and necessary in the bad old days before v10 introduced “declarative partitioning”: it created triggers and constraints for you. Today, most of this functionality is unnecessary, but the support for automatic partition creation can still come in handy.

Automatic partition creation for on-demand partitioning

This is more difficult, and here the lack of support from PostgreSQL core is more painful.

The naïve approach to try is to have a BEFORE INSERT trigger that creates and attaches a new partition. However, that will cause one of the following “object in use” errors:

ERROR:  cannot CREATE TABLE .. PARTITION OF "tab"
        because it is being used by active queries in this session

or

ERROR:  cannot ALTER TABLE "tab"
        because it is being used by active queries in this session

Essentially, by the time the INSERT has started, you cannot modify the table definition any more.

LISTEN and NOTIFY as a workaround

What we need is asynchronous processing: you have to create the new partition in a different transaction. Since we certainly cannot create the partition before we INSERT, it has to be afterwards. However, the partition has to exist at the time of the INSERT, so we would seem to be at an impasse.

But there is another option: instead of creating the partition in a single step, we can first create a table and then attach it later, as a partition of the partitioned table. The sequence of events could then be like this:

  • in the BEFORE INSERT trigger, create a new table that is not yet a partition (skipped if the table already exists)
  • insert the new row into the prospective partition instead of into the partitioned table
  • at some later time, attach the newly-created table as a partition

The remaining question is how to perform the attachment “at some later time”. There is a convenient feature in PostgreSQL for asynchronous processing: LISTEN and NOTIFY. A session can register for notifications on a channel (often a table name) with LISTEN and will then be notified asynchronously whenever another backend calls NOTIFY for that channel. The sending backend can add a message string to the notification.
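A quick illustration in psql:

```sql
-- session 1: register for notifications on channel "tab"
LISTEN tab;

-- session 2: notify the channel, with a payload string
NOTIFY tab, '2022-05-01';
```

When session 1 is next idle between commands, psql displays something like: Asynchronous notification "tab" with payload "2022-05-01" received from server process with PID 12345.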

We can use this together with a row level BEFORE INSERT trigger as follows:

  1. create a new table that will become a new partition (if it does not yet exist)
  2. insert the new row into that table
  3. if a new table was created in the first step, NOTIFY with the new partition key as the argument
  4. skip the original INSERT, since we already inserted the row in the partition

Note that this requires that we have a default partition, otherwise inserting a row that does not match an existing partition will fail.

In a different session, a daemon process waits for notifications and attaches the new table to the partitioned table.
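Concretely, if the prospective partitions are named after the partition key (a hypothetical naming scheme for illustration), the statement the daemon runs would look like this:

```sql
/* the generated name contains hyphens, so it must be quoted */
ALTER TABLE tab ATTACH PARTITION "tab_2022-05-01"
   FOR VALUES IN ('2022-05-01');
```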

A sample implementation of automatic partition creation on demand

Here is the definition of the partitioned table:

CREATE TABLE tab (
   id bigint GENERATED ALWAYS AS IDENTITY,
   ts timestamp NOT NULL,
   data text
) PARTITION BY LIST ((ts::date));

CREATE TABLE tab_def PARTITION OF tab DEFAULT;

I chose list partitioning to simplify the listener code, although that is not required.

The trigger function

The trigger looks like this:

CREATE FUNCTION part_trig() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   BEGIN
      /* try to create a table for the new partition */
      EXECUTE
         format(
            'CREATE TABLE %I (LIKE tab INCLUDING INDEXES)',
            'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
         );

      /*
       * tell listener to attach the partition
       * (only if a new table was created)
       */
      EXECUTE
         format(
            'NOTIFY tab, %L',
            to_char(NEW.ts, 'YYYY-MM-DD')
         );
   EXCEPTION
      WHEN duplicate_table THEN
         NULL;  -- ignore
   END;

   /* insert into the new partition */
   EXECUTE
      format(
         'INSERT INTO %I VALUES ($1.*)',
         'tab_' || to_char(NEW.ts, 'YYYY-MM-DD')
      )
      USING NEW;

   /* skip insert into the partitioned table */
   RETURN NULL;
END;$$;

CREATE TRIGGER part_trig
   BEFORE INSERT ON tab FOR EACH ROW
   WHEN (pg_trigger_depth() < 1)
   EXECUTE FUNCTION part_trig();

To understand why the trigger function returns NULL, read my blog post on the topic. The WHEN clause in the trigger definition avoids infinite recursion: without it, inserting a row into a partition would call the trigger function again.
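To see the trigger in action, we can insert a row for a date that has no partition yet; the row should end up in a newly created table rather than in the default partition:

```sql
INSERT INTO tab (ts, data) VALUES ('2022-05-01 12:00:00', 'something');

/* until the listener attaches it, the new table is not yet part of "tab" */
SELECT data FROM "tab_2022-05-01";
```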

The listener

For the listener, we need to use a PostgreSQL client API that supports asynchronous notification. I chose to use C:

#include <libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/select.h>
#include <errno.h>

static int attach_partition(PGconn *conn,
                            char *table,
                            char *datestr)
{
    PGresult *res;
    char *esc_tab, *esc_part, *esc_date;
    /* identifiers are at most 63 bytes long */
    char stmt[400], part[64];

    /* escape table and partition name */
    esc_tab = PQescapeIdentifier(conn, table, 63);
    snprintf(part, sizeof(part), "%s_%s", table, datestr);
    esc_part = PQescapeIdentifier(conn, part, 63);
    esc_date = PQescapeLiteral(conn, datestr, 10);

    /* construct statement */
    sprintf(stmt,
            "ALTER TABLE %s ATTACH PARTITION %s FOR VALUES IN (%s)",
            esc_tab,
            esc_part,
            esc_date);
    free(esc_tab);
    free(esc_part);
    free(esc_date);

    /* execute statement */
    if ((res = PQexec(conn, stmt)) == NULL)
    {
        fprintf(stderr,
                "Out of memory sending statement\n");
        return 0;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr,
                "Error attaching partition: %s\n",
                PQresultErrorMessage(res));
        PQclear(res);
        return 0;
    }

    PQclear(res);

    return 1;
}

int main (int argc, char **argv)
{
    PGconn *conn;
    int sock;
    PGresult *res;

    /* connect to PostgreSQL with default parameters */
    if ((conn = PQconnectdb("application_name=listener")) == NULL)
    {
        fprintf(stderr,
                "Out of memory connecting to PostgreSQL\n");
        return 1;
    }
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr,
                "Error connecting to PostgreSQL: %s\n",
                PQerrorMessage(conn));
        PQfinish(conn);
        return 1;
    }

    /* get network socket for the connection */
    if ((sock = PQsocket(conn)) < 0)
    {
        fprintf(stderr,
                "Error getting connection network socket\n");
        PQfinish(conn);
        return 1;
    }

    /* listen on a channel */
    if ((res = PQexec(conn, "LISTEN tab")) == NULL)
    {
        fprintf(stderr,
                "Out of memory sending LISTEN\n");
        PQfinish(conn);
        return 1;
    }
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr,
                "Error listening on channel: %s\n",
                PQresultErrorMessage(res));
        PQfinish(conn);
        return 1;
    }
    PQclear(res);

    while(1)
    {
        fd_set sockets;
        PGnotify *note;

        /* block waiting for activity on the network socket */
        FD_ZERO(&sockets);
        FD_SET(sock, &sockets);
        errno = 0;
        if (select(sock + 1, &sockets, NULL, NULL, NULL) < 0)
        {
            perror("Error waiting for notification");
            PQfinish(conn);
            return 1;
        }

        /* consume the input */
        if (PQconsumeInput(conn) == 0)
        {
            fprintf(stderr,
                    "Error receiving data: %s\n",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* check for notifications; there may be more than one */
        while ((note = PQnotifies(conn)) != NULL)
        {
            if (!attach_partition(conn, note->relname, note->extra))
            {
                PQfinish(conn);
                return 1;
            }

            PQfreemem(note);
        }
    }
}

Discussion of performance and robustness

The sample code above should be free from race conditions. However, performance will not be good. Let’s compare the speed of 100000 inserts with and without the trigger:

test=# TRUNCATE tab;
TRUNCATE TABLE
test=# \timing
Timing is on.
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 0
Time: 3354.275 ms (00:03.354)

test=# ALTER TABLE tab DISABLE TRIGGER part_trig;
ALTER TABLE

test=# TRUNCATE tab;
TRUNCATE TABLE
Time: 20.005 ms
test=# INSERT INTO tab (ts, data)
       SELECT clock_timestamp(), 'something'
       FROM generate_series(1, 100000);
INSERT 0 100000
Time: 120.869 ms

So while the code works, the trigger incurs an overhead of about 32 microseconds per row, which is quite a lot. The whole operation takes almost 28 times as long as without the trigger, but that factor is only so high because there are no indexes on the table. It would be possible to reduce the overhead by writing the trigger function in C as well, but that would add still more complexity.

Conclusion

While it is fairly easy to work around the lack of time-triggered partition creation, it is more difficult to create partitions on demand when rows are inserted. We found a solution using triggers, LISTEN and NOTIFY, but the solution was complicated and the performance impact was severe. It would be good to get better support from PostgreSQL core for this!

If you are interested in partitioning, you may want to read our articles about partition management and partitioning live data on foot and with pg_rewrite.