pg_rewrite: PostgreSQL Table Partitioning

12.2021 | Category: How To | Tags: data, partitioning

12.2021

Category: How To

PostgreSQL table partitioning is by far one of the most popular new PostgreSQL features. Many people look to partitioned tables as a way to improve performance and broaden scalability. However, partitioning comes with a little problem: How can you partition an existing table without locking up the database? The answer is: pg_rewrite can help you with PostgreSQL table partitioning. Here you will learn how to implement pg_rewrite to help you solve partitioning problems in the most elegant way possible.

Table of Contents

Installing pg_rewrite

pg_rewrite is Open Source and can be quickly and easily downloaded from our Github profile. Cloning the repository works as shown in the next listing:

hs@fedora src]$ git clone https://github.com/cybertec-postgresql/pg_rewrite.git

Cloning into 'pg_rewrite'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 22 (delta 3), reused 22 (delta 3), pack-reused 0
Receiving objects: 100% (22/22), 44.51 KiB | 1.78 MiB/s, done.
Resolving deltas: 100% (3/3), done.

hs@fedora src]$ git clone https://github.com/cybertec-postgresql/pg_rewrite.git

Cloning into 'pg_rewrite'...

remote: Enumerating objects: 22, done.

remote: Counting objects: 100% (22/22), done.

remote: Compressing objects: 100% (17/17), done.

remote: Total 22 (delta 3), reused 22 (delta 3), pack-reused 0

Receiving objects: 100% (22/22), 44.51 KiB | 1.78 MiB/s, done.

Resolving deltas: 100% (3/3), done.

Once this is done, we can enter the directory:

[hs@fedora src]$ cd pg_rewrite/

1	[hs@fedora src]$ cd pg_rewrite/

To build the code, we have to make sure that the correct version of pg_config is in the path. In my case, I have a working binary in my home directory, so I can effortlessly build the code:

[hs@fedora pg_rewrite]$ which pg_config

~/pg14/bin/pg_config

[hs@fedora pg_rewrite]$ which pg_config

~/pg14/bin/pg_config

If pg_config is in the path, all you have to do is to run “make install” to compile and install the code:

[hs@fedora pg_rewrite]$ make install
gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla 
-Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -
Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format
-truncation -Wno-stringop-truncation -g -O2 -fPIC -I. -I./ 
-I/home/hs/pg14/include/postgresql/server -I/home/hs/pg14/include/postgresql/internal&nbsp; 
-D_GNU_SOURCE &nbsp; -c -o pg_rewrite.o pg_rewrite.c
...
/usr/bin/install -c -m 644 concurrent.bc '/home/hs/pg14/lib/postgresql/bitcode'/pg_rewrite/./
cd '/home/hs/pg14/lib/postgresql/bitcode' && /usr/bin/llvm-lto -thinlto -thinlto-
action=thinlink -o pg_rewrite.index.bc pg_rewrite/pg_rewrite.bc pg_rewrite/concurrent.bc

[hs@fedora pg_rewrite]$ make install

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla

-Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -

Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format

-truncation -Wno-stringop-truncation -g -O2 -fPIC -I. -I./

-I/home/hs/pg14/include/postgresql/server -I/home/hs/pg14/include/postgresql/internal 

-D_GNU_SOURCE   -c -o pg_rewrite.o pg_rewrite.c

...

/usr/bin/install -c -m 644 concurrent.bc '/home/hs/pg14/lib/postgresql/bitcode'/pg_rewrite/./

cd '/home/hs/pg14/lib/postgresql/bitcode' && /usr/bin/llvm-lto -thinlto -thinlto-

action=thinlink -o pg_rewrite.index.bc pg_rewrite/pg_rewrite.bc pg_rewrite/concurrent.bc

The next thing to do is to adjust postgresql.conf file. pg_rewrite has to be loaded as a library when the server starts. Otherwise, it won’t work. Configuring PostgreSQL to load pg_rewrite works as follows:

wal_level = logical
max_replication_slots = 1
# ... or add 1 to the current value.
shared_preload_libraries = 'pg_rewrite'

# ... or add the library to the existing ones.

wal_level = logical

max_replication_slots = 1

# ... or add 1 to the current value.

shared_preload_libraries = 'pg_rewrite'

# ... or add the library to the existing ones.

The wal_level has to be adjusted to make sure that the WAL contains enough information for logical decoding to work. On top of that, you'll need a sufficient number of replication slots to safely run logical decoding. pg_rewrite will need one slot to operate.

After these changes are done, restart your server and verify that the variables are properly set:

test=# SHOW shared_preload_libraries;

shared_preload_libraries
--------------------------
pg_rewrite

(1 row)

test=# SHOW shared_preload_libraries;

shared_preload_libraries

--------------------------

pg_rewrite

(1 row)

Finally, you have to enable the pg_rewrite extension to ensure that the partition_table function is available:

test=# CREATE EXTENSION pg_rewrite;
CREATE EXTENSION

test=# x
Expanded display is on.

test=# df *partition_table*
List of functions
-[ RECORD 1 ]-------+---------------------------------------------------
Schema              | public
Name                | partition_table
Result data type    | void
Argument data types | src_table text, dst_table text, src_table_new text
Type                | func

test=# CREATE EXTENSION pg_rewrite;

CREATE EXTENSION

test=# x

Expanded display is on.

test=# df *partition_table*

List of functions

-[ RECORD 1 ]-------+---------------------------------------------------

Schema | public

Name | partition_table

Result data type | void

Argument data types | src_table text, dst_table text, src_table_new text

Type | func

Voilà, you have a working version of pg_rewrite. Let's take a look at how we can use it.

Creating a sample table

After installing pg_rewrite, we can create a table - which we'll want to partition later on:

test=# CREATE TABLE t_number (x numeric);
CREATE TABLE

test=# INSERT INTO t_number SELECT random() - 0.5 FROM generate_series(1, 1000000);
INSERT 0 1000000

test=# CREATE TABLE t_number (x numeric);

CREATE TABLE

test=# INSERT INTO t_number SELECT random() - 0.5 FROM generate_series(1, 1000000);

INSERT 0 1000000

For the sake of simplicity, the table contains a couple of random values. Some are greater than zero and some are less than zero. Let's verify that:

test=# SELECT x < 0, count(*) FROM t_number GROUP BY 1;
 ?column? | count
----------+--------
 f        | 499729
 t        | 500271

(2 rows)

test=# SELECT x < 0, count(*) FROM t_number GROUP BY 1;

?column? | count

----------+--------

f | 499729

t | 500271

(2 rows)

The data seem correct, so we can move to the next step:

test=# ALTER TABLE t_number ADD PRIMARY KEY (x);
ALTER TABLE

1 2	test=# ALTER TABLE t_number ADD PRIMARY KEY (x); ALTER TABLE

PostgreSQL can only identify a row if there is a primary key. Otherwise, we'll have problems and the code won't work. So it's important to ensure that there are indeed primary keys.

Creating table partitions in PostgreSQL

Before we can partition the table, we have to come up with a partitioning scheme. We want to use that scheme for our database. In my example, all I'm doing is putting negative values into one table, and positive values into another. Here's how it works:

test=# CREATE TABLE t_part_number (x numeric PRIMARY KEY) PARTITION BY RANGE (x);
CREATE TABLE

test=# CREATE TABLE t_part_number_neg PARTITION OF t_part_number FOR VALUES FROM (MINVALUE) TO (0);
CREATE TABLE

test=# CREATE TABLE t_part_number_pos PARTITION OF t_part_number FOR VALUES FROM (0) TO (MAXVALUE);
CREATE TABLE

test=# CREATE TABLE t_part_number (x numeric PRIMARY KEY) PARTITION BY RANGE (x);

CREATE TABLE

test=# CREATE TABLE t_part_number_neg PARTITION OF t_part_number FOR VALUES FROM (MINVALUE) TO (0);

CREATE TABLE

test=# CREATE TABLE t_part_number_pos PARTITION OF t_part_number FOR VALUES FROM (0) TO (MAXVALUE);

CREATE TABLE

Make sure that the primary keys and constraints are identical. Otherwise, pg_rewrite will produce an error. Remember: we want to repartition the table - we don’t want anything else to happen.

PostgreSQL table partitioning with almost no locking

Now that all the ingredients are in place, we can rewrite the data:

test=# SELECT partition_table('t_number', 't_part_number', 't_old_number');
 partition_table
-----------------

(1 row)

test=# SELECT partition_table('t_number', 't_part_number', 't_old_number');

partition_table

-----------------

(1 row)

t_number is the source table which has to be rewritten. t_part_number is the freshly partitioned table which we want to use as intermediate storage. t_old_number is the name of the original table which will be renamed. In case something goes wrong, the old table will still be found. The advantage is that nothing can go wrong. The downside is that more storage is needed. However, that is the case DURING repartitioning anyway so it does not matter if the source table is deleted or not. We need twice the space anyway.

The final structure will look as follows:

test=# d+
                                                List of relations
 Schema |       Name        |       Type        | Owner | Persistence | Access method |  Size   | Description
--------+-------------------+-------------------+-------+-------------+---------------+---------+-------------
 public | t_number          | partitioned table | hs    | permanent   |               | 0 bytes |
 public | t_old_number      | table             | hs    | permanent   | heap          | 42 MB   |
 public | t_part_number_neg | table             | hs    | permanent   | heap          | 21 MB   |
 public | t_part_number_pos | table             | hs    | permanent   | heap          | 21 MB   |

(4 rows)

test=# d+

List of relations

--------+-------------------+-------------------+-------+-------------+---------------+---------+-------------

(4 rows)

As you can see, the original table is still in place using up 42 MB of storage. Our two partitions will need the same amount of space.

pg_rewrite is a good method to turn existing tables into PostgreSQL partitions. It only needs a short lock, which makes it superior to the long and extensive table locking needed by the PostgreSQL core.

Finally ...

GIS databases (PostGIS) are often very large and can make full use of partitioning. So we want to share some good news: We offer ready-made OSM data (OpenStreetMap) for PostgreSQL on our new GIS site. You will find ready made PostgreSQL dumps there which can be easily imported into your system.
For information on how to migrate partitioned data, see this informative blogpost.
And if you're interested in learning more about how to set PostgreSQL configuration parameters, see our blog here.

One response to “pg_rewrite: PostgreSQL Table Partitioning”

Lucio Chiessi says:

July 26, 2023 at 12:14 pm

This extension looks awesome!! Can we see it on cloud providers in the future? Thanks!

Reply

pg_rewrite: PostgreSQL Table Partitioning

Installing pg_rewrite

Creating a sample table

Creating table partitions in PostgreSQL

PostgreSQL table partitioning with almost no locking

The final structure will look as follows:

Finally ...

One response to “pg_rewrite: PostgreSQL Table Partitioning”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts