Speeding up the creation of Postgres replicas

09.2017 / Category: / Tags: |

While it's generally well known how to create physical replicas (i.e. block level copies), using the super simple and really effortless pg_basebackup tool, not so many know what can be done, when the process is painfully slow for bigger databases. The reason for slowness can be typically linked to machine hardware (slow disks, high CPU load) or more commonly slow network between the master and replica machines – think disaster recovery, with DBs in different datacenters. So what can be done?

high performance replicas

Tackling the problem

In the first case (slow systems) there's not too much magic that can be done... hardware has its limits after all. One could try to review/optimize the configuration, hot-swap disks to SSDs when RAID setup allows that or such. The only good news might be that when you already have a spare "high-availability" replica, that's just idling, it can be used for taking "base backup" also and things should be snappier. There are some special considerations for that use case though, so look at the pg_basebackup documentation before hitting the road.

But what about the network bottleneck? Here we have some options:

1) Set up some VPN/SSH tunnel with compression

Using pg_basebackup will be convenient in the end but the tunnel setup could get quite complex and might be "out of scope" for DBAs.

2) Local pg_basebackup (or just file level copy) compressed before copying over.

This approach requires possibly a lot of extra disk space on the master that might not be available but otherwise a good option.

3) The SSH streaming approach.

Dump pg_basebackup output to stdout on the master and stream it to the replica. We'll concentrate on this approach with this post.

Compressed streaming pg_basebackup over SSH

So here's our quick fix to alleviate slow network – streaming "base backup" with master side compression, preferably with multiple compression threads! Note the "master side" part – pg_basebackup already has a "--gzip/--compress" flag, but it actually only compresses things on the replica side, so that all the Postgres data files are still moved over the wire "as is", which is definitely not optimal (I wonder are there any plans to address this issue?). But ok, how much "profit" can one expect with this trick, given that Postgres also has built in compression for larger strings? Well, it depends heavily on the contents of your data and the compression/parallelization level you'll configure – but 30-50% decrease in total replica creation time are easily possible. It might not sound too much, but this could translate already to hours for 1TB databases. For really slow networks the gains would be even bigger, when using highest levels of compression with more threads.

Sample code

A sample implementation of such approach can be found here. Some adjusting before usage is required but the main idea is that we need to combine the TAR format and compression on the master side and then stream over already the compressed data and then de-compress it on the replica side. For parallel compression/de-compression of the TAR stream I would recommend the "pigz" utility. NB! And not to forget about streaming the WAL files separately with pg_receivexlog as pg_basebackup does it for you only in the "plain" mode – without the WALS the whole "base backup" is later worthless.

So take a look and leave a comment when you have some thoughts on the approach or apply some other tricks. Cheers!

0 0 votes
Article Rating
Notify of
Newest Most Voted
Inline Feedbacks
View all comments
1 year ago

Is it working with database with multiple tablespaces ?

1 year ago
Reply to  Joe

Good question; did you try it?
By the way, from PostgreSQL v15 on, pg_basebackup can compress backups on the server side.

1 year ago
Reply to  laurenz

Yes, i tried it doesn't work

1 year ago
Reply to  Joe

Great, then you have answered your own question.

4 years ago

Nice tutorials thanks! pg_receveiwal is used since postgresql 12 though

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf

+43 (0) 2622 93022-0

Get the newest PostgreSQL Info & Tools

    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    CYBERTEC PostgreSQL International GmbH
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram