CYBERTEC Logo

Setting up PostgreSQL streaming replication

05.2021 / Category: / Tags: | |

There are two types of replication available in PostgreSQL at the moment: Streaming replication & Logical replication. If you are looking to set up streaming replication for PostgreSQL 13, this is the page you have been looking for. This tutorial will show you how to configure PostgreSQL replication and how to set up your database servers quickly.

PostgreSQL replication: What we want to achieve

Before we get started with configuring PostgreSQL, it makes sense to take a look at what we want to achieve. The goal of this tutorial is to create a primary server replicating data to a secondary one, using asynchronous replication.

Here is what the desired setup will look like:

The entire setup will be done using CentOS 8.3. The process on RHEL (Redhat Enterprise Linux) is expected to be the same. Simply follow the same procedures.

To show how the setup works, we use two virtual machines with the following IPs:

  • Primary: 10.0.3.200 (node1)
  • Secondary: 10.0.3.201 (node2)

Let’s prepare these systems step-by-step.

Installing PostgreSQL

Once you have installed CentOS / RHEL you can already prepare the installation of PostgreSQL itself. The way to do that is to go to the PostgreSQL website and follow the instructions.
The following script shows how things work. You can simply copy / paste the script:

Let’s check what we should now see on node1:

After this process, you should have:

  • Installed binaries
  • A fully working node1
    • Systemd scripts set up
    • Database instances initialized
  • A prepared 2nd node (node2)
    • Binaries installed
    • Systemd scripts set up

Now let’s move on to the next step: disabling the firewall on the primary.

Why is that necessary? The replica will connect to the master on port 5432. If the firewall is still active, the replica will not be able to access port 5432. In our example, the firewall will be disabled completely to make it easier for the reader. In a more secure setup, you might want to do this in a more precise manner.

Configuring the primary for replication

There are four things we have to do on the primary server:

  • Enable networking (bind addresses) in postgresql.conf
  • Create a replication user (best practice but not mandatory)
  • Allow remote access in pg_hba.conf
  • Restart the primary server

We can perform these things step-by-step.

The first thing is to change postgresql.conf.

The file can be found in /var/lib/pgsql/13/data/postgresql.conf. However, if you have no clue where to find postgresql.conf you can ask PostgreSQL itself to point you to the configuration file. Here is how it works:

The following parameter has to be changed in postgresql.conf:

listen_addresses = '*'

What does listen_addresses mean? By default, PostgreSQL only listens on localhost. Remote access is not allowed by default for security reasons. Therefore, we have to teach PostgreSQL to listen on remote requests as well. In other words: listen_addresses defines the bind addresses of our database service. Without it, remote access is not possible (even if you change pg_hba.conf later on).

Then we can create the user in the database:

Of course, you can also set a password. What is important here is that the user has the REPLICATION flag set. The basic idea is to avoid using the superuser to stream the transaction log from the primary to the replica.

The next thing we can do is to change pg_hba.conf, which controls who is allowed to connect to PostgreSQL from which IP.

Please add the following line to the configuration file:

host replication repuser 10.0.3.201/32 trust

We want to allow the repuser coming from 10.0.3.201 to log in and stream the transaction log from the primary. Keep in mind that 10.0.3.200 is the primary in our setup and 10.0.3.201 is the replica.

Finally, we can restart the primary because we have changed listen_addresses in postgresql.conf.
If you only changed pg_hba.conf a reload is enough:

Your system is now ready, and we can focus our attention on the replica.

Creating a base backup

The next step is to create the replica. There are various things we need to do to make this work. The first thing is to make sure that the replica is stopped and that the data directory is empty. Let’s first make sure the service is stopped:

Then, we need to make sure the data directory is empty:

Note that this step is not necessary if you have skipped the initdb step during installation.

However, it is necessary if you want to turn an existing server into a replica.

pg_basebackup will connect to the primary and simply copy all the data files over. The connection has to be made as repuser. To ensure that the copy process starts instantly, it makes sense to tell PostgreSQL to quickly checkpoint. The -D flag defines the destination directory where we want to store the data on the replica. The -R flag automatically configures our replica for replication.

No more configuration is needed on the secondary server. Finally, we created a replication slot. What is the purpose of a replication slot in PostgreSQL? Basically, the primary server is able to recycle the WAL - if it is not needed anymore on the primary. But what if the replica has not consumed it yet? In that case, the replica will fail unless there is a replication slot ensuring that the primary can only recycle the WAL if the replica has fully consumed it. We at CYBERTEC recommend to use replication slots in most common cases.

Let’s figure out what pg_basebackup has done:

pg_basebackup has copied everything over. However, there is more. The standby.signal file has been created which tells the replica that it is indeed a replica.

Finally, the tooling has adjusted the postgresql.auto.conf file, which contains all the configuration needed to make the replica connect to its replica on the primary server (node1):

Voilà, we are done and we can proceed to start the replica.

Firing up the replica

We are ready to start the replica using systemctl:

It’s a good idea to check that the processes are indeed running. It’s especially important to check for the existence of the walreceiver process. walreceiver is in charge of fetching the WAL from the primary. In case it is not there, your setup has failed.
Also make sure that the service is enabled.

Checking your PostgreSQL replication setup

Once the setup has been completed, it makes sense to take a look at monitoring. In general, it makes sense to use a tool such as pgwatch2 to professionally monitor your database.

Let us first check the primary:

The existence of a row in pg_stat_replication tells us that WAL is flowing from the primary to a secondary.

However, we can also make a check on the replica:

A row in pg_stat_wal_receiver ensures that the WAL receiver does indeed exist, and that data is flowing.

Conclusion

That’s it! I hope you have enjoyed this tutorial. For more information on choosing between synchronous and asynchronous replication, take a look at this page.

0 0 votes
Article Rating
Subscribe
Notify of
guest
1 Comment
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
parveen verma
parveen verma
2 years ago

Thanx a lot, it really works

CYBERTEC Logo white
CYBERTEC PostgreSQL International GmbH
Römerstraße 19
2752 Wöllersdorf
Austria

+43 (0) 2622 93022-0
office@cybertec.at

Get the newest PostgreSQL Info & Tools


    This site is protected by reCAPTCHA and the Google Privacy Policy & Terms of Service apply.

    ©
    2024
    CYBERTEC PostgreSQL International GmbH
    phone-handsetmagnifiercrosscross-circle
    1
    0
    Would love your thoughts, please comment.x
    ()
    x
    linkedin facebook pinterest youtube rss twitter instagram facebook-blank rss-blank linkedin-blank pinterest youtube twitter instagram