
Patroni is a cluster manager used to customize and automate deployment and maintenance of PostgreSQL HA (High Availability) clusters. It uses distributed configuration stores like etcd, Consul, ZooKeeper or Kubernetes for maximum accessibility.

In this tutorial, we will be using a single local etcd instance and two local Patroni instances on a single host instead of a more complex multi-host setup.

A simple Patroni cluster based on etcd, managing two highly available PostgreSQL instances

I assume that you’re comfortable with PostgreSQL Streaming Replication, so let’s proceed with the installation.
If you're not sure what this all is about, I highly recommend giving this brief introduction a read:
PostgreSQL High-Availability and Patroni – an Introduction.

Install Requirements

A Patroni cluster requires not only the Patroni executables, but also those for PostgreSQL (version 9.5 or above) and for the configuration store of your choice (we'll use etcd).

Cybertec provides .rpm packages of Patroni 1.6.0 for both Fedora 30 and CentOS/RHEL 7 on Github.

Fedora
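A sketch, assuming you downloaded the .rpm mentioned above (the file name is a placeholder; etcd comes from the Fedora repositories):

    sudo dnf install ./patroni-1.6.0.fc30.x86_64.rpm etcd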

CentOS/RHEL 7
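Similarly on CentOS/RHEL 7 (file name again a placeholder; etcd may come from an extra repository on these systems):

    sudo yum install ./patroni-1.6.0.rhel7.x86_64.rpm etcd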

Ubuntu 19.04/Debian Buster
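On Ubuntu 19.04 and Debian Buster, both Patroni and etcd are available from the system repositories:

    sudo apt-get install patroni etcd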

Start etcd

Here we start a single-member etcd cluster.
This etcd will listen for client requests on port 2379 and store its data in the default directory, ./default.etcd.
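A minimal invocation for such a local test setup might look like this (the flags spell out the defaults; adjust the addresses for anything beyond localhost):

    etcd --listen-client-urls http://localhost:2379 --advertise-client-urls http://localhost:2379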

To check the etcd member list, run:
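For example:

    etcdctl member list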

Please be aware that this is only a basic setup guide. In a production environment, a single-member etcd cluster is a single point of failure: if that etcd stops working, Patroni has to stop the primary and secondary PostgreSQL instances.
There will be another post describing the details of etcd and how to set up clusters containing at least three members, which should be the minimum for a production setup.

Configuring Patroni

Everything that we create and modify from now on should be done by the user postgres, as postgres should own the config files, data directories and the Patroni and PostgreSQL processes.

Each Patroni instance needs its own config file. This config file tells it where to contact the DCS (Distributed Configuration Store - etcd in our case), where and how to start the database and how to configure it.

For simplicity, sample config files are provided for this tutorial, for both Debian/Ubuntu and Fedora/RHEL/CentOS.
The REST API port and PostgreSQL port differ in the config file for the second cluster member, to avoid conflicts, since we are running two members on the same host.
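To give you an idea of the shape of such a file, here is a minimal sketch for the first member (names, ports, paths and passwords are placeholders; use the provided sample files for real values):

    scope: cluster_1
    name: patroni_member_1

    restapi:
      listen: 127.0.0.1:8008
      connect_address: 127.0.0.1:8008

    etcd:
      host: 127.0.0.1:2379

    bootstrap:
      initdb:
        - encoding: UTF8
        - data-checksums

    postgresql:
      listen: 127.0.0.1:5432
      connect_address: 127.0.0.1:5432
      data_dir: /var/lib/pgsql/data/patroni_member_1
      bin_dir: /usr/pgsql-11/bin
      authentication:
        superuser:
          username: postgres
          password: secret
        replication:
          username: replicator
          password: secret

For the second member, the name, REST API port, PostgreSQL port and data directory have to differ.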

You can find an explanation of the fields in the Patroni documentation.

Starting Patroni Cluster

Please note that the provided config files differ in two aspects: the directory that contains binaries for PostgreSQL and the directory that contains your database's data.
So please, use the files that match your distribution, as shown in the wget calls below.

To start both nodes:

Fedora/RHEL/CentOS:
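A sketch (the file names are placeholders for the sample files fetched with wget; run each command in a separate terminal, as user postgres):

    patroni patroni_member_1_rhel.yml
    patroni patroni_member_2_rhel.yml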

Debian/Ubuntu:
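And the Debian/Ubuntu counterpart (again with placeholder file names):

    patroni patroni_member_1_debian.yml
    patroni patroni_member_2_debian.yml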

Before starting the Patroni cluster, make sure the default PostgreSQL cluster has been stopped and disabled.
This can usually be done with a "service postgresql stop" call.

If you examine the produced logfiles, you will find that one of the patroni processes has bootstrapped (i.e. ran initdb) and started as a leader.

The other process noticed that there is already another leader and configured itself to receive streaming replication from it:

Verify Patroni Cluster

Check the status of the cluster with patronictl:
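For example (the config file tells patronictl how to reach the DCS; the file name is a placeholder):

    patronictl -c patroni_member_1.yml list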

Now you can simply connect to your new database using the port where your cluster's leader currently is:
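Assuming the leader currently listens on port 5432 (check the patronictl output above):

    psql -h localhost -p 5432 -U postgres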

Conclusion

The steps above can be easily modified to create a cluster that spans several hosts.
Just change the IP addresses in the advertised and initial-cluster arguments for etcd, and in the Patroni and PostgreSQL sections of the Patroni config files, and make sure that all necessary ports are open if your systems run a firewall.

Once the key concept of Patroni is well understood, it is easy to deploy and manage bigger clusters.

This post is part of a series.
Besides this post the following articles have already been published:
PostgreSQL High-Availability and Patroni – an Introduction.
The series will also cover:
– setting up multi-member etcd clusters
– configuration and troubleshooting
– failover, maintenance, and monitoring
– client connection handling and routing
– WAL archiving and database backups using pgBackrest
– PITR of a Patroni cluster using pgBackRest

Updates to the series will be published regularly. Check back soon for the post on setting up multi-member etcd clusters!

Large companies and professional businesses have to make sure that data is kept secure. It is necessary to defend against internal, as well as external threats. PostgreSQL provides all the necessities a company needs to protect data and to ensure that people can only access what they are supposed to see. One way to protect data is “Row Level Security”, which has been around for a few years now. It can be used to reduce the scope of a user by removing rows from the result set automatically. Usually people apply simple policies to do that. But PostgreSQL Row Level Security (RLS) can do a lot more. You can actually control the way RLS behaves using configuration tables.

Configuring access restrictions dynamically

Imagine you are working for a large corporation. Your organization might change: people might move from one department to another, or new people might join as we speak. What you want is a security policy that always reflects the way your company really is. Let us take a look at a simple example:
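The CREATE TABLE statements were not preserved here; a minimal sketch that is consistent with the rest of the post (the table names t_manager and t_company appear below; the column names are my assumption) might be:

    CREATE TABLE t_manager
    (
        id      serial PRIMARY KEY,
        person  text NOT NULL,
        manager text
    );

    CREATE TABLE t_company
    (
        id         serial PRIMARY KEY,
        department text NOT NULL,
        person     text NOT NULL
    );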

I have created two tables. One knows who is managing which department; the second table knows who reports to whom. The goal is to come up with a security policy which ensures that somebody can only see their own data, or data from departments on lower levels. In many cases, row level policies are hardcoded; in our case we want to be flexible and configure visibility based on the data in the tables.

Let us populate the tables:
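Given the hierarchy described below, the INSERT statements might have looked like this (a sketch under the schema assumptions above):

    INSERT INTO t_manager (person, manager)
    VALUES ('hans',   NULL),
           ('paula',  'hans'),
           ('manuel', 'paula');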


As you can see, “hans” has no manager. “paula” reports directly to “hans”, “manuel” reports to “paula”, and so on.

In the next step we can populate the company table:
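Again a sketch; as explained below, the department names mirror the hierarchy:

    INSERT INTO t_company (department, person)
    VALUES ('A',     'hans'),
           ('A_1',   'paula'),
           ('A_1_1', 'manuel');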

For the sake of simplicity, I have named those departments in a way that they reflect the hierarchy in the company. The idea is to make the results easier to read and easier to understand. Of course, any other name will work just fine as well.

Defining row level policies in PostgreSQL

To enable row level security (RLS) you have to run ALTER TABLE … ENABLE ROW LEVEL SECURITY:
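For the table holding the data we want to protect (t_company in this sketch):

    ALTER TABLE t_company ENABLE ROW LEVEL SECURITY;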

What is going to happen is that all users, except superusers and users marked as BYPASSRLS, won't see any data anymore. By default, PostgreSQL is restrictive, and you have to define a policy to configure the desired scope of users. The following policy uses a subselect to traverse our organization:
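A sketch of such a policy under the schema assumptions above (the policy name and the exact expression are mine; the idea is to walk down the hierarchy starting from the current user):

    CREATE POLICY company_policy ON t_company
    FOR SELECT
    USING (person IN (
        WITH RECURSIVE subordinates AS (
            -- start with the current database user ...
            SELECT person
            FROM t_manager
            WHERE person = current_user
          UNION ALL
            -- ... and keep adding everybody reporting to somebody already found
            SELECT m.person
            FROM t_manager AS m
               JOIN subordinates AS s ON m.manager = s.person
        )
        SELECT person FROM subordinates
    ));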

What you can see here is that a policy can be pretty sophisticated. It is not just a simple expression but can even be a more complex subselect, which uses some configuration tables to decide on what to do.

PostgreSQL row level security in action

Let us create a role now:
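Something along these lines, matching the description below:

    CREATE ROLE paula LOGIN;
    GRANT SELECT ON t_company, t_manager TO paula;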

paula is allowed to log in and read all data in t_company and t_manager. Being able to read the table in the first place is a hard requirement to make PostgreSQL even consider your row level policy.

Once this is done, we can set the role to paula and see what happens:
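For example:

    SET ROLE paula;
    SELECT * FROM t_company;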

As you can see paula is only able to see herself and the people in her department, which is exactly what we wanted to achieve.

Let us switch back to superuser now:
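That is simply:

    RESET ROLE;
    SELECT * FROM t_company;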

The output is as expected:

Row level security and performance

Keep in mind that a policy is basically a mandatory WHERE clause which is added to every query to ensure that the scope of a user is limited to the desired subset of data. The more expensive the policy is, the more impact it will have on performance. It is therefore highly recommended to think twice and to make sure that your policies are reasonably efficient to maintain good database performance.

The performance impact of row level security in PostgreSQL (or any other SQL database) cannot easily be quantified because it depends on too many factors. However, keep in mind – there is no such thing as a free lunch.

If you want to learn more about Row Level Security check out my post about PostgreSQL security.



As you all may know, our main blog is a valuable source of PostgreSQL-related guides and tips. Our PostgreSQL experts publish content about many different topics - e.g. Patroni, advanced PostgreSQL features, configuration settings, security, etc.
All of those posts have one thing in common: they are data- or database-centric. That's the theme of our main blog: it's all about PostgreSQL & Data Science.

However, over time, the IT staff @ Cybertec has accumulated lots of knowledge on widely diverse topics, ranging from Docker tips & tricks to parameters for psql. We'd like to share this knowledge with you, but it just doesn't fit the theme of our main blog.

With that in mind, we created a new website: Today I learned @ Cybertec

Whether it's a bug fix, a lesser known parameter of a command-line tool or a cool program we find useful - you'll find something you're interested in at Today I learned @ Cybertec.

The website is updated every day at 12:00 GMT+0, so remember to check it daily.

With PostgreSQL as our core business, we have open-source in mind. For that reason, the website and its content are completely open-source; check it out here.
If you have ideas to enhance the website, just create an issue on Github.

We hope this website will help a lot of people in their ongoing journey as developers!

How it works

The website is powered by GatsbyJS. We use Github to manage our source code and content. The website itself is hosted on Github Pages.

Authors use Github Pull requests (PRs) to write new posts.
For each PR, we use ZEIT Now to create a preview.

Each day (at 12:00 GMT+0), a bot executed through a Github Action merges one post from our backlog (= approved PRs) into master.

As soon as a new commit lands on master, the website is rebuilt and published by another Github action.

The whole process is public, you can watch everything happen here.

pg_cgroups puts PostgreSQL in a cage
 © Laurenz Albe 2018

(This article and the extension pg_cgroups was written for cgroups v1. The extension won't work with cgroups v2, but the rest is still useful information.)

In another blog post, I described how Linux control groups work and how they are useful for PostgreSQL.

Here I will present my little PostgreSQL plugin pg_cgroups which makes this simple to handle for the DBA.

What is pg_cgroups good for?

If you have a machine dedicated to a single PostgreSQL cluster, you normally do not want to limit PostgreSQL's resource usage.

But sometimes you have more than one cluster running on a single machine, or you have an application server running on the same machine.
In that case, you may want to prevent one PostgreSQL cluster from using up so many resources that it starves the other programs on the same machine.

To some extent, the CPU and I/O schedulers can take care of that, but it can be nice to have some control. For example, you may want a small, unimportant database to use no more than 10 percent of your CPU time and I/O bandwidth. And while you can limit PostgreSQL's memory usage through shared_buffers and work_mem, it might be nice to keep one cluster from blowing another cluster's data out of the file system cache.

How does pg_cgroups work?

I won't describe the complete setup here (see the project page for that), but here is a short description how it works:

pg_cgroups creates a new control group under /postgres and moves the postmaster to this group. All PostgreSQL processes automatically inherit the control group.

Now you can set and adjust resource consumption limits on the fly simply by changing parameters in postgresql.conf! There is support for limits on memory consumption, CPU time and I/O bandwidth. One particularly nice feature: this is the only way to limit the amount of file system cache that is used for PostgreSQL!
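As a sketch, the relevant postgresql.conf entries might look like this (the extension must be loaded via shared_preload_libraries; the two limit parameters below are quoted from memory and should be verified against the project page):

    shared_preload_libraries = 'pg_cgroups'

    # parameter names are assumptions - verify them on the project page
    pg_cgroups.memory_limit = 1024          # MB of RAM + file system cache PostgreSQL may use
    pg_cgroups.read_bps_limit = 10485760    # limit read I/O to 10 MB per second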

pg_cgroups makes this magic happen by modifying the control group limits whenever the parameter settings are changed.

What alternatives are there?

Of course, pg_cgroups is not the only way to manage Linux control groups. Alternatives include:

Managing control groups “by hand”

You can manage control groups directly via the /sys/fs/cgroup file system, or by using libcgroup tools like cgexec.

This usually requires root access to the machine, which DBAs often don't have. Sure, you can set up a control group hierarchy managed by the postgres user (like pg_cgroups does), but you still have to memorize the cgroup file hierarchy and commands.

Managing control groups via systemd

systemd services allow setting control group limits via directives in the service file.

This is a great way of setting limits for PostgreSQL, but there are some disadvantages:

Using docker

Docker uses Linux control groups to enforce resource limits. So if you are already running PostgreSQL in a dockerized environment, that may be your first choice.

However, not everybody wants the added complexity of docker.

Is pg_cgroups ready for production?

The software is still in beta.

But since it is fairly simple and is not deeply tangled with PostgreSQL server code (all it does is define a few magic parameters), I don't expect many nasty bugs.

If you run into trouble, I am happy to help.

Conclusion

I think this nice little piece of software is useful for everybody who runs more than one PostgreSQL cluster per machine. I hope it makes the DBA's life a better one.

Are you running PostgreSQL databases or plan to do so? Have you thought about disaster recovery? What happens if your database crashes or - more likely - the (virtual) server it runs on stops working? The best option is probably a copy of your database, that is able to take over operations, as soon as something goes wrong. This concept is commonly referred to as high availability.

Let’s talk about some general concepts that everyone should know about highly available PostgreSQL clusters. This post will first introduce the concept of replication and show which types of replication can be used with PostgreSQL. As the configuration of replication can be quite complicated, I will further discuss how many challenges can be overcome by using an application called Patroni.

What is Replication?

A replica is a secondary copy of your data, i.e. of your databases, as opposed to the primary copy. In the event of a failure, it is possible to continue business operations on the replica. During normal operations, both the primary and the secondary copy can be read from to boost performance.

As your data changes all the time, a point-in-time snapshot is seldom sufficient; you'd probably want a mechanism that keeps your replica up to date. This process is called replication: the replica is periodically informed of any updates to the original data and applies them to its copy.

In the context of (PostgreSQL) databases, the database cluster that you're copying is called the primary, or sometimes the leader. The copy of your database cluster is usually referred to as the replica. You'll also often find the names "master" for the former and "slave" for the latter in some literature but, due to the negative connotation that these words carry, we dislike using them. Instead, the terms primary and standby are used.

Creation of a replica in PostgreSQL is very simple. Shut down your database in a controlled manner (so that it creates a checkpoint), copy the data directory to the device where you want to run your replica, start the primary and the replica as usual.
But if you want to be able to continually update your replica with all modifications that occur on the primary, you will need a replication mechanism as described above.

There are a couple of possible replication mechanisms for PostgreSQL.

Binary Replication

Because PostgreSQL uses a transaction log to enable replay or reversal of transactions, you can continually copy the contents of the transaction log (located in the pg_wal or pg_xlog directory) as it is produced by the primary server to the replica, where you configure the replica to replay any new transaction log files that are copied to its own pg_wal or pg_xlog directory.
The biggest disadvantage of this method is that the transaction log comes in segments of usually 16 MB. So, if you were to wait for the primary to switch to the next segment before copying the finished one, your replica would always lag behind by up to 16 MB worth of log.
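One traditional way to implement this log shipping is via PostgreSQL's archiving hooks; a rough sketch (host names and paths are placeholders, and this is not the only possible mechanism):

    # postgresql.conf on the primary: copy each finished WAL segment to the replica
    archive_mode = on
    archive_command = 'rsync -a %p replica:/var/lib/postgresql/wal_archive/%f'

    # recovery configuration on the replica: replay segments from the archive
    restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'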

One less obvious disadvantage of this method is the fact that the copying is usually done by a third-party process, i.e. PostgreSQL is not aware of this process. Thus, it is impossible to tell the primary to delay the acceptance or rejection of a commit request until the replica has confirmed that it has copied all prior transaction log messages.

While this method is useful for scenarios where the Recovery Point Objective (RPO, i.e. the time span within which transactions may be lost after recovery) or the Recovery Time Objective (RTO, i.e. the time it takes from failure to successful recovery) are quite large, it is not sufficient for some high-availability requirements, which sometimes require an RPO of zero and RTO in the range of a couple seconds only.

A rather abstract and brief illustration of binary replication with PostgreSQL

Streaming Replication

Another approach that is more sophisticated is called streaming replication.
When using streaming replication, single transaction log messages are reproduced to the replica and synchronicity requirements can be handled on a per-message basis.

Streaming replication needs more setup - usually this involves creating a replication user and initiating the replication stream - but this pays off in terms of the recovery objectives.
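The usual ingredients look roughly like this (role name, host and paths are placeholders):

    -- on the primary: create a role that the replica connects with
    CREATE ROLE replicator REPLICATION LOGIN PASSWORD 'secret';

    -- then clone the primary and generate a standby configuration, e.g.:
    --   pg_basebackup -h primary -U replicator -D /var/lib/postgresql/data -R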

When streaming replication is employed with the additional requirement of synchronicity, the replica must confirm that it has received (and written) all prior log messages before the primary can confirm or reject a client's commit request. As a result, after a failure on the primary, the replica can instantly be promoted and business can carry on as usual after all connections have been diverted to the replica.

A rather abstract and brief illustration of streaming replication with PostgreSQL

 

What is Patroni and High Availability?

Patroni is a tool that can be used to create, manage, maintain and monitor highly available PostgreSQL cluster setups using streaming replication.

Patroni is distributed under the MIT license and can be easily installed via PIP. For Ubuntu and Debian, it is obtainable via the system repositories and for Fedora, CentOS, or RHEL, RPM packages are provided by CYBERTEC.
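For example, to install it together with the client bindings for etcd:

    pip install patroni[etcd]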

What does Patroni do?

Basically, everything you need to run highly available PostgreSQL clusters!
Patroni:
– creates the cluster and initiates streaming replication
– handles synchronicity requirements
– monitors liveliness of primary and replica
– can change the configuration of all cluster members
– issues reload commands and restarts selected cluster members
– handles planned switchovers and unplanned failovers
– rewinds a failed primary to bring it back in line
– reinitiates all replication connections to point to the newly promoted primary

Patroni is engineered to be very fault tolerant and stable; by design, split-brain scenarios are avoided. Split-brain occurs when two members of the same cluster accept writing statements.
Patroni guarantees that certain conditions are always fulfilled; despite the automation of so many complex tasks, it should neither corrupt the database cluster nor end in a situation where recovery is impossible.
For example, Patroni can be told never to promote a replica that is lagging behind the primary by more than a configurable amount of log.

It also fulfils several additional requirements; for example, certain replicas should never be considered for promotion if they exist only for the purpose of archiving or data lake applications and not business operations.

The architecture of Patroni is such that every PostgreSQL instance is accompanied by a designated Patroni instance that monitors and controls it.

All of the data that Patroni collects is mirrored in a distributed key-value store, and based on the information present in the store, all Patroni instances agree on decisions, such as which replica to promote if the primary has failed.
The distributed key-value store (for example, etcd or Consul) enables atomic manipulation of keys and values. This offloads the difficult problem of cluster consensus (which is critical to avoid the split-brain scenario) onto battle-tested components, proven to work correctly even under the worst circumstances.

Some of the data collected by Patroni is also exposed through a REST interface, which can be useful for monitoring purposes, as well as for applications to select which PostgreSQL instance to connect to.

Summary - High Availability and performance

Adding a replica to your PostgreSQL cluster can both improve availability and performance. While the configuration of replication is not an easy task (due to the obvious complexity involved with rather sophisticated streaming replication), a solution exists that avoids many pitfalls and enables simple configuration, while also catering to extraordinary demands: Patroni.

This post is part of a series. Apart from this brief introduction, the series covers:



Edgar Allan Poe on view dependencies
© Laurenz Albe 2018

 

We all know that in PostgreSQL we cannot drop an object if there are view dependencies on it:
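For example, a dependent view blocks the DROP (a sketch; the messages follow PostgreSQL's usual format):

    CREATE TABLE t (id integer);
    CREATE VIEW v AS SELECT id FROM t;

    DROP TABLE t;
    -- ERROR:  cannot drop table t because other objects depend on it
    -- DETAIL:  view v depends on table t
    -- HINT:  Use DROP ... CASCADE to drop the dependent objects too.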

Some people like it because it keeps the database consistent; some people hate it because it makes schema modifications more difficult. But that's the way it is.

In this article I want to explore the mechanics behind view dependencies and show you how to track what views depend on a certain PostgreSQL object.

Why would I need that?

Imagine you want to modify a table, e.g. change a column's data type from integer to bigint because you realize you will need to store bigger numbers.
However, you cannot do that if there are views that use the column. You first have to drop those views, then change the column and then run all the CREATE VIEW statements to create the views again.

As the example shows, editing tables can be quite a challenge if there is a deep hierarchy of views, because you have to create the views in the correct order. You cannot create a view unless all the objects it requires are present.

Best practices with views

Before I show you how to untangle the mess, I'd like to tell you what mistakes to avoid when you are using views in your database design (excuse my switching to teacher mode; I guess holding courses has that effect on you).

Views are good for two things:

Neither of these applications requires you to “stack” views, that is, define views over views.

There are two patterns of using views that tend to be problematic, and they both stem from the mistaken idea that a view works exactly like a table, just because it looks like one:

Never forget that a view is just a “crystallized” SQL statement and gets replaced by its definition when the query is executed.

How are views stored in PostgreSQL?

A view in PostgreSQL is not that different from a table: it is a “relation”, that is “something with columns”.
All such objects are stored in the catalog table pg_class.

As the documentation states, a view is almost the same as a table, with a few exceptions:

This “query rewrite rule” contains the definition of the view and is stored in the ev_action column of the pg_rewrite catalog table.

Note that the view definition is not stored as a string, but in the form of a “query parse tree”. Views are parsed when they are created, which has several consequences:

Note that the way PostgreSQL handles views is quite different from the way it handles functions: function bodies are stored as strings and are not parsed when they are created. Consequently, PostgreSQL cannot know on which objects a given function depends.

How are the dependencies stored?

All dependencies (except those on “shared objects”) are stored in the catalog table pg_depend:

It is important to notice that there is no direct dependency of a view on the objects it uses: the dependent object is actually the view's rewrite rule. That adds another layer of indirection.

A simple example

In the following, I'll use this schema to test my queries:
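The schema itself was not preserved here; a sketch that is consistent with the results discussed below (v1 built on t1, and v2 using both v1 and t1 directly, plus the function f) could be:

    CREATE TABLE t1 (id integer PRIMARY KEY, val text);

    CREATE FUNCTION f() RETURNS text
       LANGUAGE sql AS $$SELECT 'constant'::text$$;

    CREATE VIEW v1 AS
       SELECT id, val FROM t1;

    CREATE VIEW v2 AS
       SELECT v1.id, t1.val, f() AS fval
       FROM v1 JOIN t1 USING (id);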

I have thrown in a function, just to show that a view can depend on objects other than tables.

In the following I will concentrate on tables and columns, but the queries will work for functions too, if you replace the catalog pg_class that contains tables with the catalog pg_proc that contains functions.

Finding direct view dependencies on a table

To find out which views directly depend on table t1, you would query like this:
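The query below is a reconstruction that matches the substitutions described in the next paragraph, so treat it as a sketch:

    SELECT v.oid::regclass AS view
    FROM pg_depend AS d            -- dependency entries
       JOIN pg_rewrite AS r        -- rewrite rules that carry the dependency
          ON r.oid = d.objid
       JOIN pg_class AS v          -- the views the rules belong to
          ON v.oid = r.ev_class
    WHERE v.relkind = 'v'          -- only views
      AND d.classid = 'pg_rewrite'::regclass
      AND d.refclassid = 'pg_class'::regclass
      AND d.deptype = 'n'          -- a "normal" dependency
      AND d.refobjid = 't1'::regclass;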

To find views with direct dependencies on the function f, simply replace “d.refclassid = 'pg_class'::regclass” with “d.refclassid = 'pg_proc'::regclass” and “refobjid = 't1'::regclass” with “refobjid = 'f'::regproc”.

Actually, the views will usually not depend on the table itself, but on the columns of the table (the exception is if a so-called “whole-row reference” is used in the view). That is why the view v2 shows up twice in the above list. You can remove those duplicates using DISTINCT.

Finding direct dependencies on a table column

We can modify the above query slightly to find those views that depend on a certain table column, which can be useful if you are planning to drop a column (adding a column to the base table is never a problem).

The following query finds the views that depend on the column val of table t1:
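A reconstruction; the extra condition on refobjsubid restricts the result to the attribute number of the column val:

    SELECT v.oid::regclass AS view
    FROM pg_depend AS d
       JOIN pg_rewrite AS r ON r.oid = d.objid
       JOIN pg_class AS v ON v.oid = r.ev_class
    WHERE v.relkind = 'v'
      AND d.classid = 'pg_rewrite'::regclass
      AND d.refclassid = 'pg_class'::regclass
      AND d.deptype = 'n'
      AND d.refobjid = 't1'::regclass
      AND d.refobjsubid = (
         SELECT attnum
         FROM pg_attribute
         WHERE attrelid = 't1'::regclass
           AND attname = 'val'
      );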

Recursively finding all dependent views

Now if you haven't heeded the advice I gave above and you went ahead and defined a complicated hierarchy of views, it doesn't stop with direct dependencies.
Rather, you need to recursively go through the whole hierarchy.

For example, let's assume that you want to DROP and re-create the table t1 from our example and you need the CREATE VIEW statements to re-create the views once you are done (dropping them won't be a problem if you use DROP TABLE t1 CASCADE).

Then you need to use the above queries in a recursive “common table expression” (CTE). The CTE is for tracking recursive view dependencies and can be reused for all such requirements; the only difference will be in the main query.
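A reconstruction of that CTE and main query, tracking a level column so that the CREATE VIEW statements come out in a safe order:

    WITH RECURSIVE views AS (
       -- anchor: views that depend directly on t1
       SELECT v.oid::regclass AS view,
              1 AS level
       FROM pg_depend AS d
          JOIN pg_rewrite AS r ON r.oid = d.objid
          JOIN pg_class AS v ON v.oid = r.ev_class
       WHERE v.relkind = 'v'
         AND d.classid = 'pg_rewrite'::regclass
         AND d.refclassid = 'pg_class'::regclass
         AND d.deptype = 'n'
         AND d.refobjid = 't1'::regclass
    UNION
       -- recursion: views that depend on views we already found
       SELECT v.oid::regclass,
              views.level + 1
       FROM views
          JOIN pg_depend AS d ON d.refobjid = views.view
          JOIN pg_rewrite AS r ON r.oid = d.objid
          JOIN pg_class AS v ON v.oid = r.ev_class
       WHERE v.relkind = 'v'
         AND d.classid = 'pg_rewrite'::regclass
         AND d.refclassid = 'pg_class'::regclass
         AND d.deptype = 'n'
    )
    SELECT format('CREATE VIEW %s AS%s',
                  view, pg_get_viewdef(view))
    FROM views
    GROUP BY view
    ORDER BY max(level);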

We need the GROUP BY because a view may depend on an object in more than one way. In our example, v2 depends on t1 twice: once directly, and once indirectly via v1.

Have questions? Need PostgreSQL support? You can reach us here.

 



Upgrade to a new major version of PostgreSQL, by Kaarel Moppel - Soon it's that time of the year again - basically a second Christmas for followers of the “blue elephant cult”, if you will :). I'm, of course, referring to the upcoming release of the next PostgreSQL major version, v12. So I thought it's about time to go over some basics of upgrading to newer major versions! Database upgrades, not only in Postgres, are quite a rare event for most people (at least for those running only a couple of DBs). Since upgrades are so rare, it's quite easy to forget how easy upgrading actually is. Yes, it is easy - so easy that I can barely wrap my head around why a lot of people still run some very old versions. Hopefully, this piece will help to convince you to upgrade faster in the future :). For the TL;DR in table form, please scroll to the bottom of the article.

Why should I upgrade in the first place?

Some main points that come to my mind:

And to make it even easier for you - there's a really cool website (much kudos to Depesz!) where you can directly get a highlighted listing of all the changes between two target Postgres versions! Link here!

 

Minor Updates

Minor version changes, a.k.a. bugfix releases, happen regularly and can be used as a warm-up here. There is a minimum of one minor release per quarter, and more if security-critical issues are found. It is, of course, highly recommended to apply them as soon as they are released.

The good thing about minor releases is that they're as fast to apply as restarting your server! I recommend “prefixing” the restart with a hand-initiated CHECKPOINT, so that the effective downtime for applications is minimized; “pg_ctl stop / restart” does its own checkpoint in exclusive mode, after active sessions have been kicked from the server.
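A sketch of that sequence (the data directory path is a placeholder):

    # spread out the checkpoint I/O while the server is still serving traffic
    sudo -u postgres psql -c "CHECKPOINT;"
    # the restart then only has a small final checkpoint left to do
    sudo -u postgres pg_ctl restart -D /var/lib/postgresql/data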

Caution! When using some distribution-specific Postgres packages some extra attention is needed. For example running “apt upgrade postgresql-X” might mean an imminent restart to the running instance, after new binaries have been pulled in! For RedHat / CentOS it’s usually a bit better though and there you need to restart explicitly.

Also, when running a clustered streaming replication setup where downtime on the primary node is scarce and restarts are to be avoided, my recommendation is to immediately update all replicas to the latest minor version (minor versions don't need to match, only major). That way, when some accidental server issue appears (e.g. reboot, hardware maintenance, etc.), the replica you promote will already be on the latest version, with a sort of “free downtime”.

Let's now move on to the real thing - major upgrade options.

 

Logical Dump / Restore - the old-fashioned way to upgrade

This is the original way of doing major version upgrades, and it was the only option up until version 9.0, which introduced the binary / in-place option. Logical dumps use the “pg_dumpall” or “pg_dump” / “pg_restore” tools, which basically “reverse engineer” the database state into plain SQL statements. When those statements are executed sequentially on a target DB, they re-create exactly the same state of all user-level objects as on the source cluster / DB. NB! “pg_dumpall” works on the instance level, “pg_dump” on the database level.
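The classic pattern looks like this (ports are examples; the new, empty cluster is assumed to already run on port 5433):

    pg_dumpall -p 5432 | psql -p 5433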

PROS:

CONS:

One more recommendation - when dumping out the DB contents it’s usually best to use the tools from the latest Postgres version. Note that this might not be officially supported, if you fall out of the “5 latest releases supported” window. By the way, on the topic of dump / restore and Postgres versions, I’d recommend to read this nice FAQ by our colleague Pavlo. It’s a few years old, but still correct.

 

Binary In-Place Upgrade

Binary in-place upgrades are quite different from the logical ones as they happen on the filesystem level (i.e. always on the same machine if no shared storage is in play) and are a multi-step process with some additional complexity. However, things are still quite straightforward - after installing the new binaries and following a bit of preparation, it basically boils down to running a single command that typically finishes in 10 to 20 seconds in “--link” mode!! That's also why it's the most common upgrade option nowadays: nobody wants extended downtimes.

High-level steps to take (see documentation for details):

1. Install new major version binaries.
2. Initialize the new instance with new binaries (with the same major settings as on the old instance).
3. Do a test run with “pg_upgrade”. FYI - I tend to test first on a completely spare replica or a P.I.T.R. restored instance.
4. Stop the old instance.
5. Run “pg_upgrade” with or without file linking (see the command sketch after this list). Using hard-linking of files from the old data directory into the new one is the fastest way to upgrade! Basically only the system catalogs are dumped and reloaded. Without linking, all data files are copied over and the duration boils down to file system speed vs. instance size, with the benefit that the old cluster is not modified.
6. Start the instance (it is usually also necessary to change the port back to 5432).
7. Start the “analyze” script generated and hinted at by the “pg_upgrade” tool. Performance for complex queries might suffer until it finishes.
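A sketch of the pg_upgrade call from step 5 (version numbers and paths are placeholders):

    /usr/pgsql-12/bin/pg_upgrade \
      --old-bindir=/usr/pgsql-11/bin \
      --new-bindir=/usr/pgsql-12/bin \
      --old-datadir=/var/lib/pgsql/11/data \
      --new-datadir=/var/lib/pgsql/12/data \
      --link --check    # drop --check for the real run once the test run is clean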

PROS:

CONS:

NB! Also note that some distributions provide wrapper scripts (like “pg_upgradecluster” on Debian-based systems) to assist with in-place upgrades. They might be worth a look.

Logical Replication

Logical Replication (LR) is the latest addition to the family of Postgres major version upgrade options and is available from v10 and upwards. By the way, unofficially it’s also already possible from 9.4 and upwards so feel free to contact us on that if you can’t afford extended downtime for your older instances. But OK, how does this new thing work? I won’t go into technical details here, but check out this blogpost or the sample code here. In general, it works on the DB level: you’re going to be feeding all old data and data changes from the old primary into a newly bootstrapped and schema-synchronized fully independent master / primary server! Basically LR is about decoupling from the binary “on-disk” format and sending over data objects that could be thought of as kind of JSON. Again, to highlight the major benefit here: the servers are for the most part fully decoupled and the syncing process happens in near realtime, so one can take some time to test and validate the results before flipping the final switch!
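A minimal sketch of the v10+ mechanics (all names and the connection string are placeholders; the schema must be copied to the new server beforehand, e.g. with pg_dump --schema-only):

    -- on the old primary
    CREATE PUBLICATION upgrade_pub FOR ALL TABLES;

    -- on the new, schema-synchronized server
    CREATE SUBSCRIPTION upgrade_sub
       CONNECTION 'host=old-primary dbname=appdb user=postgres'
       PUBLICATION upgrade_pub;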

It looks awesome and relatively simple, doesn't it? And it mostly is! To spoil the party a bit - there's also a lot of "small print" to read, and an out-of-the-box LR process might not always be possible - it depends a bit on the data model being used. Some tweaking / changing might be needed.

PROS:

CONS:

 

Summary of Pros / Cons for Different Upgrade Methods

Upgrade method: Dump / restore
  Pros:
  • Simple
  • Safe
  • Somewhat flexible
  Cons:
  • Slowest method
  • Per-database approach has some pitfalls lurking

Upgrade method: Binary in-place
  Pros:
  • Fast / very fast (depending on chosen mode)
  • Old instance not affected in default mode
  Cons:
  • More complex than dump / restore
  • Somewhat risky in “link” mode
  • Possibly loses standby servers
  • Double the disk space required in default mode

Upgrade method: Logical Replication
  Pros:
  • Shortest possible downtime
  • Safe, with the possibility of thorough “live tests”
  • Very flexible
  Cons:
  • Most complex method
  • Possibly some schema changes needed
  • Not everything is transferred (sequence state, large objects)
  • Possibly “slowish”
  • Always per database

Some General Advice and Parting Words

I guess besides the obvious pre-upgrade DBA activities (testing, backups, thoroughly analyzing the release notes) for all upgrade methods, it's also good to discuss things like application compatibility and UI or monitoring tool compatibility with end users / application teams. But truth be told, there is actually a very low risk factor involved in upgrading: the main SQL features are backwards compatible.

I can't of course 100% recommend doing "lone ranger" types of covert database migrations (although I’ve done my fair share of these) where the DB disappears for a minute to then reappear in a shiny new “coat” without anyone raising an eyebrow. However, if your schema consists only of a handful of plain tables with the most common data types (no stored procedures or other PostgreSQL specific features like LISTEN / NOTIFY) and standard ANSI SQL is performed then most likely everything will be just fine. This is due to the fact that the Postgres wire protocol has not been changed since v7-point-something. Changing it for basic SQL operations would be a massive problem for all users globally. That's why changing it has been avoided.

If you're not running an absolutely time-critical 24/7 operation (something like a space shuttle), in my opinion there are no excuses not to upgrade at least once every couple of years. Keep in mind - if you ignore the topic for too long, you'll stop receiving security patches at some point. Then, when it is most needed, the upgrade process becomes slightly more tricky, since the PostgreSQL Global Development Group only supports releases from the previous 5 years in its tooling. If you've waited too long, you might need to run "pg_upgrade" twice in a row, and there's some additional risk of stumbling into corner-case backwards compatibility topics, as there's no testing for such scenarios. In short: better to upgrade sooner than later!

 



In my previous post about pgBackRest, we saw how to install and set up pgBackRest and make a backup of a PostgreSQL database with it. It was a very basic single-server setup, only intended to get the hang of the tool. Such setups are not used in a production environment, as it is not recommended (or rather, does not serve the purpose) to perform the backup on the same server where the database is running. So let's get familiar with how remote backup servers are set up with pgBackRest, and how full and incremental backups are performed from the backup server and restored on the database server.

We need two servers. Let's call ours:

  1. pgbackup
  2. db1

 

Installing pgBackRest

We need to install pgBackRest on the database and the backup server. Make sure you install the same version on both.

 

For the database server, please follow the installation steps from my previous post. The steps are slightly different for the backup server, since it is a better practice to create a separate user to own the pgBackRest repository.

 

Create a pgbackrest user on the backup server
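For example (the command differs between distributions; any regular account will do, as long as it owns the repository):

    sudo useradd -m pgbackrest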

 

 

Install required Perl package and pgBackRest from a package or manually on pgbackup as below
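A sketch for a RHEL/CentOS-style pgbackup host (package names are assumptions; on Debian/Ubuntu, a plain "apt-get install pgbackrest" may be all that is needed):

    sudo yum install perl-DBD-Pg pgbackrest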

 

 

Create pgBackRest configuration files, directories and repository on pgbackup
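A sketch using pgBackRest's default locations:

    sudo mkdir -p /etc/pgbackrest /var/lib/pgbackrest /var/log/pgbackrest
    sudo touch /etc/pgbackrest/pgbackrest.conf
    sudo chown -R pgbackrest: /etc/pgbackrest /var/lib/pgbackrest /var/log/pgbackrest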

 

 

Now we are ready to proceed with enabling communication between the database and the backup server. For that, pgBackRest requires a passwordless SSH connection.

We can do that by generating an SSH authentication key file using the ssh-keygen command.

 

On the pgbackup server as pgbackrest user:
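For example:

    sudo -u pgbackrest ssh-keygen -t rsa -N ""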

On db1 as postgres user:
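For example:

    sudo -u postgres ssh-keygen -t rsa -N ""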

Exchange the public keys generated between the servers

On pgbackup:
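One way to do it, assuming password authentication is temporarily possible (otherwise append the key to authorized_keys manually):

    sudo -u pgbackrest ssh-copy-id postgres@db1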

 

On db1:
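And the other direction:

    sudo -u postgres ssh-copy-id pgbackrest@pgbackup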

 

Test the passwordless connection as follows:
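For example (each command should complete without a password prompt):

    sudo -u pgbackrest ssh postgres@db1 hostname        # run on pgbackup
    sudo -u postgres ssh pgbackrest@pgbackup hostname   # run on db1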

 

You can find more details about passwordless SSH connection here.

 

Configuration

On the pgbackup server, configure the pgBackRest configuration file with the database host and path, along with the repository path where the backups will be stored. We have added the database host as 'pg1-host' so that we can add more database hosts later as 'pg2-host' and so on. The same goes for pg1-path.

The Start Fast Option (--start-fast) has been used to force a checkpoint to start the backup quickly. Otherwise, the backup will start after the next regular checkpoint.
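Putting that together, /etc/pgbackrest/pgbackrest.conf on pgbackup might look like this (the stanza name “demo”, the paths and the retention setting are placeholders):

    [global]
    repo1-path=/var/lib/pgbackrest
    repo1-retention-full=2
    start-fast=y

    [demo]
    pg1-host=db1
    pg1-path=/var/lib/pgsql/11/data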

 

On server db1, configure the pgBackRest configuration file with the database path and backup host as follows:
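A corresponding sketch for db1 (same placeholder stanza name):

    [global]
    repo1-host=pgbackup
    repo1-host-user=pgbackrest

    [demo]
    pg1-path=/var/lib/pgsql/11/data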

 

On db1, update postgresql.conf to have the following changes:
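A sketch of the relevant settings (the stanza name must match the pgBackRest configuration above):

    wal_level = replica
    archive_mode = on
    archive_command = 'pgbackrest --stanza=demo archive-push %p'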

 

Now restart PostgreSQL to reflect the configuration changes.

 

Create a stanza on the pgbackup server and check if it's working. Getting no result means the stanza has been created successfully.
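With the placeholder stanza name from above:

    sudo -u pgbackrest pgbackrest --stanza=demo stanza-create
    sudo -u pgbackrest pgbackrest --stanza=demo check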

 

Check if the stanza configuration is correct on db1:
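For example:

    sudo -u postgres pgbackrest --stanza=demo check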

Remote Backup

With the setup ready, let's take our first remote backup from pgbackup. The default backup is incremental, but the first backup is always a full backup.
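For example (the info command then shows the resulting backup label and type):

    sudo -u pgbackrest pgbackrest --stanza=demo backup
    sudo -u pgbackrest pgbackrest --stanza=demo info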

 

 

You will find directories and files being created in the backup location.

 

 

Restore

Taking a full backup was easy. Let's see some examples of incremental backup and recovery with a target:

 

Create a test database on db1:
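For example:

    sudo -u postgres psql -c "CREATE DATABASE test;"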

 

Make another backup. By default, this will be incremental since it's the second backup and we are not specifying the type:
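For example (you could add --type=incr to be explicit, although it is the default here):

    sudo -u pgbackrest pgbackrest --stanza=demo backup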

 

 

To restore, stop PostgreSQL on db1 and run pgbackrest with the restore command:
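A sketch; <full-backup-label> is a placeholder for the label of the first full backup, as shown by the info command:

    sudo systemctl stop postgresql
    sudo -u postgres pgbackrest --stanza=demo --delta --set=<full-backup-label> restore
    sudo systemctl start postgresql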

 

 

If you check the database now, you will not find the test database. This is because the backup was restored from the first full backup.

 

To restore the data up to the incremental backup, run the restore command with recovery_target in --recovery-option.
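A sketch along those lines (<incremental-backup-label> is a placeholder; recovery_target=immediate ends recovery as soon as the chosen backup is consistent):

    sudo systemctl stop postgresql
    sudo -u postgres pgbackrest --stanza=demo --delta \
       --set=<incremental-backup-label> \
       --recovery-option=recovery_target=immediate restore
    sudo systemctl start postgresql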

 

Let's check the database now:

 

I hope this exercise helps you get started with remote backups and pgBackRest! To learn more commands, please visit the official pgBackRest site here.

 

Stay tuned for more tutorials and exercises with pgBackRest. If you are not sure how to use PostgreSQL efficiently or how to set up your systems in general, consider checking out our support services.



BY Kaarel Moppel - In order to prepare for the start of training season (you can see our “stock” training offerings here, by the way), I've updated our small “Postgres features showcase” project and thought I'd spread the word here, too. Main changes: coverage of some features of the current v11 release and also of the upcoming v12.

Short showcase project background

The project was introduced two years ago. The main idea is to provide a compact and "hands-on" set of commented samples to help newcomers get up to speed with Postgres. Of course, one cannot beat the official documentation, but sadly the Postgres documentation doesn't include a dedicated folder of working code samples; the tutorial section can, however, be helpful. So the idea is to provide some commented SQL for "code-minded" people, so they can quickly see and "feel" the concepts and get the gist of them faster.

Updated list of topics covered

RFC

If you see that this project is somewhat useful but could be improved even more, we would be very glad if you take the time to provide your ideas as Github issues – or even better – directly as Pull Requests. Thanks!

 


