Soon it’s that time of the year again – basically a 2nd Christmas for followers of the “blue elephant cult” if you will :). I’m, of course, referring to the upcoming release of the next PostgreSQL major version, v12. So I thought it’s about time to go over some basics on upgrading to newer major versions! Database upgrades, not only Postgres, are quite a rare event for most people (at least for those running only a couple of DB-s). Since upgrades are so rare, it’s quite easy to forget how easy upgrading actually is. Yes, it is easy – so easy that I can barely wrap my head around why a lot of people still run some very old versions. Hopefully, this piece will help to convince you to upgrade faster in the future :). For the TLDR; in table version please scroll to the bottom of the article.
Why should I upgrade in the first place?
Some main points that come to my mind:
- More security
Needless to say, I think security is massively important nowadays! Due to various government regulations and the threat of being sued over data leaks, mistakes often carry a hefty price tag. Both new and modern authentication methods and SSL/TLS version support are regularly added to new PostgreSQL versions.
- More performance without changing anything on the application level!
Based on my gut feeling and on some previous testing, typically in total around ~5-15% of runtime improvements can be observed following an upgrade (if not IO-bound). So not too much…but on some releases (9.6 parallel query, 10.0 JIT for example) a possibly game changing 30-40% could be observed. As a rule, the upgrades only resulted in upsides – over the last 8 years, I’ve only seen a single case where the new optimizer chose a different plan so that things got slower for a particular query, and it needed to be re-written.
- New features for developers
- Every year, new SQL standard implementations, non-standard feature additions, functions and constructs for developers are introduced, in order to reduce the amounts of code written – or to achieve something that was very hard, or possible only on the application layer.
- Better DBA / management features
Better high-availability, backup, monitoring and scaling features for DBA-s. The last couple of releases have been a real boon in these matters for example with Logical Replication, partitioning and built-in monitoring roles.
And to make it even more easy for you – there’s a really cool website (much kudos to Depesz!) where you can directly get a highlighted listing of all the changes between two target Postgres versions! Link here!
Minor version changes a.k.a. bugfix releases happen regularly and can be used as a warmp-up here. There is a minimum of 1 minor release per quarter, if no security-critical stuff is found. It is, of course, highly recommended to apply them as soon as they are released.
The good thing about minor releases is that they’re as fast as restarting your server! I recommend to “prefix” the restart with a hand-initiated “checkpoint” so that effective downtime for applications is minimized since “pg_ctl stop / restart” already does checkpointing in exclusive mode, after active sessions have been kicked from the server.
Caution! When using some distribution-specific Postgres packages some extra attention is needed. For example running “apt upgrade postgresql-X” might mean an imminent restart to the running instance, after new binaries have been pulled in! For RedHat / CentOS it’s usually a bit better though and there you need to restart explicitly.
Also when running a clustered streaming replication setup where downtime on the primary node is scarce and restarts are to be avoided, my recommendation is to immediately update all replicas to the latest minor version (minor versions don’t need to match, only major). That way, when some accidental server issue appears (e.g.reboot, hardware maintenance etc) you’ll already be promoted to the latest version with a sort of “free downtime”.
Let’s now move on to the real thing – major upgrade options.
Logical Dump / Restore
This is the original way of doing major version upgrades and it’s the only option up until version 9.0 which introduced the binary / in-place option. Logical dumps use “pg_dumpall” or “pg_dump” / “pg_restore” tools, that basically “reverse engineer” the database state to normal SQL statements. When those statements are executed sequentially on a target DB, they re-create exactly the same state for all user-level objects as it was on the source cluster / DB. NB! “Pg_dumpall” works on the instance level and “pg_dump” on the database level.
- 100% safe – the original instance is not changed in any way so testing etc is quite easy and straightforward and rolling back is also no problem.
- Dumps can theoretically be pulled online or from a degraded “read only” instance. The only thing to watch out for: make sure your applications are stopped (or access limited on Postgres side) when pulling the “dump”, as it will only contain a snapshot of the data from the exact moment when the process was launched. If any rows are changed after that, they will be lost if they have not been picked up by some 3rd party “merge” tools (like dbForge for example).
- Flexible – there are lots of flags to only dump / restore specific objects or schemas, re-map or remove ownership / access privileges. Data can also be represented with SQL standard INSERT statements so theoretically one could also move between different database systems.
- Slow – although pg_dump / pg_restore on modern PG versions can be parallelized very easily for better performance longer downtimes might be unavoidable. In practice you start to notice problems around 100GB, especially when you have a lot of indexes and constraints. This is where most of the time will be typically burnt during restore as it’s both CPU and disk intensive.
- Possibly a “per database” approach is needed, where consistency on the instance level is not guaranteed out of the box (see“pg_dump –snapshot” param for a workaround).
One more recommendation – when dumping out the DB contents it’s usually best to use the tools from the latest Postgres version. Note that this might not be officially supported, if you fall out of the “5 latest releases supported” window. By the way, on the topic of dump / restore and Postgres versions, I’d recommend to read this nice FAQ by our colleague Pavlo. It’s a few years old, but still correct.
Binary In-Place Upgrades
Binary in-place upgrades are quite different from the logical ones as they happen on the filesystem level (i.e. always on the same machine if no shared storage is in play) and are a multi-step process with some additional complexity. However, things are still quite straightforward – after installing the new binaries and following a bit of preparation, it basically boils down to running a single command that typically finishes in 10 to 20 seconds in “–link” mode!! That’s also why it’s the most common upgrade option nowadays: nobody wants extended downtimes.
High-level steps to take (see documentation for details):
1. Install new major version binaries.
2. Initialize the new instance with new binaries (with the same major settings as on the old instance).
3. Do a test run with “pg_upgrade”. FYI – I tend to test first on a completely spare replica or a P.I.T.R. restored instance.
4. Stop the old instance.
5. Run “pg_upgrade” with or without file linking. Using hard-linking of files from the old data directory into the new one is the fastest way to upgrade! Basically only system catalogs are dumped and reloaded. Without linking all data files are copied over and it boils down to filesystem speed vs instance size, with the benefit that the old cluster is not modified.
6. Start the instance (it is usually also necessary to change the port back to 5432).
7. Start the “analyze” script generated and hinted at by the “pg_upgrade” tool. Performance for complex queries might suffer until it finishes.
- Typically very fast. Especially with “pg_upgrade –link –jobs X”.
- More risky: more things can theoretically go wrong (especially in “–link” mode), so some extra testing would be nice. At a minimum, after performing a successful upgrade process (for more important DB-s) I do a quick logical dump / restore (heavily parallel if possible) on the side. If that would take too long for large instances I at least make sure that a dump to /dev/null runs through clean. For large instances, I also tend to utilize some home-brewed scripts to dump single tables independently. Doing that avoids the negative effects of a prolonged snapshot. It only makes sense to do so as long as the DB is not a humongous single table (which we see far more often than we should).
- All extensions and other modules also need to be present for the new version – this could be especially burdensome if using a lot of 3rd party extensions
- No semi-online mode – always real downtime incurred, albeit short.
- You lose all standby servers if not using “–link” mode! With “–link” there are some ways to get around that problem though if an extended downtime allows it – see here for details.
- Degraded 1st startup. The upgrade process currently does not copy over any statistics on our data distributions! This is stuff like histogram, most common values and their frequencies and could mean a considerable performance hit for complex queries before our “vacuumdb –analyze-only –analyze-in-stages” finishes.
NB! Also note that some distributions provide wrapper scripts (like “pg_upgradecluster” on Debian-based systems) to assist with in-place upgrades. They might be worth a look.
Logical Replication (LR) is the latest addition to the family of Postgres major version upgrade options and is available from v10 and upwards. By the way, unofficially it’s also already possible from 9.4 and upwards so feel free to contact us on that if you can’t afford extended downtime for your older instances. But OK, how does this new thing work? I won’t go into technical details here, but check out this blogpost or the sample code here. In general, it works on the DB level: you’re going to be feeding all old data and data changes from the old primary into a newly bootstrapped and schema-synchronized fully independent master / primary server! Basically LR is about decoupling from the binary “on-disk” format and sending over data objects that could be thought of as kind of JSON. Again, to highlight the major benefit here: the servers are for the most part fully decoupled and the syncing process happens in near realtime, so one can take some time to test and validate the results before flipping the final switch!
It looks awesome and relatively simple, doesn’t it? And it is mostly! To spoil the party a bit – there’s also a lot of “small print” to read, and an out-of-the box LR process might not always be possible – it depends a bit on the data model being used. Some tweaking / changing might be needed.
- Flexible. One can for example already make some changes on the target DB – new columns, indexes etc…
- Safe. One should only watch out for the replication slot on the source DB if the process is canceled or the target server taken down suddenly.
- Minimal downtime required.
- Quite a few steps to take.
- Could take a long time for big databases.
- Possibly there will be a need to modify the schema or at least REPLICA IDENTITY.
Summary of Pros / Cons for Different Upgrade Methods
|Dump / restore|
Some General Advice and Parting Words
I guess besides the obvious pre-upgrade DBA activities (testing, backups, thoroughly analyzing the release notes) for all upgrade methods, it’s also good to discuss some other things like application compatibility and UI or monitoring tools compatibility with end users / application teams. But truth be told – there is actually a very low risk factor involved in upgrading, main SQL features are backwards compatible.
I can’t of course 100% recommend doing “lone ranger” types of covert database migrations (although I’ve done my fair share of these) where the DB disappears for a minute to then reappear in a shiny new “coat” without anyone raising an eyebrow. However, if your schema consists only of a handful of plain tables with the most common data types (no stored procedures or other PostgreSQL specific features like LISTEN / NOTIFY) and standard ANSI SQL is performed then most likely everything will be just fine. This is due to the fact that the Postgres wire protocol has not been changed since v7-point-something. Changing it for basic SQL operations would be a massive problem for all users globally. That’s why changing it has been avoided.
If you’re not running an absolutely time-critical 24/7 operation (something like a space-shuttle) in my opinion there are no excuses not to upgrade at least once every couple of years. Keep in mind – if you ignore the topic for too long you’ll stop receiving security patches at some point. Then, when it is most needed, the upgrade process becomes slightly more tricky, since the PostgreSQL Global Development Group only supports releases from the previous 5 years in it’s tooling. If you’ve waited too long, you might need to run “pg_upgrade” twice in a row and there’s some additional risk of stumbling into some corner case backwards compatibility topics, as there’s no testing for such scenarios. In short, better upgrade sooner than later!