PostgreSQL major version upgrade by using Logical Replication

By Kaarel Moppel - Some weeks ago, in the light of PostgreSQL v12 release, I wrote a general overview on various major version upgrade methods and benefits of upgrading in general - so if upgrading is a new thing for you I’d recommend to read that posting first. But this time I’m concentrating on the newest (available since v10) and the most complex upgrade method - called “Logical Replication” or LR shortly. For demonstration purposes I’ll be migrating from v10 to freshly released v12 as this is probably the most likely scenario. But it should work the same also with v11 to v12. But do read on for details.

Table of Contents

Benefits of LR upgrades

First a bit of recap from the previous post on why would you use LR for upgrading at all. Well, in short - because it’s the safest option with shortest possible downtime! With that last point I’m already sold...but here again the list of “pros” / “cons”:

PROS

Minimal downtime required

After the initial setup burden one just needs to wait (and verify) that the new instance hast all the data from the old one...and then just shut down the old instance and point applications to the new instance. Couldn’t be easier!

Also before the switchover one can make sure that statistics are up to date, to minimize the typical “degraded performance” period seen after "pg_upgrade" for more complex queries (on bigger databases). For high load application one could even be more careful here and pull the most popular relations into shared buffers by using the (relatively unknown) “pg_prewarm” Contrib extension or by just running common SELECT-s in a loop, to counter the “cold cache” effect.

Flexible

One can for example already make some changes on the target DB – add columns / indexes, change datatypes, leave out some old archive tables, etc. The general idea is that LR does not work on the binary, 1-to-1 level as”pg_upgrade” does, but rather JSON-like data objects are sent over to another master / primary instance, providing quite some freedom on the details.

Safe

Before the final switchover you can anytime abort the process and re-try if something seems fishy. The old instances data is not changed in any way even after the final switchover! Meaning you can easily roll back (with cost of some data loss typically though) to the old version if some unforeseen issues arise. One should only watch out for the replication slot on the source / publisher DB if the target server just taken down suddenly.

CONS

Quite a few steps to take and possibly one needs to modify the schema a bit.
Always per DB.
Could take a long time for big databases.
Large objects, if in use (should be a thing of the past really), need to be exported / imported manually.

Preparing for LR

As LR has some prerequisites on the configuration and schema, you’d first need to see if it’s possible to start with the migration process at all or some changes are needed on the old master node, also called the “publisher” in LR context.

Action points:

1) Enable LR on the old master aka subscriber aka source DB if not done already. This means setting “wal_level” to “logical” in postgresql.conf and making sure that “replication” connections are allowed in “pg_hba.conf” from the new host (also called the “subscriber” in LR context). FYI - changing “wal_level” needs server restart! To enable any kind of streaming replication some other params are needed but they are actually already set accordingly out of the box as of v10 so it shouldn’t be a problem.

2) Check that all tables have a Primary Key (which is good database design anyways) or alternatively have REPLICA IDENTITY set. Primary Keys don’t need much explaining probably but what is this REPLICA IDENTITY thing? A bit simplified - basically it allows to say which columns formulate uniqueness within a table and PK-s are automatically counted as such.

3) If there’s no PK for a particular table, you should create one, if possible. If you can’t do that, set unique constraints / indexes to serve as REPLICA IDENTITY, if at all possible. If even that isn’t possible, you can set the whole row as REPLICA IDENTITY, a.k.a. REPLICA IDENTITY FULL, meaning all columns serve as PK’s in an LR context - with the price of very slow updates / deletes on the subscriber (new DB) side, meaning the whole process could take days or not even catch up, ever! It’s OK not to define a PK for a table, as long as it’s a write-only logging table that only gets inserts.

Sample code:

psql -c “ALTER SYSTEM SET wal_level TO logical;”
sudo systemctl postgresql@10-main restart

# find problematic tables (assuming we want to migrate everything 'as is')
SELECT format('%I.%I', nspname, relname) AS tbl
FROM pg_class AS c
    JOIN pg_namespace AS n ON c.relnamespace = n.oid
WHERE c.relkind = 'r'
  AND NOT n.nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema'])
  AND NOT EXISTS (SELECT FROM pg_index AS i
                  WHERE i.indrelid = c.oid
                    AND i.indisunique AND i.indisvalid
                    AND i.indisready AND i.indislive)
ORDER BY 1;

# set replica identities on tables highlighted by the previous query
ALTER TABLE some_bigger_table REPLICA IDENTITY USING INDEX unique_idx ;
ALTER TABLE some_table_with_no_updates_deletes REPLICA IDENTITY FULL ;

psql -c “ALTER SYSTEM SET wal_level TO logical;”

sudo systemctl postgresql@10-main restart

# find problematic tables (assuming we want to migrate everything 'as is')

SELECT format('%I.%I', nspname, relname) AS tbl

FROM pg_class AS c

JOIN pg_namespace AS n ON c.relnamespace = n.oid

WHERE c.relkind = 'r'

AND NOT n.nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema'])

AND NOT EXISTS (SELECT FROM pg_index AS i

WHERE i.indrelid = c.oid

AND i.indisunique AND i.indisvalid

AND i.indisready AND i.indislive)

ORDER BY 1;

# set replica identities on tables highlighted by the previous query

ALTER TABLE some_bigger_table REPLICA IDENTITY USING INDEX unique_idx ;

ALTER TABLE some_table_with_no_updates_deletes REPLICA IDENTITY FULL ;

Fresh setup of the new “subscriber” DB

Second most important step is to set up a new totally independent instance with a newer Postgres version - or at least create a new database on an existing instance with the latest major version. And as a side note - same version LR migrations are also possible, but you'd be solving some other problem in that case.

This step is actually very simple - just a standard install of PostgreSQL, no special steps needed! With the important addition that to make sure everything works exactly the same way as before for applications - same encoding and collation should be used!

-- on old
SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS 'Encoding', d.datcollate as 'Collate' FROM pg_database d WHERE datname = current_database();
-- on new
CREATE DATABASE appdb TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8';

-- on old

SELECT pg_catalog.pg_encoding_to_char(d.encoding) AS 'Encoding', d.datcollate as 'Collate' FROM pg_database d WHERE datname = current_database();

-- on new

CREATE DATABASE appdb TEMPLATE template0 ENCODING UTF8 LC_COLLATE 'en_US.UTF-8';

NB! Before the final switchover it’s important that no normal users have access to the new DB - as they might alter table data or structures and thereby inadvertently produce replication conflicts that mostly mean starting from scratch (or a costly investigation / fix) as “replay” is a sequential process.

Schema / roles synchronization

Next we need to synchronize the old schema onto the new DB as Postgres does not take care of that automatically as of yet. The simplest way is to use the official PostgreSQL backup tool called “pg_dump”, but if you have your schema initialization scripts in Git or such and they're up to date then this is fine also. For syncing roles “pg_dumpall” can be used.

NB! After this point it’s not recommended to introduce any changes to the schema or be at least very careful when doing it, e.g. creating new tables / columns first on the subscriber and refreshing the subscriptions when introducing new tables - otherwise data synchronization will break! Tip - a good way to disable unwanted schema changes is to use DDL triggers! An approximate example on that is here. Adding new tables only on the new DB is no issue though but during an upgrade not a good idea anyways - my recommendation is to first upgrade and then to evolve the schema.

pg_dumpall -h $old_instance --globals-only | psql -h $new_instance
pg_dump -h $old_instance --schema-only appdb | psql -h $new_instance appdb

1 2	pg_dumpall -h $old_instance --globals-only \| psql -h $new_instance pg_dump -h $old_instance --schema-only appdb \| psql -h $new_instance appdb

Create a “publication” on the old DB

If preparations on the old DB has been finished (all tables having PK-s or replication identities) then this is a oneliner:

CREATE PUBLICATION upgrade FOR ALL TABLES;

1	CREATE PUBLICATION upgrade FOR ALL TABLES;

Here we added all (current and those added in future) tables to a publication (a replication set) named “upgrade” but technically we could also leave out some or choose to only replicate some operations like UPDATE-s, but for a pure version upgrade you want typically all.

NB! As of this moment the replication identities become important - and you might run into trouble on the old master if the identities are not in place on all tables that get changes! In such case you might see errors like that:

UPDATE pgbench_history SET delta = delta WHERE aid = 1;
ERROR:  cannot update table 'pgbench_history' because it does not have a replica identity and publishes updates
HINT:  To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

UPDATE pgbench_history SET delta = delta WHERE aid = 1;

ERROR: cannot update table 'pgbench_history' because it does not have a replica identity and publishes updates

HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE.

Create a “subscription” on the target DB

Next step - create a “subscription” on the new DB. This is also a oneliner, that creates a logical replication slot on the old instance, pulls initial table snapshots and then starts to stream and apply all table changes as they happen on the source, resulting eventually in a mirrored dataset! Note that currently superuser rights are needed for creating the subscription and actually hit also makes life a lot easier on the publisher side.

CREATE SUBSCRIPTION upgrade_sub CONNECTION 'port=5432 user=postgres' PUBLICATION upgrade;
NOTICE:  created replication slot 'upgrade_sub' on publisher
CREATE SUBSCRIPTION

CREATE SUBSCRIPTION upgrade_sub CONNECTION 'port=5432 user=postgres' PUBLICATION upgrade;

NOTICE: created replication slot 'upgrade_sub' on publisher

CREATE SUBSCRIPTION

WARNING! As of this step the 2 DB-s are “coupled” via a replication slot, carrying some dangers if the process is aborted abruptly and the old DB is not “notified” of that. If this sounds new please see the details from documentation.

Check replication progress

Depending on the amount of data it will take X minutes / days until everything is moved over and “live” synchronization is working.

Things to inspect for making sure there are no issues:

No errors in server logs on both sides
There’s an active “pg_replication_slots” entry on the master with the name that we used to create the “subscription” on the new DB
All tables are actively replicating on the subscriber side, i.e. “pg_subscription_rel.srsubstate” should be ‘r’ for all tables (ready - normal replication)

Basic data verification / switchover preparation

Although not a mandatory step, when it comes to data consistency / correctness, it always makes sense to go the extra mile and run some queries that validate that things (source - target) have the same data. For a running DB it’s of course a bit difficult as there’s always some replication lag but for “office hours” applications it should make a lot of sense. My sample script for comparing rowcounts (in a non-threaded way) is for example here but using some slightly more "costly" aggregation / hashing functions that really look at all the data would be even better there.

Also important to note if you’re using sequences (which you most probably are) - sequence state is not synchronized by LR and needs some manual work / scripting! The easiest option I think is that you leave the old DB ticking in read-only mode during switchover so that you can quickly access the last sequence values without touching the indexes for maximum ID-s on the subscriber side.

Switchover time!

We’re almost there with our little undertaking...with the sweaty part remaining - the actual switchover to start using the new DB! Needed steps are simple though and somewhat similar to switching over to a standard, “streaming replication” replica.

1) Re-check the system catalog views on replication status.
2) Stop the old instance. Make sure it’s a nice shutdown. The last logline should state “database system is shut down”, meaning all recent changes were delivered to connected replication clients, including our new DB. Start of downtime! PS Another alternative to make sure absolutely all data is received is to actually configure the new instance in “synchronous replication” mode! This has the usual synchronous replication implications of course so I’d avoid it for bigger / busier DBs.
3) Start the old DB in read-only mode by creating a recovery.conf file (from v12 this is achieved by declaring a "standby.signal" file)
4) Optionally make some more quick “health checks” if time constraints allow it - verify table sizes, row counts, your last transactions, etc. For “live” comparisons it makes sense to restart the old DB under a new, random port so that no-one else connects to it.
5) Synchronize the sequences. Given we'll leave the old DB in read-only mode the easiest way is something like that:

psql -h $old_instance -XAtqc 'SELECT $$select setval('$$ || quote_ident(schemaname)||$$.$$|| quote_ident(sequencename) || $$', $$ || last_value || $$); $$ AS sql FROM pg_sequences' appdb 
| psql -h $new_instance appdb

1 2	psql -h $old_instance -XAtqc 'SELECT $$select setval('$$ \|\| quote_ident(schemaname)\|\|$$.$$\|\| quote_ident(sequencename) \|\| $$', $$ \|\| last_value \|\| $$); $$ AS sql FROM pg_sequences' appdb \| psql -h $new_instance appdb

6) Reconfigure your pg_hba.conf to allow access for all "mortal" users, then reconfigure your application, connection pooler, DNS or proxy to start using the new DB! If the two DB-s were on the same machine then it’s even easier - just change the ports and restart. End of downtime!
7) Basically we’re done here, but would be nice of course to clean up and remove the (no-more needed) subscription not to accumulate errors in server log.

DROP SUBSCRIPTION upgrade_sub;

1	DROP SUBSCRIPTION upgrade_sub;

Note that if you won't keep the old “publisher” accessible in read-only or normal primary mode (dangerous!) though, some extra steps are needed here before dropping:

ALTER SUBSCRIPTION  upgrade_sub DISABLE ;
ALTER SUBSCRIPTION  upgrade_sub SET (slot_name = NONE);
DROP SUBSCRIPTION upgrade_sub;

ALTER SUBSCRIPTION upgrade_sub DISABLE ;

ALTER SUBSCRIPTION upgrade_sub SET (slot_name = NONE);

DROP SUBSCRIPTION upgrade_sub;

8) Time for some bubbly drinks 🙂

Summary

Although there are quite some steps and nuances involved, LR is worth adding to the standard upgrade toolbox for time-critical applications as it’s basically the best way to do major version upgrades nowadays - minimal dangers, minimal downtime!

FYI - if you're planning to migrate dozens of DB-s the LR upgrade process can be fully automated! Even starting from version 9.4 actually, with the help of the “pglogical” extension. So feel free to contact us if you might need something like that and don't particularly enjoy the details. Thanks for reading!

5 1 vote

Article Rating

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Pierre-Marie Petit

4 years ago

Hello, i have a situation about logical replication slots that i do not understand.
On my cluster "cluster 1", when i try to create a logical replication slot:

SELECT 'init' FROM pg_create_logical_replication_slot('slot2', 'pgoutput');
ERROR: cannot create logical replication slot in transaction that has performed writes

so i stop everything running on it.
I create on the same server a 2nd cluster "cluster 2", with the same parameters.

postgres@stock(test)=# SELECT 'init' FROM pg_create_logical_replication_slot('slot3', 'pgoutput');
?column?
----------
init
(1 row)

a select * from pg_stat_activity shows the same on both (i am the only one connected)

i stop/start both clusters, i have always the same result. it is impossible to create a logical replciation slot (but is it possible using pg_recvlogical)

Do you have any ideas ?
Thanks
my conf is pgsql 12.2

laurenz

4 years ago

Reply to Pierre-Marie Petit

Assuming you are using psql, perhaps there is something in your .psqlrc that modifies data? You could set log_statement = 'all' for the user, try again and see in the log what statements are executed.

Pierre-Marie Petit

4 years ago

Reply to laurenz

nothing in .psqlrc, log_statement shows nothing....
i'm still searching.
executed with psql, on the same server, 2 identical clusters (except port number)

something i do not understand : my session starts a transaction:
# select * from pg_locks;
locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath
--------------- ---------- ---------- ------ ------- ------------ --------------- --------- ------- ---------- -------------------- ------- ----------------- --------- ----------
relation | 41044 | 12143 | | | | | | | | 47/910 | 24363 | AccessShareLock | t | t
virtualxid | | | | | 47/910 | | | | | 47/910 | 24363 | ExclusiveLock | t | t
transactionid | | | | | | 6979701 | | | | 47/910 | 24363 | ExclusiveLock | t | f
(3 rows)

24363 is my session....

i do not understand why i start a trx :
i connet to psql using
psql mydatabase
and then
select * from pg_locks;
shows a trx created by my pid.

Pierre-Marie Petit

4 years ago

Reply to Pierre-Marie Petit

ok, i have found the pb
it comes from extension pg_show_plans

Marc Rechté

4 years ago

Hello Kaarel. Thanks for this article.

Regarding sequence adjustments, it seems there is a slight problem in the statement. Did you mean :

psql -XAtqc "SELECT 'SELECT setval(' || quote_ident(schemaname) ||'.'|| quote_ident(sequencename) ||', '|| last_value || ');' AS sql FROM pg_sequences"
Marc

Kaarel

4 years ago

Reply to Marc Rechté

Thanks for the feedback, but it worked for me. Yours seems the same just without the dollar notation. Maybe some copy-paste problem?

Ramnadh G

1 month ago

Below query is not working for me

postgres=# SELECT
postgres-#   quote_ident(nspname) || '.' || quote_ident(relname) AS tbl
postgres-# FROM
postgres-#   pg_class c
postgres-#   JOIN pg_namespace n ON c.relnamespace = n.oid
postgres-# WHERE
postgres-#   relkind = 'r'
postgres-#   AND NOT nspname LIKE ANY (ARRAY[E'pg\_%', 'information_schema'])
postgres-#   AND NOT relhaspkey
postgres-#   AND NOT EXISTS (SELECT * FROM pg_index WHERE indrelid = c.oid
postgres(#       AND indisunique AND indisvalid AND indisready AND indislive)
postgres-# ORDER BY
postgres-#   1;
ERROR: column "relhaspkey" does not exist
LINE 9:   AND NOT relhaspkey
          ^

postgres=# select version();
version
-----------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 16.2 (Ubuntu 16.2-1.pgdg20.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0, 64-bit
(1 row)

Tested on 11,15 and 16 versions

Laurenz Albe

1 month ago

Reply to Ramnadh G

Thanks for the report. That column was removed in v11.
I have updated the query.

Ramnadh G

28 days ago

If I have application triggers in the source and these will get copied to the target using pg_dump and by default they will be in enabled in target.
So, before enabling the replication in target, do you recommend disabling the triggers in target to avoid the data corruption and then enable them once the cut-over is done?

Laurenz Albe

26 days ago

Reply to Ramnadh G

Application triggers normally don't fire during logical replication, because logical replication sets session_replication_role to replica. So there is nothing you have to do. Only triggers for which you used ALTER TABLE ... ENABLE REPLICA|ALWAYS TRIGGER trigger_name will fire during logical replication, but you only define such triggers if you want them to fire during logical replication.

Ramnadh

25 days ago

Reply to Laurenz Albe

Thanks Laurenz for clarifying

Stay tuned with our

Upgrading Postgres major versions using Logical Replication

Benefits of LR upgrades

PROS

CONS

Preparing for LR

Fresh setup of the new “subscriber” DB

Schema / roles synchronization

Create a “publication” on the old DB

Create a “subscription” on the target DB

Check replication progress

Basic data verification / switchover preparation

Switchover time!

Summary

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts