Today, I want to share some of the lessons we learned while actually building an Oracle to PostgreSQL migration tool. Moving from Oracle to PostgreSQL has become a popular sport, widely adopted by many who want to free themselves from license fees, hefty support costs, and technical limitations on the Oracle side. The same is true for people moving from MS SQL and other commercial platforms to PostgreSQL. However, my impression is that moving from Oracle to PostgreSQL is by far the most popular route. This post will teach you how to avoid pitfalls in migrations and in the development of a migration tool – learn from our years of experience with actual migrations.
Over the years, we’ve tried out many different approaches to creating the perfect tool, and we’ve definitely made some mistakes on the way – mistakes you will not want to repeat. I’ll share some of the insights I personally had on this multi-year journey. I can’t give you a complete account of them in the space of a short article, so what I’ve done here is to highlight the most useful methods you can take advantage of to make migrations more efficient. I’ll tell you how to avoid the main sources of error, including what slows down migrations and where the differences between how Oracle and PostgreSQL handle data present difficulties.
Transaction length matters
If you are migrating just one or two gigabytes of data from Oracle to PostgreSQL, transaction length is really not relevant. However, things are different if we are talking about 10+ TB of data. For many clients, downtime is not an option. To achieve close-to-zero downtime, you need to support CDC (= change data capture). The idea is to take an initial snapshot of the data and then apply the changes until the target system (= PostgreSQL) has caught up with the source database.
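To make the two-phase idea concrete, here is a minimal sketch of a CDC-style migration loop. This is not the actual CYBERTEC Migrator code; the `migrate` function and the in-memory change log are hypothetical stand-ins for a real snapshot plus a real change stream.

```python
# Minimal sketch of the CDC idea: take an initial snapshot, then replay the
# changes that arrived while the snapshot was being copied.
# All names here are hypothetical, not from any real migration tool.

def migrate(source_rows, change_log):
    """Copy an initial snapshot, then apply queued changes (CDC)."""
    target = {}

    # Phase 1: initial snapshot of the source data.
    for key, value in source_rows.items():
        target[key] = value

    # Phase 2: replay the changes captured while phase 1 was running,
    # until the target has caught up with the source.
    for op, key, value in change_log:
        if op == "upsert":
            target[key] = value
        elif op == "delete":
            target.pop(key, None)
    return target
```

In a real tool, phase 2 runs continuously until the change backlog is small enough to switch applications over to PostgreSQL.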
In reality, this means that we have to copy a vast amount of data from Oracle while changes are still being made to the source database. What you soon face on the Oracle side is a famous problem:
ORA-01555 Error Message “Snapshot Too Old”
Many of you with migration experience will be painfully aware of this issue. You need to ensure that Oracle is configured properly to handle long-running transactions, not just short read bursts. It makes sense to teach the migration tool beforehand to check for these frequent showstoppers.
We also need to keep an additional issue from Oracle in mind:
ORA-08177: can't serialize access for this transaction
PostgreSQL has very good and proper support for high transaction isolation levels – Oracle does not. You can’t just blindly fire up a transaction in isolation SERIALIZABLE and expect things to work – they won’t. Some foresight has to be applied while reading the data, as well. Otherwise, Oracle’s limitations will become painfully obvious. The bottom line really is: Reading many TBs of data is not the same as reading a handful of rows. It makes a huge difference, and you have to prepare for that.
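One common way to apply that foresight is to avoid a single monster read transaction altogether and fetch large tables in key ranges, each range in its own short-lived transaction. The sketch below assumes a `fetch_range` callback standing in for a real Oracle query along the lines of `... WHERE id > :last ORDER BY id FETCH FIRST :n ROWS ONLY`; the names are illustrative only.

```python
# Hypothetical sketch: keyset-paginated reads. Each call to fetch_range()
# represents one short transaction on the Oracle side, which keeps undo
# retention pressure (and ORA-01555 risk) low compared to one huge read.

def copy_in_chunks(fetch_range, chunk_size=10000):
    rows, last_key = [], None
    while True:
        chunk = fetch_range(last_key, chunk_size)  # one short transaction
        if not chunk:
            break
        rows.extend(chunk)
        last_key = chunk[-1][0]  # remember the last primary key seen
    return rows
```

The trade-off: chunked reads are not one consistent snapshot, which is exactly why they have to be combined with CDC as described above.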
The Oracle catalog is SLOW
PostgreSQL users are a bit spoiled. Querying the system catalog is usually extremely fast. In fact, 99% of all PostgreSQL users have never spent a minute on system catalog performance in the first place. When building a migration toolchain such as the CYBERTEC Migrator, the Oracle catalog has to be extensively queried and closely examined. The first thing you will notice is: “Oh god, how slow can it be?” What does that mean for migrations? The horrible performance has a couple of implications:
- The GUI must avoid reading from the Oracle catalog as much as possible
- Make sure you’ve got a cached version of the data around (in memory or in PostgreSQL)
If your graphical interface relies on Oracle system catalog performance, your user experience is doomed: the application simply won’t be usable. We found this to be especially true when the number of objects to be migrated is high.
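The caching idea can be boiled down to a few lines. The class below is a hypothetical sketch: query the slow Oracle catalog once, serve the GUI from memory afterwards, and only refresh on explicit request. A real tool might persist the snapshot in PostgreSQL instead of keeping it in memory.

```python
# Sketch of a simple catalog cache (hypothetical API, not the Migrator's).

class CatalogCache:
    def __init__(self, fetch_catalog):
        self._fetch = fetch_catalog   # the slow Oracle catalog query
        self._cache = None

    def tables(self):
        if self._cache is None:       # hit Oracle only on the first call
            self._cache = self._fetch()
        return self._cache

    def refresh(self):
        # Explicit invalidation: re-read on demand, never per GUI click.
        self._cache = None
```

Every GUI interaction then costs a dictionary lookup instead of a multi-second round trip to the Oracle catalog.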
INVALID objects: Be prepared
PostgreSQL stores many things such as views in binary format, which comes with a couple of advantages. Consider the following:
```
test=# CREATE TABLE a (aid int);
CREATE TABLE
test=# CREATE TABLE b (bid int);
CREATE TABLE
test=# CREATE VIEW v AS SELECT * FROM a, b;
CREATE VIEW
```
I have created two tables and a view joining them. So far, it’s all pretty straightforward, and works within Oracle as well. However, what happens if we rename “a” to “c”?
```
test=# ALTER TABLE a RENAME TO c;
ALTER TABLE
```
In PostgreSQL, the name of the view is just a label. Behind the scenes, everything is a “number”. We simply don’t store a view as a string; rather, it is stored as a binary representation. That has a couple of advantages:
```
test=# \d+ v
                               View "public.v"
 Column |  Type   | Collation | Nullable | Default | Storage | Description
--------+---------+-----------+----------+---------+---------+-------------
 aid    | integer |           |          |         | plain   |
 bid    | integer |           |          |         | plain   |
View definition:
 SELECT c.aid,
    b.bid
   FROM c,
    b;
```
The view is still valid, and PostgreSQL automatically uses the new name. In Oracle, that’s NOT the case: you will end up with an INVALID view. Your migration tool has to be prepared for this at all times, because many types of objects in Oracle can actually be invalid. You will need extra logic to mark and exclude those objects if you want to ensure a smooth transition. Otherwise, you will again jeopardize your user experience, because you have to retry the replay process countless times as it keeps failing because of all those stale and invalid objects.
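The "mark and exclude" logic can be as simple as attempting each object once and recording the failures instead of aborting. The following is an illustrative sketch, not production code; `create` stands in for executing the translated DDL against PostgreSQL.

```python
# Hypothetical sketch: instead of letting one INVALID Oracle object abort the
# whole replay, attempt each object once, mark failures, and keep going.

def replay_objects(objects, create):
    """objects: list of (name, ddl); create(ddl) raises on broken objects."""
    migrated, skipped = [], []
    for name, ddl in objects:
        try:
            create(ddl)
            migrated.append(name)
        except Exception as err:
            # Mark the object as skipped so it can be reported to the user,
            # rather than retrying the entire replay over and over.
            skipped.append((name, str(err)))
    return migrated, skipped
```

The `skipped` list then feeds the GUI, so users see up front which objects were invalid on the Oracle side.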
Null bytes and broken table content
We have done countless migrations in the past. What we have seen far too often is unusable data coming from Oracle. Basically, there are two frequent errors: null bytes and otherwise broken data. Null bytes simply have to be excluded. But there is a catch: In various industries (including but not limited to finance), changes made to data have to be documented, so that regulators can track what has happened to the information in transit. That basically means that we can’t just exclude data and be happy (even if the data is obviously broken).
You have to capture those rows and document them. In some cases, it might also be necessary to come up with transformation rules. The idea is again to have revision-safe rules which actually describe what has happened to the data. This is vital to success and acceptance. Of course, we can’t keep migrating TBs of data to find out over and over again that data cannot be loaded. What you need is some kind of “dry run” and a “find me all the broken data run” to again ensure that the tooling stays reasonably usable.
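As a concrete illustration: PostgreSQL text columns reject null bytes (`\x00`), so those values must be cleaned, and every change must leave an audit record. This is a simplified sketch of the idea, with hypothetical function and field names.

```python
# Sketch: scrub null bytes out of incoming values, but keep a revision-safe
# record of every row that was touched, for the regulators.

def scrub_rows(rows):
    """rows: list of (row_id, value); returns (clean_rows, audit_log)."""
    clean, audit = [], []
    for row_id, value in rows:
        if "\x00" in value:
            fixed = value.replace("\x00", "")
            audit.append((row_id, "removed null byte(s)"))  # document the change
            clean.append((row_id, fixed))
        else:
            clean.append((row_id, value))
    return clean, audit
```

Running the same function without loading anything gives you exactly the "find me all the broken data" dry run described above.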
The way PostgreSQL handles errors during COPY certainly does not help much: a single broken row aborts the entire COPY, and with it the surrounding transaction. You have to find ways to ensure that the transaction loading the data ALWAYS commits, even if a lot of data is loaded at a time.
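One well-known workaround is to bisect failing batches: if a batch fails to load, split it in half and retry each half, until only the genuinely broken rows remain. This is a sketch under the assumption that `load` represents a COPY running in its own transaction and raising on bad input; it is one possible approach, not necessarily the one any particular tool uses.

```python
# Sketch: recursive batch bisection. Good rows still get committed; only the
# truly broken rows end up in the rejected list.

def load_bisect(batch, load):
    rejected = []
    try:
        load(batch)                     # whole batch commits in one COPY
    except Exception:
        if len(batch) == 1:
            rejected.extend(batch)      # a single broken row: record it
        else:
            mid = len(batch) // 2
            rejected += load_bisect(batch[:mid], load)
            rejected += load_bisect(batch[mid:], load)
    return rejected
```

With n rows and k broken ones, this needs on the order of k * log(n) COPY attempts instead of retrying the whole load over and over.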
Efficient data loading in PostgreSQL
When loading data, PostgreSQL – like every other relational database, including Oracle – has to write a WAL (= Write-Ahead Log). Not everyone is fully aware of the consequences: we actually have to write the data TWICE. It can mean that up to 40% of the total time needed to load data into PostgreSQL is spent on that doubled I/O. For more information on reducing WAL, see my post about reducing checkpoint distances.
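For the bulk-load phase, a few standard postgresql.conf settings go a long way toward reducing WAL and checkpoint pressure. The values below are examples only and have to be tuned to your hardware; production settings should be restored once the migration is done.

```
# postgresql.conf fragment for the bulk-load phase (example values)
wal_level = minimal          # least possible WAL; requires max_wal_senders = 0
max_wal_senders = 0
max_wal_size = '50GB'        # fewer forced checkpoints during the load
checkpoint_timeout = '30min'
```

Note that `wal_level = minimal` rules out streaming replication, which is consistent with adding replication only after the load, as discussed below.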
But there is more: Did you know that the first read operation after a write is usually also a write? Yes, you read that correctly. A simple SELECT can be a write operation. Why is that the case? The reason is: hint bits. In other words, once you have loaded the data, you are NOT done writing to disk. There may still be TBs of I/O (mostly writes) left once the migration is over. This can lead to bad performance right after switching to the new system.
You’ve got to avoid that at all costs. Loading has to be done intelligently. One way to do it is to use COPY FREEZE, as described in one of my posts on the topic. It also helps to have some general awareness of hint bits, to create the best possible loading process.
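The basic COPY FREEZE pattern looks like this; table name and file path are placeholders. The key restriction is real: FREEZE is only accepted if the table was created or truncated in the very same transaction.

```
BEGIN;
-- FREEZE only works if the table was created or truncated
-- in the same transaction that runs the COPY
TRUNCATE big_table;
COPY big_table FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE);
COMMIT;
```

Rows loaded this way are written as already frozen and hint-bit-visible, so the first post-migration SELECTs no longer trigger a second wave of writes.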
What this shows us is that while performance adjustments during a migration may require a fair amount of knowledge, they can lead to far better results. In many cases, the amount of I/O can be reduced drastically – especially when PostgreSQL replication is added later, we can greatly speed up the loading process.
“It’s the data type, stupid!”
In the Oracle world, the set of data types used differs quite substantially from what we have on the PostgreSQL side. That means that data type mapping is important for a couple of reasons. First of all, it is a matter of efficiency: not all data types are created equal. The integer data type is far more efficient than, say, numeric, and boolean is going to be a lot smaller than integer (just to give a few examples). Secondly, it is a matter of what you expect on the target side. When creating tooling, you need to keep in mind that …
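A mapping layer often starts as a simple default table plus a few precision-based rules. The mapping below is an illustrative sketch of sensible defaults, not an authoritative or complete conversion table; a real migrator lets the user override every single column (a NUMBER(1) used as a flag is usually better off as boolean, for instance).

```python
# Hypothetical default Oracle -> PostgreSQL type mapping (sketch only).
ORACLE_TO_POSTGRES = {
    "VARCHAR2": "text",
    "NVARCHAR2": "text",
    "CHAR": "char",
    "NUMBER": "numeric",          # safe default when precision/scale unknown
    "BINARY_FLOAT": "real",
    "BINARY_DOUBLE": "double precision",
    "DATE": "timestamp",          # Oracle DATE carries a time component
    "CLOB": "text",
    "BLOB": "bytea",
}

def map_type(oracle_type, precision=None, scale=None):
    # NUMBER with scale 0 is an integer type; pick a width that fits.
    if oracle_type == "NUMBER" and scale == 0:
        return "integer" if (precision or 38) <= 9 else "bigint"
    return ORACLE_TO_POSTGRES.get(oracle_type, "text")
```

Note the DATE rule: mapping Oracle DATE to PostgreSQL date would silently drop the time of day, which is exactly the kind of surprise a mapping layer has to prevent.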
A migration is a chance to do cleanup
This is vitally important. Don’t miss the opportunity to clean up legacy messes, remove unneeded stuff or just fix your data structure by using more suitable and more efficient data types.
A lot can be said about the differences between Oracle and PostgreSQL and which nuances should be migrated in what fashion. However, this time I wanted to shed some light on the topic from the perspective of a toolmaker, instead of from the standpoint of a person performing the actual migration.
If you want to learn more about the tooling we came up with, consider checking out the CYBERTEC migrator directly: