We all know and value SQL functions as a handy shortcut. PostgreSQL v14 has introduced a new, better way to write SQL functions. This article will show the advantages of the new syntax.
Let's create a simple example of an SQL function with the โclassicalโ syntax so that we have some material for demonstrations:
CREATE EXTENSION unaccent;

CREATE FUNCTION mangle(t text) RETURNS text
   LANGUAGE sql
   AS 'SELECT lower(unaccent(t))';
You can use the new function like other database functions:
SELECT mangle('Schön dumm');

   mangle
────────────
 schon dumm
(1 row)
You may ask what good an SQL function is. After all, the main purpose of a database function is to be able to run procedural code inside the database, something you cannot do with SQL. But SQL functions have their use:
- you need a function, for example, when you define custom aggregates with CREATE AGGREGATE or custom operators with CREATE OPERATOR (a minimal sketch follows below)
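To illustrate that last point, here is a minimal sketch, assuming the unaccent extension created above; the helper function mangle_eq and the operator name ==* are made up for this example:

CREATE FUNCTION mangle_eq(a text, b text) RETURNS boolean
   LANGUAGE sql
   AS 'SELECT lower(unaccent(a)) = lower(unaccent(b))';

-- a case- and accent-insensitive equality operator built on the SQL function
CREATE OPERATOR ==* (
   LEFTARG  = text,
   RIGHTARG = text,
   FUNCTION = mangle_eq
);

SELECT 'Schön' ==* 'schon';   -- returns true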
Moreover, simple SQL functions can be inlined, that is, the optimizer can replace the function call with the function definition at query planning time. This can make SQL functions singularly efficient:
We can see function inlining if we use EXPLAIN (VERBOSE) on our example function:
EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm');

                  QUERY PLAN
───────────────────────────────────────────────
 Result
   Output: lower(unaccent('Schön dumm'::text))
(2 rows)
PostgreSQL functions are great. One of the nice aspects is that you are not restricted to a single programming language. Out of the box, PostgreSQL supports functions written in SQL, C, PL/pgSQL (a clone of Oracle's PL/SQL), Perl, Python and Tcl. But that is not all: in PostgreSQL, you can write a plugin that allows you to use any language of your choice inside the database. To allow that flexibility, the function body of a PostgreSQL function is simply a string constant that the call handler of the procedural language interprets when PostgreSQL executes the function. This has some undesirable side effects:
Usually, PostgreSQL tracks dependencies between database objects in the pg_depend and pg_shdepend catalog tables. That way, the database knows the relationships between objects: it will either prevent you from dropping objects on which other objects depend (like a table with a foreign key reference) or drop dependent objects automatically (like dropping a table drops all indexes on the table).
Since the body of a function is just a string constant that PostgreSQL cannot interpret, it won't track dependencies between a function and objects used in the function. A procedural language can provide a validator that checks the function body for syntactic correctness (if check_function_bodies = on). The validator can also test if the objects referenced in the function exist, but it cannot keep you from later dropping an object used by the function.
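As a side note, here is a small sketch of what the validator does (and does not) catch; the function broken() and the referenced no_such_function() are made up for this example:

SET check_function_bodies = off;

-- succeeds, because the body is stored as an unchecked string
CREATE FUNCTION broken() RETURNS text
   LANGUAGE sql
   AS 'SELECT no_such_function(42)';

-- the problem only surfaces at execution time:
-- ERROR: function no_such_function(integer) does not exist
SELECT broken();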
Let's demonstrate that with our example:
DROP EXTENSION unaccent;

SELECT mangle('boom');
ERROR:  function unaccent(text) does not exist
LINE 1: SELECT lower(unaccent(t))
                     ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  SELECT lower(unaccent(t))
CONTEXT:  SQL function "mangle" during inlining
We will fix the problem by creating the extension again. However, it would be better to get an error message when we run DROP EXTENSION without using the CASCADE option.
search_path as a security problem

Since PostgreSQL parses the function body at query execution time, it uses the current setting of search_path to resolve all references to database objects that are not qualified with the schema name. That is not limited to tables and views, but also extends to functions and operators. We can use our example function to demonstrate the problem:
SET search_path = pg_catalog;

SELECT public.mangle('boom');
ERROR:  function unaccent(text) does not exist
LINE 1: SELECT lower(unaccent(t))
                     ^
HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
QUERY:  SELECT lower(unaccent(t))
CONTEXT:  SQL function "mangle" during inlining
In our example, it is a mere annoyance that we can avoid by using public.unaccent() in the function call. But it can be worse than that, particularly with SECURITY DEFINER functions. Since it is cumbersome to schema-qualify each function and operator, the recommended solution is to force a search_path on the function:
ALTER FUNCTION mangle(text) SET search_path = public;
Note that the schemas on the search_path should allow CREATE only to privileged users, so the above is not a good idea on versions older than v15!
An unpleasant downside of setting a search_path is that it prevents the inlining of the SQL function.
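We can verify that with EXPLAIN; the exact output may differ slightly, but the point is that the plan now contains the function call itself instead of the expanded expression:

EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm');

              QUERY PLAN
───────────────────────────────────────
 Result
   Output: mangle('Schön dumm'::text)
(2 rows)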
From PostgreSQL v14 on, the body of SQL functions and procedures need no longer be a string constant. You can now use one of the following forms for the function body:
CREATE FUNCTION function_name(...)
RETURNS ...
RETURN expression;

CREATE FUNCTION function_name(...)
RETURNS ...
BEGIN ATOMIC
   statement;
   ...
END;
The first form requires the function body to be an expression. So if you want to perform a query, you have to wrap it in parentheses (turning it into a subquery, which is a valid expression). For example:
CREATE FUNCTION get_data(v_id bigint) RETURNS text
RETURN (SELECT value FROM data WHERE id = v_id);
The second form allows you to write a function with more than one SQL statement. As it used to be with multi-statement SQL functions, the result of the function will be the result of the final SQL statement. You can also use the second form of the new syntax to create SQL procedures. The first form is obviously not suitable for a procedure, since procedures don't have a return value.
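Here is a minimal sketch of a procedure using the second form; the log table mangle_log and the procedure name log_mangle are made up for this example:

CREATE TABLE mangle_log (input text, mangled text);

CREATE PROCEDURE log_mangle(t text)
LANGUAGE sql
BEGIN ATOMIC
   -- store the original value and its mangled form
   INSERT INTO mangle_log (input, mangled) VALUES (t, lower(unaccent(t)));
END;

CALL log_mangle('Schön dumm');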
We can easily rewrite our example function to use the new syntax:
CREATE OR REPLACE FUNCTION mangle(t text) RETURNS text
RETURN lower(unaccent(t));
Note that these new SQL functions can be inlined into SQL statements just like the old ones!
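A quick way to convince yourself; the output should look roughly like the earlier example:

EXPLAIN (VERBOSE, COSTS OFF) SELECT mangle('Schön dumm');

                  QUERY PLAN
───────────────────────────────────────────────
 Result
   Output: lower(unaccent('Schön dumm'::text))
(2 rows)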
The main difference is that the new-style SQL functions and procedures are parsed at function definition time and stored in parsed form in the prosqlbody column of the pg_proc system catalog. As a consequence, the two shortcomings noted above are gone:
Because the function body is available in parsed form, PostgreSQL can track dependencies. Let's try that with our redefined example function:
DROP EXTENSION unaccent;
ERROR:  cannot drop extension unaccent because other objects depend on it
DETAIL:  function mangle(text) depends on function unaccent(text)
HINT:  Use DROP ... CASCADE to drop the dependent objects too.
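If you are curious, you can also see the recorded dependency directly in pg_depend; a rough sketch (the exact deptype value you see may vary):

SELECT objid::regprocedure AS dependent_object, deptype
FROM   pg_depend
WHERE  classid = 'pg_proc'::regclass
  AND  refclassid = 'pg_proc'::regclass
  AND  refobjid = 'unaccent(text)'::regprocedure;

 dependent_object | deptype
------------------+---------
 mangle(text)     | n
(1 row)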
search_path with new-style SQL functions

search_path is only relevant when SQL is parsed. Since this now happens when CREATE FUNCTION runs, we don't have to worry about the current setting of that parameter at function execution time:
SET search_path = pg_catalog;

SELECT public.mangle('Schön besser');
    mangle
──────────────
 schon besser
(1 row)
You may notice that the multi-statement form for defining SQL functions contains semicolons to terminate the SQL statements. That will not only confuse the usual suspects like HeidiSQL (which never learned dollar quoting), but it will be a problem for any client that recognizes semicolons as separators between SQL statements. Even older versions of psql have a problem with that syntax:
psql (13.7, server 15beta2)
WARNING: psql major version 13, server major version 15.
         Some psql features might not work.
Type "help" for help.

test=> CREATE FUNCTION tryme() RETURNS integer
BEGIN ATOMIC
SELECT 42;
END;
ERROR:  syntax error at end of input
LINE 3: SELECT 42;
                  ^
WARNING:  there is no transaction in progress
COMMIT
psql thinks that the semicolon after "SELECT 42" terminates the CREATE FUNCTION statement. The truncated statement causes an error. The final END is treated as its own statement, which is a synonym for COMMIT and causes a warning.
In v14 and above, psql handles such statements correctly. pgAdmin 4 has learned the new syntax with version 6.3. But I am sure that there are many clients out there that have not got the message yet.
The new syntax for SQL functions introduced by PostgreSQL v14 has great advantages for usability and security. Get a client that supports the new syntax and start using it for your SQL functions. You should consider rewriting your existing functions to make use of these benefits.
Read another great post to increase your PostgreSQL syntax savvy: my post on Cross Join in PostgreSQL.
Find out more about how to get the most performance out of your PostgreSQL database with Hans' post on how to find and fix a missing index.
For PostgreSQL power users, automating repeated steps is becoming more and more necessary, and \gexec can help. This blog will show you how to use the || operator and the \gexec command to avoid unnecessary repetition in your workflow.
The CLI client that ships with PostgreSQL is called psql. Like many CLI clients, it is often overlooked and replaced with something with a GUI, or it is only used for the most basic tasks, while more complex operations are carried out elsewhere. However, psql is a very capable tool with lots of useful features.
One common pattern is the need to run the same command with different arguments. Often, users simply rewrite the command over and over, or sometimes they may opt to use a text editor to write the command once, then copy and paste and edit it to accommodate different arguments.
Sometimes it can be useful to automate such steps, not only in the interest of saving time, but also in the interest of avoiding errors due to typos or copy-pasting. PostgreSQL can take the results of queries and add text to create commands with those results as arguments.
For this purpose, we can prepend or append text to any query result using the || operator.
The || operator

Let's assume a new user needs access to some tables in a schema, e.g. all those tables that match a certain prefix.
Now, we could do this manually, or ask the database to automate the boring stuff.
1. Let's retrieve the relevant tables with names starting with pgbench
postgres=# SELECT tablename FROM pg_tables WHERE tablename ~ '^pgbench';
    tablename
------------------
 pgbench_accounts
 pgbench_branches
 pgbench_history
 pgbench_tellers
(4 rows)
2. Let's use || to prepend and append command fragments to create a valid command with the tablename as a parameter.
postgres=# SELECT 'GRANT SELECT ON TABLE ' || tablename || ' TO someuser;'
           FROM pg_tables WHERE tablename ~ '^pgbench';
                      ?column?
-----------------------------------------------------
 GRANT SELECT ON TABLE pgbench_accounts TO someuser;
 GRANT SELECT ON TABLE pgbench_branches TO someuser;
 GRANT SELECT ON TABLE pgbench_history TO someuser;
 GRANT SELECT ON TABLE pgbench_tellers TO someuser;
(4 rows)
Note that the strings end or begin with additional spaces, as the tablename itself does not contain the necessary spaces for argument separation. The semicolon ; was also added so these commands could be run straight away.
Please keep in mind that, while it is convenient to use || to concatenate things, it is not considered good practice, as it can be vulnerable to SQL injection attacks, as a helpful commenter detailed below:
Do NOT blindly concatenate table names with queries. Use quote_ident(), or format() with %I, instead. These apply correct escaping as necessary.
A safer approach to achieve the same results would be something like this:
postgres=# SELECT format('GRANT SELECT ON TABLE %I TO someuser;', tablename)
           FROM pg_tables WHERE tablename ~ '^pgbench';
                       format
-----------------------------------------------------
 GRANT SELECT ON TABLE pgbench_accounts TO someuser;
 GRANT SELECT ON TABLE pgbench_branches TO someuser;
 GRANT SELECT ON TABLE pgbench_history TO someuser;
 GRANT SELECT ON TABLE pgbench_tellers TO someuser;
(4 rows)
Now, these commands could be copied and then pasted straight into the prompt.
I've even seen people take such lines, store them in a file and then have psql execute all commands from the file.
But thankfully, a much easier way exists.
\gexec

In psql, there are many shortcuts and helpers to quickly gather info about the database, schemas, tables, privileges and much more.
The psql shell allows for working on the input and output buffers, and this can be used together with \gexec to have psql execute each command from the output buffer.
Using \gexec

Reusing the query to generate the necessary commands, we can call \gexec to execute each line from the previous output.
postgres=# SELECT 'GRANT SELECT ON TABLE ' || tablename || ' TO someuser;'
           FROM pg_tables WHERE tablename ~ '^pgbench';
                      ?column?
-----------------------------------------------------
 GRANT SELECT ON TABLE pgbench_accounts TO someuser;
 GRANT SELECT ON TABLE pgbench_branches TO someuser;
 GRANT SELECT ON TABLE pgbench_history TO someuser;
 GRANT SELECT ON TABLE pgbench_tellers TO someuser;
(4 rows)

postgres=# \gexec
GRANT
GRANT
GRANT
GRANT
\gexec with more arguments

Assuming that you want to do something involving more arguments, you can always add more || to add more command fragments around the results from a query.
Suppose you need to grant privileges to insert, update, and delete from those tables as well.
A simple cross join gives us the desired action (constructed as a relation using the VALUES constructor) for each of the table names.
postgres=# SELECT action, tablename
           FROM pg_tables
           CROSS JOIN (VALUES ('INSERT'),('UPDATE'),('DELETE')) AS t(action)
           WHERE tablename ~ '^pgbench';
 action |    tablename
--------+------------------
 INSERT | pgbench_accounts
 UPDATE | pgbench_accounts
 DELETE | pgbench_accounts
 INSERT | pgbench_branches
 UPDATE | pgbench_branches
 DELETE | pgbench_branches
 INSERT | pgbench_history
 UPDATE | pgbench_history
 DELETE | pgbench_history
 INSERT | pgbench_tellers
 UPDATE | pgbench_tellers
 DELETE | pgbench_tellers
(12 rows)
Note that we explicitly assign the action column name using AS t(action) to the table generated using VALUES.
postgres=# SELECT 'GRANT ' || action || ' ON TABLE ' || tablename || ' TO someuser;'
           FROM pg_tables
           CROSS JOIN (VALUES ('INSERT'),('UPDATE'),('DELETE')) AS t(action)
           WHERE tablename ~ '^pgbench';
                      ?column?
-----------------------------------------------------
 GRANT INSERT ON TABLE pgbench_accounts TO someuser;
 GRANT UPDATE ON TABLE pgbench_accounts TO someuser;
 GRANT DELETE ON TABLE pgbench_accounts TO someuser;
 GRANT INSERT ON TABLE pgbench_branches TO someuser;
 GRANT UPDATE ON TABLE pgbench_branches TO someuser;
 GRANT DELETE ON TABLE pgbench_branches TO someuser;
 GRANT INSERT ON TABLE pgbench_history TO someuser;
 GRANT UPDATE ON TABLE pgbench_history TO someuser;
 GRANT DELETE ON TABLE pgbench_history TO someuser;
 GRANT INSERT ON TABLE pgbench_tellers TO someuser;
 GRANT UPDATE ON TABLE pgbench_tellers TO someuser;
 GRANT DELETE ON TABLE pgbench_tellers TO someuser;
(12 rows)
This output can then again be executed using \gexec.
postgres=# \gexec
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
GRANT
Depending on the circumstances, it may be required to add additional quotes to the output, for example when table names contain capitalization or spaces. In such cases, matching double quotes " can be added to the strings prepended and appended to arguments.
postgres=# SELECT 'GRANT SELECT ON TABLE "' || tablename || '" TO someuser;'
           FROM pg_tables WHERE schemaname = 'public';
                      ?column?
------------------------------------------------------
 GRANT SELECT ON TABLE "with spaces" TO someuser;
 GRANT SELECT ON TABLE "Capitalization" TO someuser;
 GRANT SELECT ON TABLE "capitalization" TO someuser;
(3 rows)

postgres=# \gexec
GRANT
GRANT
GRANT
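As mentioned earlier, format() with %I achieves the same thing and quotes identifiers only when necessary, so you don't have to hard-code the double quotes; a sketch, assuming the same three tables:

postgres=# SELECT format('GRANT SELECT ON TABLE %I TO someuser;', tablename)
           FROM pg_tables WHERE schemaname = 'public';
                      format
------------------------------------------------------
 GRANT SELECT ON TABLE "with spaces" TO someuser;
 GRANT SELECT ON TABLE "Capitalization" TO someuser;
 GRANT SELECT ON TABLE capitalization TO someuser;
(3 rows)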
Now that you know how to use \gexec, why not take the next step? Take a look at our blog on column order in PostgreSQL to see it used in another practical example.
If you would like to learn more about security in PostgreSQL, see my blog about Transport Layer Security.
We at CYBERTEC are proud to release the CYBERTEC Migrator Standard Edition, the fastest tool to migrate a database to PostgreSQL. And the best part of it: the CYBERTEC Migrator Standard Edition is available free of charge.
What? Another free (as in beer) tool to migrate databases to PostgreSQL? We already have free tools to migrate Oracle databases to PostgreSQL and they are even open source. So why should I care?
If you are not a curious person and don't care about the why, skip the next section and jump to the interesting part of this blog post, where we walk you through the migration of Oracle's HR schema using the CYBERTEC Migrator Standard Edition. In case you don't have the time to read the article, watch the screen cast.
This article is going to be the first of a series of blog posts about how to migrate from Oracle to PostgreSQL. The articles will cover the challenges you may face, possible solutions and how the CYBERTEC Migrator reduces both your effort and the risks of botched migrations.
So you haven't skipped this section and are curious. Why did we develop the CYBERTEC Migrator?
The simplest answer would be to have an additional option on how to migrate a database to PostgreSQL. And we all agree on one truth: "A world with one Oracle database less and one PostgreSQL database more is a better world".
As you may know, we at CYBERTEC use open source tools and even contribute to the open source community. One of our most used software products - the Foreign Data Wrapper for Oracle - is used by many open source projects.
So why develop the Migrator?
At the beginning there was speed! Or better, the lack of speed. "Necessity is the mother of invention."
Here at CYBERTEC we often see customers with extreme requirements. We assume that's probably why they come to us, because people out there tell them "If someone knows how to do this with PostgreSQL, and is crazy enough to try it out, try CYBERTEC!". We are their last hope to solve their problems.
Long story short - some of our customers have databases in the terabyte regions. And they ask for ways to reduce the service downtime imposed by a database migration.
Previously, we didn't have the resources to implement an asynchronous data replication pipeline. (Such a pipeline captures data changes in the source database during the migration, and synchronizes those changes to PostgreSQL after the migration finishes.)
This left us with the option to reduce migration time by improving the speed of the data transfer. The solution to this problem was ora_migrator, a plugin based on our Foreign Data Wrapper for Oracle. But this solution had some disadvantages:
Looking back at what features we implemented, the why and the how, we noticed another motivation for the existence of our Migrator: convenience.
We do not like to do repetitive, boring stuff. We want to concentrate on the challenges rather than the kinds of mundane tasks we had to do when we used the existing tooling.
And since we are faced with migrating databases to PostgreSQL on a daily basis, we want to have tools which get the boring stuff out of the way. We want to be productive where it matters, so we need tools which provide a working solution.
Don't get me wrong, using vim (as everyone knows, the only true editor) makes me productive. But switching between different terminals, exporting code to the file system, checking log files on another machine, and ... "what state was the migration in before I left for coffee?" All wrapped up in a plethora of shell and SQL scripts, which at the end may fail because the production system differs from the test system! That's not productive.
The raison d'etre of the CYBERTEC Migrator is so that customers do not have to know how to map Oracle data types to PostgreSQL or all the details explained in the technical guide on how to migrate from Oracle to PostgreSQL.
If you want a tool which frees the cognitive capacity of your brain from the boring stuff so you can use it on porting the database over to PostgreSQL, that's what the CYBERTEC Migrator is - the easiest way to migrate to PostgreSQL - in a professional way.
And by the way - it is also the fastest way to PostgreSQL!
So why would you give such a tool away for free?
We wanted to provide persons and organisations with small budgets an easy way to migrate their databases from Oracle Express or/and Personal Edition to PostgreSQL. We even bundle an hour of consulting for free to provide them with the impetus to migrate to PostgreSQL.
The CYBERTEC Migrator Professional and Enterprise Edition are meant for customers with stringent database SLAs and high availability. More details about the differences between the editions can be found on the CYBERTEC Migrator web page.
First you'll see how to install the CYBERTEC Migrator Standard Edition. Then we'll walk you through the migration of Oracle's HR demo database to PostgreSQL.
The CYBERTEC Migrator is a so-called web application, which means its main functionality, the Migrator core, runs on a server. The user configures and controls the database migration via a graphical user interface that runs on a web browser.
As we mentioned above, one of the hassles we had with existing tools was a complicated installation process. The CYBERTEC Migrator is deployed as a set of container images. CYBERTEC provides a public Github repository (cybertec_migrator) which facilitates the installation process. We use Docker Compose to configure and orchestrate the containers. The images for the Migrator Professional and Enterprise Edition are pulled from the Docker Hub container registry.
As a side note - we also have customers who run the Migrator on Kubernetes in their private cloud, mainly on OpenShift.
For an installation on an air gapped system we provide an archive file which contains the git repository mentioned above including the container images. The CYBERTEC Migrator Standard Edition is available as an archive file. Visit this CYBERTEC Migrator link to get it.
Desktop systems like Docker Desktop for Mac and Windows include Docker Engine and Docker Compose as part of those desktop installs. If you use an MS Windows desktop system, install Docker for Desktop with the WSL 2 backend.
Git comes installed with most of the Windows Subsystem for Linux distributions. Should this not be the case, open a shell in Ubuntu/Debian and enter the command:
$ sudo apt-get install git
Assuming all preconditions are met, follow the instructions provided in the offline installation section on the Migrator's installation page:
$ tar xf cybertec_migrator-v3.7.0-standard.tar.gz
$ cd cybertec_migrator/
$ ./migrator configure
[OK] Generated environment file
[INFO] Run './migrator install' to complete setup
$ ./migrator install --archive ../cybertec_migrator-v3.7.0-standard.tar.gz
[INFO] Extracting upgrade information
6be90f1a2d3f: Loading layer [==========================>]  72.54MB/72.54MB
...
Loaded image: cybertecpostgresql/cybertec_migrator-core:v3.7.0-standard
fd95118eade9: Loading layer [==========================>]  83.9MB/83.9MB
...
Loaded image: cybertecpostgresql/cybertec_migrator-web_gui:v3.7.0-standard
...
Loaded image: postgres:13-alpine
[INFO] Extracted upgrade information
[INFO] Upgraded to v3.7.0-standard
[INFO] Run './migrator up' to switch to new version
[WARN] Switching will abort running migrations
$ ./migrator up
Creating network 'cybertec_migrator_common' with the default driver
Creating volume 'cybertec_migrator_core_db-data' with default driver
Creating cybertec_migrator_core_db_1   ... done
Creating cybertec_migrator_core_1      ... done
Creating cybertec_migrator_web_gui_1   ... done
[OK] Started on 'http://orcus'
Note: the Migrator version and the hashes may differ from the file you downloaded.
orcus is the name of the host; it will differ for your installation. Open the web browser of your choice and visit the URL shown in the terminal, in this case http://orcus. If you can't reach the Migrator on the URL provided due to name resolution problems in your Docker setup, try http://localhost, which should work.
You probably are asking yourself, "What? HTTP is not secure!"
For the sake of simplicity we do not use HTTPS in this demo run. It would just complicate the setup and configuration process and not contribute to the goal of this article. In a production environment, you would configure the NGINX server providing the Web GUI to use your SSL certificates.
If all went well, you should see the Migrator Dashboard, which provides the CPU and RAM status of the host where the Migrator is running. The dashboard also displays the most recently changed migrations. Since we haven't created one yet, the page is empty.
In case the Migrator failed to start with the error Bind for 0.0.0.0:80 failed: port is already allocated, you have to change the default port configuration, since the default port 80 is already in use. Change the value of the environment variable EXTERNAL_HTTP_PORT in the .env file located in the installation directory to an unused port.
Migrator installed and running - check.
For the demo, we need an Oracle database containing the HR sample schema and a PostgreSQL database. In case you have access to both, you are set up for the demo and you can skip to the next section, Database Users and Privileges.
If you don't have access to both database servers, we provide a git repository containing a helper script demo-env which facilitates the setup of the demo environment. Clone the git repository of the demo database environment:
$ git clone https://github.com/cybertec-postgresql/cybertec_migrator_demo demo
The following command will start a PostgreSQL server demo_db providing a demo database owned by the demo user, having the password demo. Easy to remember, right?
$ ./demo/demo-env up demo_db
Creating demo_demo_db_1 ... done
In case you get an error because the port is already in use, change the environment variable EXTERNAL_DEMO_POSTGRES_PORT in the demo/.env file.
All we are missing now is the Oracle source database with the HR schema. The following command is going to pull Oracle's Express Edition provided by Oracle's Container Registry. Be aware that you need around 5 GB for the Oracle container. And yes, Oracle is bigger and slower than an elephant.
$ ./demo/demo-env up oracle
Pulling oracle (container-registry.oracle.com/database/express:21.3.0-xe)...
21.3.0-xe: Pulling from database/express
9347a8f0b307: Pull complete
92de79849cc5: Pull complete
d3ad8d0938bf: Pull complete
d5ad62ab629c: Pull complete
4f4fb700ef54: Pull complete
a97d1b8f3aa9: Pull complete
Digest: sha256:20e2eaf39538eada633040e390ee7308a3abc2d6256075adb2e4bb73bf128f1f
Status: Downloaded newer image for container-registry.oracle.com/database/express:21.3.0-xe
Creating demo_oracle_1 ... done
We have to wait until Oracle is up and running, showing the status healthy.
Once again, in case the Oracle default port 1521 is already occupied, the command above will fail. You know the drill: adjust the environment variable EXTERNAL_DEMO_ORACLE_PORT in the file demo/.env.
$ docker ps --format '{{ .Names }}\t{{ .Status }}\t{{ .Ports }}'
demo_oracle_1                 Up About a minute (healthy)   0.0.0.0:1521->1521/tcp, :::1521->1521/tcp
demo_demo_db_1                Up About a minute             0.0.0.0:5432->5432/tcp, :::5432->5432/tcp
cybertec_migrator_web_gui_1   Up About an hour              0.0.0.0:80->80/tcp, :::80->80/tcp
cybertec_migrator_core_1      Up About an hour
cybertec_migrator_core_db_1   Up About an hour (healthy)    5432/tcp
The output shows the mapped ports for the various services:
- demo_db on port 5432, providing the database demo
- oracle on port 1521, providing the database xepdb1
Both database services started by the demo environment do not use volume containers. In case you're not familiar with containers, it means the data is not persisted. On restarting the demo environment, you end up with an empty PostgreSQL demo database.
After you finish the demo, you can shut down the demo environment with
$ demo/demo-env down
The Oracle database user we are going to employ needs read-only privileges. No changes to the source database are necessary when migrating the database:
- SELECT_CATALOG_ROLE grants the user SELECT privileges on the data dictionary views
- SELECT ANY TABLE allows the user to access views and tables in other schemas
- FLASHBACK ANY TABLE allows the user to issue an Oracle Flashback Query. This guarantees that we migrate a consistent snapshot at a specific point in time (SCN).

The demo environment from the previous chapter provides a user "migrator" with the above privileges.
-- On the Oracle database
CREATE USER migrator IDENTIFIED BY migrator;
GRANT CONNECT, SELECT_CATALOG_ROLE, SELECT ANY TABLE, FLASHBACK ANY TABLE TO migrator;
For the target database, the CYBERTEC Migrator expects a PostgreSQL database and a user with CREATE privilege on the database. The demo user provided with the database demo environment was created with the following statements:
-- As PostgreSQL superuser
CREATE DATABASE demo;
CREATE USER demo WITH PASSWORD 'demo';
GRANT CREATE ON DATABASE demo TO demo;
If you use the demo environment, we are ready to go.
So, the Migrator is installed, the demo environment is set up. Let's migrate Oracle's HR demo database to PostgreSQL. We start by creating a migration.
On the left side of the Dashboard select MIGRATION to change to the migrations page. Select the button "ADD NEW MIGRATION", which opens the wizard for creating a migration.
For the source connection, enter:

- Connection string: oracle://oracle:1521/xepdb1
- Username: migrator
- Password: migrator
In case you don't want to use the database demo environment, you will need to adjust the values to your environment.
Selecting CHECK performs a connectivity check from the Migrator core with the source connection provided. If the connection is successful, we can proceed to the Schema Selection with NEXT.
In case the connectivity check fails, an error message is displayed. Check the connection string and your credentials. In case the database user is missing privileges, they are shown.
Let's select the Oracle HR demo schema we want to migrate, and proceed with NEXT to provide the Target Connection.
Now we have to provide the connection to the target database and check the connectivity:
- Connection string: postgresql://demo_db:5432/demo
- Username: demo
- Password: demo
On selecting NEXT, the Migrator reads the meta-data from the Oracle system catalogs. Assuming we don't encounter an unexpected error (yes, sometimes Oracle databases have a screwed up system catalog), the Migrator persists the meta-data of the source database and creates a configuration based on this meta-data.
The migration configuration is later used to migrate the data structure to PostgreSQL. It takes care of mapping data types, table structure, constraints and default values, adjusting indexes and much more.
The creation of the migration configuration is the result of the cumulative knowledge gathered over years of migrating hundreds of databases to PostgreSQL.
As you see in the screenshot above, one of the few features the Standard Edition does not contain is the migration of partitioned tables. The Standard Edition will migrate partitioned tables as normal tables.
Selecting NEXT creates a migration for the HR schema, and we end up in the migration overview page.
When we create a migration, it captures a set of data: the source and target connections, the meta-data read from the source database, and the migration configuration created from it.

Note that once a migration is created, you cannot change the source database, the target database, or the meta-data of the source database. This is an important concept which has an impact on how to roll out migrations to production systems. The life cycle of a migration and how it is used merits its own article.
The screenshot below shows the landing page of the newly created migration for the HR schema.
The migration page is divided into four areas:
For now, let's take a detailed look at the overview page.
The Overview tab shows analytical information about the source database. The left hand side shows a table with the database objects grouped by object type, their cardinality and the stage (more about stages in the next sections) and the order in which they are going to be migrated. The right hand side of the overview page provides information about the column data types and their cardinality. In both tables we may drill down to get more specific information.
Our PostgreSQL experts are able to use the Overview page to make a rough assessment of how much effort it will take to migrate the database.
Wait, what? Do we need an expert to make the assessment?
Don't worry - it is on our roadmap and we are currently working on it. The migration assessment will include our 20 years of knowledge of how to migrate databases to PostgreSQL, including an analysis of the PL/SQL code. Coming soon!
If you want to know what we are currently working on, check out the CYBERTEC Migrator Feature Roadmap at the end of this article. But enough about "what's going to be" - let's start with the migration.
Selecting the Stages tab will lead you to the heart of our Migrator. This is the place where you run, change, resume and replay a migration.
Before we start with the migration of the HR schema, a word about migration stages. The CYBERTEC Migrator divides the execution of a migration into so-called migration stages, natural synchronization points in the migration process:
Now for a demonstration...
Select the START button which - you guessed right - starts the migration. Starting a migration job will automatically open the Log view and follow the log messages showing what's currently happening. Migrating the HR schema, we end up with an error at the Logic stage.
In the default settings, starting a migration executes all the stages until it succeeds with a complete migration or it reaches the first error. In our example, the Migrator executed the Structure, Data and Integrity stages successfully. However, it hit an error in the Logic stage on migrating procedure:"HR"."ADD_JOB_HISTORY". This means our PostgreSQL demo database contains the migrated tables, with the data, constraints and indexes.
The HR schema contains only a few rows of data. This means the Data and Integrity stage are fast.
The takeaways here are:
The CYBERTEC Migrator Standard Edition transfers at maximum three tables in parallel. The same goes for creating indexes and foreign keys. If you need a faster migration, you can upgrade to the Migrator Professional Edition.
Back to our demo and the Log view.
Reading the Log view from the bottom up, we are informed about the failed object, procedure:"HR"."ADD_JOB_HISTORY", and the cause: a syntax error at or near "IS".

Note that all DBOs in the log view are hyperlinked. Selecting the hyperlink for procedure:"HR"."ADD_JOB_HISTORY" opens the code editor for the stored procedure. Hyperlinking error messages to the DBO that caused them saves me at least a couple weeks of work (and frustration) each year.
The screenshot below shows the code editor of the erroneous procedure.
On top of the editor we see the DBO's fully qualified name as provided in the source database, in our case HR > ADD_JOB_HISTORY. The code editor contains the PL/SQL extracted from the Oracle source database and visually marks the same syntax error reported by the migration.
So why this error? Anyone with experience porting code from Oracle PL/SQL to PL/pgSQL knows that the SQL procedural languages between database technologies are not 100% compatible. So Oracle's PL/SQL code has to be translated to PL/pgSQL, PostgreSQL's SQL procedural language.
But Max, why doesn't the CYBERTEC Migrator translate PL/SQL code to PL/pgSQL automatically?
The answer is simple. Our in-house PostgreSQL experts are really skeptical about automatic code conversion tools. This probably stems from the situation that at CYBERTEC, we mostly deal with databases with complex PL/SQL code. To achieve the quality of code translation we would like to have, we would need an Oracle PL/SQL parser - which you guessed right - is on our feature roadmap.
That's why our experts use the Migrator Search and Replace feature.
For now, let's change the code in the editor by replacing "IS BEGIN" with "LANGUAGE SQL AS $$" and "END add_job_history;" with "$$" (without double quotes). Selecting SHOW DIFFERENCE shows the difference between the original Oracle code and the changed procedure code.
What you do not see is what happens behind the scenes. In the background, the CYBERTEC Migrator Standard Edition validates the content of the code editor against the target database. It drops and recreates the procedure, and in case of an error, we show it in the editor. Don't worry about unwanted changes: thanks to PostgreSQL's transactional DDL, the validation is performed in a transaction with rollback.
The background validation is maybe not so much of use in a stored procedure, but saves a lot of time when adapting views. It prevents unnecessary round trips due to typos in column or table names.
Save the changes by selecting the SAVE button, which persists the changes we just made in the Migrator's internal database. Leave the editor and go back to the "Stages" tab by selecting the back arrow on top of the page, near the migration name.
The migration controls provide us with the following options:
We want to resume the migration where it was stopped due to the first error. To avoid the "resume, show error, fix error" cycle over and over again, we disable "Abort stage on first error". This will execute the stage until its end and report all detected errors.
Select RESUME. This will create the fixed procedure but leave us with errors in another procedure, view and trigger.
Let's fix the trigger and select the link in the error log line of trigger:"HR"."UPDATE_JOB_HISTORY", which opens the code editor.
An Oracle trigger is migrated to PostgreSQL by wrapping the trigger code into a PostgreSQL function which is called by the trigger. The function name matches the trigger name.
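To give you an idea of the pattern, here is a rough, simplified sketch of what the migrated trigger ends up looking like once all the fixes described below have been applied (column lists and trigger timing are abbreviated; the real HR definitions are more detailed):

CREATE FUNCTION update_job_history() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   -- the wrapped trigger body: call the procedure, discarding any result
   PERFORM add_job_history(old.employee_id, old.hire_date, current_date,
                           old.job_id, old.department_id);
   RETURN new;
END;$$;

CREATE TRIGGER update_job_history
   AFTER UPDATE OF job_id, department_id ON employees
   FOR EACH ROW
   EXECUTE FUNCTION update_job_history();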
The screenshot below shows an error near the call to add_job_history.
PostgreSQL is picky when calling a function with no return value. So let's change the code to PERFORM add_job_history. This will lead to the editor complaining about a syntax error at or near ":".
:old and :new are variables unknown to PostgreSQL data triggers. Since this shows up multiple times, we are going to use the Search and Replace functionality. At the bottom of the migration window, open the Search view.
We are going to use regular expressions to replace all occurrences of :old with old, and :new with new.
Select the "regex" mode by selecting .* and search for :(new|old). This uses a regex capture group to capture the parameter $1, which we use as a replacement, removing the colon.
In the HR schema we have only one trigger function, so not much to replace. Note that the scope for the Search and Replace operation can be tweaked on the right hand side. The use of the Search and Replace feature deserves its own article.
Selecting Replace All fixes the rest of the errors in the trigger function.
Let's fix the remaining errors. Switch back to the Log view and select the view:"HR"."EMP_DETAILS_VIEW" link, which opens the code editor with the erroneous view. Scroll to the bottom and remove the last line WITH READ ONLY, which has no equivalent in PostgreSQL.
I already mentioned the code validation performed by the migrator against the database. Try to change a table name in the view and see what happens.
Undo the change to the table name (CTRL-z) so the view works again, and select the last erroneous DBO procedure:"HR"."SECURE_DML" in the log view.
Let's assume we are not interested in migrating this procedure and exclude it from the migration by enabling Exclude in the editor's upper right corner. The sidebar shows the exclusion of the procedure with SECURE_DML.
Now we SAVE our changes and resume the execution. Try the floating migration control on the right side of the screen to resume the migration. It is handy when you don't want to return to the Stages tab.
Congratulations - you just migrated your first database using the CYBERTEC Migrator. If you go back to the migration page all your stages will show up green.
If needed, you can download the content of the log view for audit purposes by selecting the download icon. Yes, some customers need this.
If you reached this part of the article and you followed all the steps - kudos.
One important part left to demonstrate is the table editor. Selecting one of the tables on the sidebar will open the table editor. This is the place where you can change the database objects which end up in PostgreSQL.
There you can exclude one or more columns from the table, or the whole table. Change the data type, the nullable attribute, default values. Constraints, indices, triggers and partitions can be adapted.
There are a couple of features left to mention. These will be featured in future articles:
We at CYBERTEC are constantly improving the Migrator. The priority of the features included depends on customer feedback and the migration projects CYBERTEC is currently working on.
Our paying customers have access to the monthly feature releases of the Migrator.
How many of these features and in which order we can implement them depends on the resources we can reserve.
If you want to help the PostgreSQL community, download the free Migrator, try it out, provide feedback and spread the word about a new tool to migrate databases to PostgreSQL.
I've recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don't understand: column order and column access.
The first question is: how can we create a table containing many columns? The easiest way is to simply generate the CREATE TABLE statement using generate_series:
test=# SELECT 'CREATE TABLE t_broad ('
       || string_agg('t_' || x || ' varchar(10) DEFAULT ''a'' ', ', ')
       || ' )'
       FROM generate_series(1, 4) AS x;
                           ?column?
----------------------------------------------------------
 CREATE TABLE t_broad (t_1 varchar(10) DEFAULT 'a' , t_2 varchar(10) DEFAULT 'a' , t_3 varchar(10) DEFAULT 'a' , t_4 varchar(10) DEFAULT 'a'  )
(1 row)

test=# \gexec
CREATE TABLE
For the sake of simplicity I have only used 4 columns here. Once the command has been generated, we can use \gexec to execute the string we have just compiled. \gexec is a really powerful thing: it treats the previous result as SQL input, which is exactly what we want here. It leaves us with a table containing 4 columns.
However, let's drop the table and create a really large one.
test=# DROP TABLE t_broad;
DROP TABLE
The following statement creates a table containing 1500 columns. Mind that the upper limit is 1600 columns:
test=# SELECT 'CREATE TABLE t_broad ('
       || string_agg('t_' || x || ' varchar(10) DEFAULT ''a'' ', ', ')
       || ' )'
       FROM generate_series(1, 1500) AS x;
In real life such a table is far from efficient and should usually not be used to store data. It will simply create too much overhead and in most cases it is not good modelling in the first place.
Let's populate the table and add 1 million rows:
test=# \timing
Timing is on.
test=# INSERT INTO t_broad SELECT 'a' FROM generate_series(1, 1000000);
INSERT 0 1000000
Time: 67457,107 ms (01:07,457)
test=# VACUUM ANALYZE;
VACUUM
Time: 155935,761 ms (02:35,936)
Note that the table has default values, so we can be sure that those columns actually contain something. Finally, I have executed VACUUM to make sure that all hint bits and the like are set.
The table we have just created is roughly 4 GB in size which can easily be determined using the following line:
test=# SELECT pg_size_pretty(pg_total_relation_size('t_broad'));
 pg_size_pretty
----------------
 3907 MB
(1 row)
PostgreSQL stores data in rows. As you might know data can be stored column- or row-oriented. Depending on your use case one or the other option might be beneficial. In the case of OLTP a row-based approach is usually far more efficient.
Let's do a count(*) and see how long it takes:
test=# SELECT count(*) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 416,732 ms
We can run the query in around 400 ms which is quite ok. As expected, the optimizer will go for a parallel sequential scan:
test=# explain SELECT count(*) FROM t_broad;
                                   QUERY PLAN
--------------------------------------------------------------------
 Finalize Aggregate  (cost=506208.55..506208.56 rows=1 width=8)
   ->  Gather  (cost=506208.33..506208.54 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=505208.33..505208.34 rows=1 width=8)
               ->  Parallel Seq Scan on t_broad  (cost=0.00..504166.67 rows=416667 width=0)
 JIT:
   Functions: 4
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(8 rows)
Let's compare this to a count on the first column. You'll see a small difference in performance. The reason is that count(*) has to check for the existence of the row, while count(column) has to check if a NULL value is fed to the aggregate or not. In case of NULL, the value has to be ignored:
test=# SELECT count(t_1) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 432,803 ms
But let's see what happens if we access column number 100. The time to do that will differ significantly:
test=# SELECT count(t_100) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 857,897 ms
The execution time has basically doubled. The performance is even worse if we do a count on column number 1000:
test=# SELECT count(t_1000) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 8570,238 ms (00:08,570)
Wow, we are already 20 times slower than before. This is not a small difference but a major problem which has to be understood.
To understand why the problem happens in the first place, we need to take a look at how PostgreSQL stores data: after the tuple header, which is present in every row, we get a couple of varchar columns. We just used varchar here to prove the point. The same issues will happen with other data types - the problem is simply more apparent with varchar, as it is more complicated internally than, say, integer.
How does PostgreSQL access a column? It will fetch the row and then dissect this tuple to calculate the position of the desired column inside the row. So if we want to access column #1000, it means that we have to figure out how long those first 999 columns before our chosen one really are. This can be quite complex. For integer we simply have to add 4, but in the case of varchar, the operation turns into something really expensive. Let's inspect how PostgreSQL stores varchar (just to see why it is so expensive):
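A varchar value is stored as a variable-length datum: a short header holding the length, followed by the actual bytes (possibly compressed or moved out of line). So the width of each value differs from row to row, and the only way to find column #1000 is to walk over the 999 variable-width values in front of it. A quick way to see the variable-length nature is pg_column_size (the exact byte counts depend on the representation, but the two sizes will differ):

test=# SELECT pg_column_size('a'::varchar(10))          AS short_value,
              pg_column_size('aaaaaaaaaa'::varchar(10)) AS long_value;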
Now imagine what that means if we need to loop over 1000 columns? It does create some non-trivial overhead.
The key insight here is that using extremely large tables is often not beneficial from a performance standpoint. It makes sense to use sensible table layouts to have a good compromise between performance and convenience.
If you are interested in other ways to improve performance, read my blog on CLUSTER.
In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.
Bonus cards, "Miles & More", bonus points - don't we all love and hate them at the same time? Recently we had an interesting use case which made me think about sharing some of the techniques we used in this area to reduce client code by writing some clever SQL. This post will show you how to efficiently code bonus programs in SQL.
Suppose we want to run a bonus program. What we want is to know how many bonus points somebody had at any given point in time. This is how we might want to store the data:
CREATE TABLE t_bonus_card
(
    card_number  text NOT NULL,
    d            date,
    points       int
);
For each bonus card, we want to store how many points were awarded when. So far this is relatively easy. Let's load some sample data:
COPY t_bonus_card FROM stdin DELIMITER ';';
A4711;2022-01-01;8
A4711;2022-01-04;7
A4711;2022-02-12;3
A4711;2022-05-05;2
A4711;2022-06-07;9
A4711;2023-02-02;4
A4711;2023-03-03;7
A4711;2023-05-02;1
B9876;2022-01-07;8
B9876;2022-02-03;5
B9876;2022-02-09;4
B9876;2022-10-18;7
\.
In my example, we have data for two bonus cards which receive some rewards from time to time. To run our bonus program using PostgreSQL, we might want to answer some basic questions:
Let's answer these questions using...
To answer all these questions we can use windowing functions along with some advanced, fancy frame clauses. Let's take a look at a basic query:
SELECT *,
       array_agg(points) OVER (ORDER BY d
                               RANGE BETWEEN '6 months' PRECEDING AND CURRENT ROW)
FROM   t_bonus_card
WHERE  card_number = 'A4711';

 card_number |     d      | points |  array_agg
-------------+------------+--------+-------------
 A4711       | 2022-01-01 |      8 | {8}
 A4711       | 2022-01-04 |      7 | {8,7}
 A4711       | 2022-02-12 |      3 | {8,7,3}
 A4711       | 2022-05-05 |      2 | {8,7,3,2}
 A4711       | 2022-06-07 |      9 | {8,7,3,2,9}
 A4711       | 2023-02-02 |      4 | {4}
 A4711       | 2023-03-03 |      7 | {4,7}
 A4711       | 2023-05-02 |      1 | {4,7,1}
(8 rows)
What this does is simple: It goes through our data set line by line (sorted by date). Then it checks if there are rows between our current row and a value 6 months earlier. For debugging purposes, we aggregate those values into an array. What we see is that on June 7th we have 5 entries. But keep in mind: The rules of our bonus program say that points awarded are taken away after 6 months. By using a sliding window, we can easily achieve this goal.
Note that in SQL we have "ROWS", "RANGE" and "GROUPS" as possible keywords in our frame clause. ROWS means that we want to see a specific number of older rows in our frame. However, this makes no sense here - what we need is an interval, and this is exactly what RANGE can do for us. Rewards might be granted at random points in time, so we certainly need to operate with intervals here.
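To make the difference tangible, here is a small side-by-side sketch: the ROWS frame always looks at the two previous rows regardless of their dates, while the RANGE frame looks back a fixed interval:

SELECT d, points,
       sum(points) OVER (ORDER BY d ROWS  BETWEEN 2 PRECEDING AND CURRENT ROW) AS sum_last_3_rows,
       sum(points) OVER (ORDER BY d RANGE BETWEEN '6 months' PRECEDING AND CURRENT ROW) AS sum_last_6_months
FROM   t_bonus_card
WHERE  card_number = 'A4711'
ORDER BY d;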
The array_agg function is really useful to debug things. However, in a real-world scenario, we need to add up those numbers using sum:
SELECT *,
       sum(points) OVER (ORDER BY d
                         RANGE BETWEEN '6 months' PRECEDING AND CURRENT ROW)
FROM   t_bonus_card
WHERE  card_number = 'A4711';

 card_number |     d      | points | sum
-------------+------------+--------+-----
 A4711       | 2022-01-01 |      8 |   8
 A4711       | 2022-01-04 |      7 |  15
 A4711       | 2022-02-12 |      3 |  18
 A4711       | 2022-05-05 |      2 |  20
 A4711       | 2022-06-07 |      9 |  29
 A4711       | 2023-02-02 |      4 |   4
 A4711       | 2023-03-03 |      7 |  11
 A4711       | 2023-05-02 |      1 |  12
(8 rows)
We have seen that points drop in 2023 again. That's exactly what we wanted.
PARTITION BY

Maybe you have noticed that we did the entire calculation for just one card number. However, what has to be done to make this work for any number of cards? The answer is PARTITION BY:
SELECT *,
       sum(points) OVER (PARTITION BY card_number, date_trunc('year', d)
                         ORDER BY d
                         RANGE BETWEEN '6 months' PRECEDING AND CURRENT ROW)
FROM   t_bonus_card;

 card_number |     d      | points | sum
-------------+------------+--------+-----
 A4711       | 2022-01-01 |      8 |   8
 A4711       | 2022-01-04 |      7 |  15
 A4711       | 2022-02-12 |      3 |  18
 A4711       | 2022-05-05 |      2 |  20
 A4711       | 2022-06-07 |      9 |  29
 A4711       | 2023-02-02 |      4 |   4
 A4711       | 2023-03-03 |      7 |  11
 A4711       | 2023-05-02 |      1 |  12
 B9876       | 2022-01-07 |      8 |   8
 B9876       | 2022-02-03 |      5 |  13
 B9876       | 2022-02-09 |      4 |  17
 B9876       | 2022-10-18 |      7 |   7
(12 rows)
PARTITION BY card_number ensures that our calculations are done for each incarnation of card_number separately. In other words: user A's points cannot be mixed with user B's points anymore. But there is more to this query: we want the points to be reset to zero at the beginning of every year, and counting to resume from there. We can achieve this by using PARTITION BY as well: by truncating the dates to full years, we can use the year as an additional partitioning criterion.
As you can see, SQL is really powerful. A lot can be done without having to write a single line of client code. A handful of SQL statements can produce terrific results, so it makes sense to leverage SQL instead of reimplementing the same logic in your application.
If you want to know more about PostgreSQL 15 and if you are interested in merging data, check out my post about MERGE - which can be found here.