Foreign data wrappers have been around for quite a while and are one of the most widely used features in PostgreSQL. People simply like foreign data wrappers, and we can expect the community to keep adding features. As far as postgres_fdw is concerned, there are some hidden tuning options which are not widely known by users. So let us see how we can speed up the PostgreSQL foreign data wrapper.

Foreign data wrappers: Creating a “database link”

To show how things can be improved we first have to create some sample data in “adb”, which can then be integrated into some other database:

adb=# CREATE TABLE t_local (id int);
CREATE TABLE
adb=# INSERT INTO t_local 
		SELECT * FROM generate_series(1, 100000);
INSERT 0 100000

In this case I have simply loaded 100,000 rows into a very simple table. Let us now create the foreign data wrapper (or “database link”, as Oracle people would call it). The first thing to do is to enable the postgres_fdw extension in “bdb”.

bdb=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

In the next step we have to create the “SERVER”, which points to the database containing our sample table. CREATE SERVER works like this:

bdb=# CREATE SERVER some_server 
		FOREIGN DATA WRAPPER postgres_fdw 
		OPTIONS (host 'localhost', dbname 'adb');
CREATE SERVER

Once the foreign server has been created, the users we need can be mapped:

bdb=# CREATE USER MAPPING FOR current_user 
		SERVER some_server 
		OPTIONS (user 'hs');
CREATE USER MAPPING

In this example the user mapping is really easy. We simply want the current user to connect to the remote database as “hs” (which happens to be my superuser).
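Keep in mind that a normal (non-superuser) connection to the remote server will usually have to authenticate with a password, which can be stored in the user mapping as well. A sketch for a hypothetical application user (the user name and password here are just placeholders):

bdb=# CREATE USER MAPPING FOR some_app_user 
		SERVER some_server 
		OPTIONS (user 'hs', password 'secret');
CREATE USER MAPPING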

Finally we can link the tables. The easiest way to do that is to use “IMPORT FOREIGN SCHEMA”, which simply fetches the remote data structures and turns each remote table into a foreign table.

bdb=# \h IMPORT
Command:     IMPORT FOREIGN SCHEMA
Description: import table definitions from a foreign server
Syntax:
IMPORT FOREIGN SCHEMA remote_schema
    [ { LIMIT TO | EXCEPT } ( table_name [, ...] ) ]
    FROM SERVER server_name
    INTO local_schema
    [ OPTIONS ( option 'value' [, ... ] ) ]

The command is really easy and shown in the next listing:

bdb=# IMPORT FOREIGN SCHEMA public 
		FROM SERVER some_server 
		INTO public;
IMPORT FOREIGN SCHEMA

As you can see, PostgreSQL has nicely created the foreign table for us and we are basically ready to go.

bdb=# \d
            List of relations
 Schema |  Name   |     Type      | Owner 
--------+---------+---------------+-------
 public | t_local | foreign table | hs
(1 row)
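If you do not want to import an entire schema, IMPORT FOREIGN SCHEMA also supports the LIMIT TO and EXCEPT clauses shown in the syntax above. Importing just our sample table might look like this (a sketch, assuming the table has not been imported yet):

bdb=# IMPORT FOREIGN SCHEMA public 
		LIMIT TO (t_local) 
		FROM SERVER some_server 
		INTO public;
IMPORT FOREIGN SCHEMA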

Testing postgres_fdw performance

When we query our 100,000 row table locally, the sequential scan itself takes roughly 7.5 milliseconds:

adb=# explain analyze SELECT * FROM t_local ;
                                    QUERY PLAN                                                  
----------------------------------------------------------------------------------
 Seq Scan on t_local  (cost=0.00..1443.00 rows=100000 width=4) 
	(actual time=0.010..7.565 rows=100000 loops=1)
 Planning Time: 0.024 ms
 Execution Time: 12.774 ms
(3 rows)

Let us now connect to “bdb” and see how long the other database needs to read the data:

adb=# \c bdb
bdb=# explain analyze SELECT * FROM t_local ;
                                      QUERY PLAN                                                    
--------------------------------------------------------------------------------------
 Foreign Scan on t_local  (cost=100.00..197.75 rows=2925 width=4) 
	(actual time=0.322..90.743 rows=100000 loops=1)
 Planning Time: 0.043 ms
 Execution Time: 96.425 ms
(3 rows)

In this example you can see that roughly 90 milliseconds are burned to do the same thing. So why is that? Behind the scenes the foreign data wrapper creates a cursor and fetches the data in really small chunks. By default only 100 rows are fetched at a time, which translates to a thousand network round trips for our table. If our two database servers were further apart, things would take even longer – A LOT longer. Network latency plays a crucial role here, and performance can really suffer.

One way to tackle the problem is to fetch larger chunks of data at once to reduce the number of network round trips. ALTER SERVER allows us to set the “fetch_size” option to a value that is large enough to cut down on network overhead without increasing memory consumption too much. Here is how it works:

bdb=# ALTER SERVER some_server 
	OPTIONS (fetch_size '50000');
ALTER SERVER
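If only some tables need a larger fetch size, the option can also be attached to a single foreign table instead of the whole server; the table-level setting overrides the server-level one. A sketch:

bdb=# ALTER FOREIGN TABLE t_local 
		OPTIONS (ADD fetch_size '10000');
ALTER FOREIGN TABLE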

Let us run the test again and see what happens:

bdb=# explain analyze SELECT * FROM t_local;
                                      QUERY PLAN                                                     
---------------------------------------------------------------------------------------
 Foreign Scan on t_local  (cost=100.00..197.75 rows=2925 width=4) 
	(actual time=17.367..40.419 rows=100000 loops=1)
 Planning Time: 0.036 ms
 Execution Time: 45.910 ms
(3 rows)

Wow, we have managed to more than double the speed of the query. Of course, the foreign data wrapper is still slower than a simple local query. However, the speedup is considerable, and it definitely makes sense to toy around with the parameters to tune it.
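fetch_size is not the only knob worth turning. If your queries against foreign tables are more complex than a plain sequential read, it can also help to let the local planner ask the remote server for cost estimates via the use_remote_estimate option. A sketch (whether it pays off depends entirely on your workload, since it adds extra round trips at planning time):

bdb=# ALTER SERVER some_server 
		OPTIONS (ADD use_remote_estimate 'true');
ALTER SERVER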

If you want to learn more about Foreign Data Wrappers, performance and monitoring, check out one of our other postings.