PostgreSQL is a really powerful database and offers many features to make SQL even more powerful. One of these impressive things is the concept of a composite data type. In PostgreSQL a column can be a fairly complex thing. This is especially important if you want to work with server side stored procedures or functions. However, there are some details people are usually not aware of when making use of stored procedures and composite types.

Composite data types in PostgreSQL

Before we dive into the main topic of this post, I want to give you a mini introduction to composite types in general:

test=# CREATE TYPE person AS (id int, name text, income numeric);
CREATE TYPE

I have created a simple data type to store persons. The beauty is that the composite type can be seen as one column:

test=# SELECT '(10, "hans", 500)'::person;
person
------------------
(10," hans",500)
(1 row)

However, it is also possible to break it up again and represent it as a set of fields.

test=# SELECT ('(10, "hans", 500)'::person).*;
 id | name | income
----+------+--------
 10 | hans | 500
(1 row)

A composite type can be used as part of a table just like any other data type. Here is how it works:

test=# CREATE TABLE data (p person, gender char(1));
CREATE TABLE
test=# \d data
Table "public.data"
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
      p |       person |           |          |
 gender | character(1) |           |          |

As you can see the column type is “person”.
Armed with this kind of information we can focus our attention on performance. In PostgreSQL a composite type is often used in conjunction with stored procedures to abstract values passed to a function or to handle return values.

Be careful with database performance

Why is that important? Let me create a type containing 3 million entries:

test=# CREATE TABLE x (id int);
CREATE TABLE
test=# INSERT INTO x SELECT *
FROM generate_series(1, 3000000);
INSERT 0 3000000
test=# vacuum ANALYZE ;
VACUUM

pgstattuple is an extension which is especially useful if you want to detect bloat in a table. It makes use of a composite data type to return data. Installing the extension is easy:

test=# CREATE EXTENSION pgstattuple;
CREATE EXTENSION

What we want to do next is to inspect the content of “x” and see the data (all fields). Here is what you can do:

test=# explain analyze SELECT (pgstattuple('x')).*;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Result (cost=0.00..0.03 rows=1 width=72) (actual time=1909.217..1909.219 rows=1 loops=1)
   Planning Time: 0.016 ms
   Execution Time: 1909.279 ms
(3 rows)

Wow, it took close to 2 seconds to generate the result. Why is that the case? Let us take a look at a second example:

test=# explain analyze SELECT * FROM pgstattuple('x');
                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual time=212.056..212.057 rows=1 loops=1)
    Planning Time: 0.019 ms
    Execution Time: 212.093 ms
(3 rows)

Ooops? What happened? If we put the query in the FROM-clause the database is significantly faster. The same is true is we use a subselect:

test=# explain analyze SELECT (y).* FROM (SELECT pgstattuple('x') ) AS y;
                                            QUERY PLAN
-----------------------------------------------------------------------------------------
 Result (cost=0.00..0.01 rows=1 width=32) (actual time=209.666..209.666 rows=1 loops=1)
    Planning Time: 0.034 ms
    Execution Time: 209.698 ms
(3 rows)

Let us analyze the reasons for this behavior!

PostgreSQL: Expanding the FROM clause

The problem is that PostgreSQL expands the FROM-clause. It actually turns (pgstattuple(‘x’)) into …

…

(pgstattuple('x')).table_len,
(pgstattuple('x')).tuple_count,
(pgstattuple('x')).tuple_len,
(pgstattuple('x')).tuple_percent,

…

As you can see, the function is called more often in this case which of course explains the runtime difference. Therefore it makes a lot of sense to understand what is going on under the hood here. The performance improvement can be quite dramatic. We have seen a couple of cases in PostgreSQL support recently which could be related to this kind of behavior.

Finally …

If you want to know more about performance consider checking out my blog post about CREATE INDEX and parallelism.