Column order in PostgreSQL does matter

I've recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.

Table of Contents

Creating a large table

The first question is: How can we create a table containing many columns? The easiest way is to simply generate the CREATE TABLE statement using generate_series:

test=# SELECT 'CREATE TABLE t_broad ('
        || string_agg('t_' || x
        || ' varchar(10) DEFAULT ''a'' ', ', ')
        || ' )'
FROM generate_series(1, 4) AS x;

                      ?column?       
----------------------------------------------------------
CREATE TABLE t_broad (
t_1 varchar(10) DEFAULT 'a' ,
t_2 varchar(10) DEFAULT 'a' ,
t_3 varchar(10) DEFAULT 'a' , 
t_4 varchar(10) DEFAULT 'a'  )
(1 row)

test=# gexec
CREATE TABLE

test=# SELECT 'CREATE TABLE t_broad ('

|| string_agg('t_' || x

|| ' varchar(10) DEFAULT ''a'' ', ', ')

|| ' )'

FROM generate_series(1, 4) AS x;

?column?

----------------------------------------------------------

CREATE TABLE t_broad (

t_1 varchar(10) DEFAULT 'a' ,

t_2 varchar(10) DEFAULT 'a' ,

t_3 varchar(10) DEFAULT 'a' ,

t_4 varchar(10) DEFAULT 'a' )

(1 row)

test=# gexec

CREATE TABLE

For the sake of simplicity I have only used 4 columns here. Once the command has been generated we can use gexec to execute the string we have just compiled. gexec is a really powerful thing: It treats the previous result as SQL input which is exactly what we want here. It leaves us with a table containing 4 columns.

However, let's drop the table and create a really large one.

test=# DROP TABLE t_broad ;
DROP TABLE

1 2	test=# DROP TABLE t_broad ; DROP TABLE

Create an extremely wide table

The following statement creates a table containing 1500 columns. Mind that the upper limit is 1600 columns:

test=# SELECT 'CREATE TABLE t_broad ('
         || string_agg('t_' || x
         || ' varchar(10) DEFAULT ''a'' ', ', ') || ' )'
       FROM generate_series(1, 1500) AS x;

test=# SELECT 'CREATE TABLE t_broad ('

|| string_agg('t_' || x

|| ' varchar(10) DEFAULT ''a'' ', ', ') || ' )'

FROM generate_series(1, 1500) AS x;

In real life such a table is far from efficient and should usually not be used to store data. It will simply create too much overhead and in most cases it is not good modelling in the first place.

Let's populate the table and add 1 million rows:

test=# timing
Timing is on.

test=# INSERT INTO t_broad
       SELECT 'a' FROM generate_series(1, 1000000);
INSERT 0 1000000
Time: 67457,107 ms (01:07,457)

test=# VACUUM ANALYZE ;
VACUUM
Time: 155935,761 ms (02:35,936)

test=# timing

Timing is on.

test=# INSERT INTO t_broad

SELECT 'a' FROM generate_series(1, 1000000);

INSERT 0 1000000

Time: 67457,107 ms (01:07,457)

test=# VACUUM ANALYZE ;

VACUUM

Time: 155935,761 ms (02:35,936)

Note that the table has default values so we can be sure that those columns actually contain something. Finally I have executed VACUUM to make sure that all hint bits and alike are set.

The table we have just created is roughly 4 GB in size which can easily be determined using the following line:

test=# SELECT pg_size_pretty(pg_total_relation_size('t_broad'));
 pg_size_pretty
----------------
 3907 MB
(1 row)

test=# SELECT pg_size_pretty(pg_total_relation_size('t_broad'));

pg_size_pretty

----------------

3907 MB

(1 row)

Accessing various columns

PostgreSQL stores data in rows. As you might know data can be stored column- or row-oriented. Depending on your use case one or the other option might be beneficial. In the case of OLTP a row-based approach is usually far more efficient.

Let's do a count(*) and see how long it takes:

test=# SELECT count(*) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 416,732 ms

test=# SELECT count(*) FROM t_broad;

count

---------

1000000

(1 row)

Time: 416,732 ms

We can run the query in around 400 ms which is quite ok. As expected, the optimizer will go for a parallel sequential scan:

test=# explain SELECT count(*) FROM t_broad;
                              QUERY PLAN
--------------------------------------------------------------------
 Finalize Aggregate (cost=506208.55..506208.56 rows=1 width=8)
 -> Gather (cost=506208.33..506208.54 rows=2 width=8)
    Workers Planned: 2
    -> Partial Aggregate
       (cost=505208.33..505208.34 rows=1 width=8)
       -> Parallel Seq Scan on t_broad
          (cost=0.00..504166.67 rows=416667 width=0)
 JIT:
     Functions: 4
     Options: Inlining true, Optimization true, Expressions true,
              Deforming true
(8 rows)

test=# explain SELECT count(*) FROM t_broad;

QUERY PLAN

--------------------------------------------------------------------

Finalize Aggregate (cost=506208.55..506208.56 rows=1 width=8)

-> Gather (cost=506208.33..506208.54 rows=2 width=8)

Workers Planned: 2

-> Partial Aggregate

(cost=505208.33..505208.34 rows=1 width=8)

-> Parallel Seq Scan on t_broad

(cost=0.00..504166.67 rows=416667 width=0)

JIT:

Functions: 4

Options: Inlining true, Optimization true, Expressions true,

Deforming true

(8 rows)

Let's compare this to a count on the first column. You'll see a small difference in performance. The reason is that count(*) has to check for the existence of the row while count(column) has to check if a NULL value is fed to the aggregate or not. In case of NULL the value has to be ignored:

test=# SELECT count(t_1) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 432,803 ms

test=# SELECT count(t_1) FROM t_broad;

count

---------

1000000

(1 row)

Time: 432,803 ms

But, let's see what happens if we access column number 100? The time to do that will differ significantly:

test=# SELECT count(t_100) FROM t_broad;
  count
---------
 1000000
(1 row)

Time: 857,897 ms

test=# SELECT count(t_100) FROM t_broad;

count

---------

1000000

(1 row)

Time: 857,897 ms

The execution time has basically doubled. The performance is even worse if we do a count on column number 1000:

test=# SELECT count(t_1000) FROM t_broad;
  count
---------
 1000000
(1 row)
Time: 8570,238 ms (00:08,570)

test=# SELECT count(t_1000) FROM t_broad;

count

---------

1000000

(1 row)

Time: 8570,238 ms (00:08,570)

Wow, we are already 20 times slower than before. This is not a small difference but a major problem which has to be understood.

Debunking PostgreSQL performance issues: column order

To understand why the problem happens in the first place we need to take a look at how PostgreSQL stores data: After the tuple header which is present in every row we got a couple of varchar columns. We just used varchar here to prove the point. The same issues will happen with other data types - the problem is simply more apparent with varchar as it is more complicated internally than, say, integer.

How does PostgreSQL access a column? It will fetch the row and then dissect this tuple to calculate the position of the desired column inside the row. So if we want to access column #1000 it means that we have to figure out how long those first 999 columns before our chosen one really are. This can be quite complex. For integer we simply have to add 4, but in case of varchar, the operation turns into something really expensive. Let's inspect how PostgreSQL stores varchar (just to see why it is so expensive):

1 bit indicating short (127 bytes) vs. long string (> 127 bit)
7 bit or 31 bit length (depending on first bit)
“data” + (to terminate the string )
alignment (to make sure the next column starts at a multiple of CPU-word length)

Now imagine what that means if we need to loop over 1000 columns? It does create some non-trivial overhead.

Finally …

The key insight here is that using extremely large tables is often not beneficial from a performance standpoint. It makes sense to use sensible table layouts to have a good compromise between performance and convenience.

If you are interested in other ways to improve performance, read my blog on CLUSTER.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

0 0 votes

Article Rating

3 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Hans-Jürgen Schönig

1 year ago

there is no general solution to the problem. sometimes normalization is good - sometimes not because joins are overhead too. one has to decide case by case. this is really important. i am NOT saying that one should normalize the hell out of things - this is certainly counterproductive.

Kiran Kumar

1 year ago

So u suggest normalisation is better

Ze Governo

1 year ago

So for tables with lots of columns the suggestions is to break it into more tables consolidating "common" information to each one, and joining the data if addtional data from other tables is needed? This performs better than holding everything in the same table?

Stay tuned with our

Column order in PostgreSQL does matter

Creating a large table

Create an extremely wide table

Accessing various columns

Debunking PostgreSQL performance issues: column order

Finally …

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts