I’ve recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.
Creating a large table
The first question is: How can we create a table containing many columns? The easiest way is to simply generate the
CREATE TABLE statement using generate_series:
test=# SELECT 'CREATE TABLE t_broad (' || string_agg('t_' || x || ' varchar(10) DEFAULT ''a'' ', ', ') || ' )' FROM generate_series(1, 4) AS x; ?column? ---------------------------------------------------------- CREATE TABLE t_broad ( t_1 varchar(10) DEFAULT 'a' , t_2 varchar(10) DEFAULT 'a' , t_3 varchar(10) DEFAULT 'a' , t_4 v archar(10) DEFAULT 'a' ) (1 row) test=# \gexec CREATE TABLE
For the sake of simplicity I have only used 4 columns here. Once the command has been generated we can use \gexec to execute the string we have just compiled. \gexec is a really powerful thing: It treats the previous result as SQL input which is exactly what we want here. It leaves us with a table containing 4 columns.
However, let’s drop the table and create a really large one.
test=# DROP TABLE t_broad ; DROP TABLE
Create an extremely wide table
The following statement creates a table containing 1500 columns. Mind that the upper limit is 1600 columns:
test=# SELECT 'CREATE TABLE t_broad (' || string_agg('t_' || x || ' varchar(10) DEFAULT ''a'' ', ', ') || ' )' FROM generate_series(1, 1500) AS x;
In real life such a table is far from efficient and should usually not be used to store data. It will simply create too much overhead and in most cases it is not good modelling in the first place.
Let’s populate the table and add 1 million rows:
test=# \timing Timing is on. test=# INSERT INTO t_broad SELECT 'a' FROM generate_series(1, 1000000); INSERT 0 1000000 Time: 67457,107 ms (01:07,457) test=# VACUUM ANALYZE ; VACUUM Time: 155935,761 ms (02:35,936)
Note that the table has default values so we can be sure that those columns actually contain something. Finally I have executed
VACUUM to make sure that all hint bits and alike are set.
The table we have just created is roughly 4 GB in size which can easily be determined using the following line:
test=# SELECT pg_size_pretty(pg_total_relation_size('t_broad')); pg_size_pretty ---------------- 3907 MB (1 row)
Accessing various columns
PostgreSQL stores data in rows. As you might know data can be stored column- or row-oriented. Depending on your use case one or the other option might be beneficial. In the case of OLTP a row-based approach is usually far more efficient.
Let’s do a
count(*) and see how long it takes:
test=# SELECT count(*) FROM t_broad; count --------- 1000000 (1 row) Time: 416,732 ms
We can run the query in around 400 ms which is quite ok. As expected, the optimizer will go for a parallel sequential scan:
test=# explain SELECT count(*) FROM t_broad; QUERY PLAN -------------------------------------------------------------------- Finalize Aggregate (cost=506208.55..506208.56 rows=1 width=8) -> Gather (cost=506208.33..506208.54 rows=2 width=8) Workers Planned: 2 -> Partial Aggregate (cost=505208.33..505208.34 rows=1 width=8) -> Parallel Seq Scan on t_broad (cost=0.00..504166.67 rows=416667 width=0) JIT: Functions: 4 Options: Inlining true, Optimization true, Expressions true, Deforming true (8 rows)
Let’s compare this to a count on the first column. You’ll see a small difference in performance. The reason is that
count(*) has to check for the existence of the row while
count(column) has to check if a
NULL value is fed to the aggregate or not. In case of
NULL the value has to be ignored:
test=# SELECT count(t_1) FROM t_broad; count --------- 1000000 (1 row) Time: 432,803 ms
But, let’s see what happens if we access column number 100? The time to do that will differ significantly:
test=# SELECT count(t_100) FROM t_broad; count --------- 1000000 (1 row) Time: 857,897 ms
The execution time has basically doubled. The performance is even worse if we do a count on column number 1000:
test=# SELECT count(t_1000) FROM t_broad; count --------- 1000000 (1 row) Time: 8570,238 ms (00:08,570)
Wow, we are already 20 times slower than before. This is not a small difference but a major problem which has to be understood.
Debunking PostgreSQL performance issues: column order
To understand why the problem happens in the first place we need to take a look at how PostgreSQL stores data: After the tuple header which is present in every row we got a couple of
varchar columns. We just used
varchar here to prove the point. The same issues will happen with other data types – the problem is simply more apparent with
varchar as it is more complicated internally than, say,
How does PostgreSQL access a column? It will fetch the row and then dissect this tuple to calculate the position of the desired column inside the row. So if we want to access column #1000 it means that we have to figure out how long those first 999 columns before our chosen one really are. This can be quite complex. For
integer we simply have to add 4, but in case of
varchar, the operation turns into something really expensive. Let’s inspect how PostgreSQL stores
varchar (just to see why it is so expensive):
- 1 bit indicating short (127 bytes) vs. long string (> 127 bit)
- 7 bit or 31 bit length (depending on first bit)
- “data” + \0 (to terminate the string )
- alignment (to make sure the next column starts at a multiple of CPU-word length)
Now imagine what that means if we need to loop over 1000 columns? It does create some non-trivial overhead.
The key insight here is that using extremely large tables is often not beneficial from a performance standpoint. It makes sense to use sensible table layouts to have a good compromise between performance and convenience.
If you are interested in other ways to improve performance, read my blog on CLUSTER.