When writing data structures and table layouts people usually don’t care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will so why care? Well, you should care because the order of columns might have significant impact on the size of your table. That’s right: The size of the table might depend on the order of columns – even if the same data is stored.

Why is that so? Internally there is something called CPU alignment which governs the internal size of a data structure. By cleverly aligning and ordering columns there is some potential to optimize things. You don’t believe it? Let us give it a try:

test=# CREATE TABLE t_test (

            i1         int,

            i2         int,

            i3         int,

            v1        varchar(100),

            v2        varchar(100),

            v3        varchar(100)

);

CREATE TABLE

In this example there are 6 columns. 3 integer columns close to each other and 3 varchar columns also close to each other. Let us add 10 million rows to the table:

test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd'

            FROM generate_series(1, 10000000);

INSERT 0 10000000

The overall size of the table is 574 MB:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));

 pg_size_pretty

----------------

 574 MB

(1 row)

Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:

test=# CREATE TABLE t_test (

            v1        varchar(100),

            i1         int,

            v2        varchar(100),

            i2         int,

            v3        varchar(100),

            i3         int

);

CREATE TABLE

If 10 millions rows are added now …

test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30

            FROM generate_series(1, 10000000);

INSERT 0 10000000

… the table will be a lot larger:

test=# SELECT pg_size_pretty(pg_relation_size('t_test'));

 pg_size_pretty

----------------

 651 MB

(1 row)

The data in the table is the same – it has just been chosen in a way to demonstrate the effect. If I had used “abc” instead of “abcd” there would be no difference in size because the 4 character string made sure that the string just did not fit into a smaller space.

Conclusion

The important conclusion here is that it definitely makes sense, to pack similar data types together. And, I have found it useful to pack integer columns at the beginning of the tables. In many cases this can yield in some extra percentage points of speed – just because things are a bit smaller than if you don’t make use of this little rule.