When writing data structures and table layouts people usually don’t care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will so why care? Well, you should care because the order of columns might have significant impact on the size of your table. That’s right: The size of the table might depend on the order of columns – even if the same data is stored.
Why is that so? Internally there is something called CPU alignment which governs the internal size of a data structure. By cleverly aligning and ordering columns there is some potential to optimize things. You don’t believe it? Let us give it a try:
test=# CREATE TABLE t_test ( i1 int, i2 int, i3 int, v1 varchar(100), v2 varchar(100), v3 varchar(100) ); CREATE TABLE
In this example there are 6 columns. 3 integer columns close to each other and 3 varchar columns also close to each other. Let us add 10 million rows to the table:
test=# INSERT INTO t_test SELECT 10, 20, 30, 'abcd', 'abcd', 'abcd' FROM generate_series(1, 10000000); INSERT 0 10000000
The overall size of the table is 574 MB:
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 574 MB (1 row)
Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:
test=# CREATE TABLE t_test ( v1 varchar(100), i1 int, v2 varchar(100), i2 int, v3 varchar(100), i3 int ); CREATE TABLE
If 10 millions rows are added now …
test=# INSERT INTO t_test SELECT 'abcd', 10, 'abcd', 20, 'abcd', 30 FROM generate_series(1, 10000000); INSERT 0 10000000
… the table will be a lot larger:
test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 651 MB (1 row)
The data in the table is the same – it has just been chosen in a way to demonstrate the effect. If I had used “abc” instead of “abcd” there would be no difference in size because the 4 character string made sure that the string just did not fit into a smaller space.
The important conclusion here is that it definitely makes sense, to pack similar data types together. And, I have found it useful to pack integer columns at the beginning of the tables. In many cases this can yield in some extra percentage points of speed – just because things are a bit smaller than if you don’t make use of this little rule.