
When designing data structures and table layouts, people usually don't care about the order of columns. Why should they anyway? Columns can be reordered inside a SELECT at will, so why care? Well, you should care, because the order of columns can have a significant impact on the size of your table. That's right: the size of a table can depend on the order of its columns - even if the same data is stored.

Why is that so? Internally, there is something called CPU alignment, which governs the size of a data structure. By cleverly ordering columns, there is some potential to save space. You don't believe it? Let us give it a try:
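The listing below is a reconstruction rather than the original example - table and column names are made up for illustration. It puts three integer columns first, followed by three varchar columns:

CREATE TABLE t_test
(
    i1   int,
    i2   int,
    i3   int,
    v1   varchar(100),
    v2   varchar(100),
    v3   varchar(100)
);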

In this example there are 6 columns: 3 integer columns next to each other, and 3 varchar columns also next to each other. Let us add 10 million rows to the table:
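A sketch of how the rows could be loaded - the string “abcd” is used here because it matters for the effect discussed further down:

INSERT INTO t_test
    SELECT x, x, x, 'abcd', 'abcd', 'abcd'
    FROM generate_series(1, 10000000) AS x;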

The overall size of the table is 574 MB:
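One way to check the size (the 574 MB figure is the value reported in the original test run):

SELECT pg_size_pretty(pg_total_relation_size('t_test'));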

Let us try to change the layout of those columns. In the next example there is a varchar field followed by an integer column. This layout is repeated three times:
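Again a reconstruction with made-up names, this time interleaving varchar and integer columns:

CREATE TABLE t_test_2
(
    v1   varchar(100),
    i1   int,
    v2   varchar(100),
    i2   int,
    v3   varchar(100),
    i3   int
);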

If 10 million rows are added now, the table ends up noticeably larger - even though it stores exactly the same data:
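A sketch of loading the same data and checking the size again (exact figures will vary with platform and PostgreSQL version):

INSERT INTO t_test_2
    SELECT 'abcd', x, 'abcd', x, 'abcd', x
    FROM generate_series(1, 10000000) AS x;

SELECT pg_size_pretty(pg_total_relation_size('t_test_2'));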

The data has been chosen deliberately to demonstrate the effect: if I had used “abc” instead of “abcd”, there would have been no difference in size, because the shorter string would have fit into the padding added for alignment anyway.

Conclusion

The important conclusion here is that it definitely makes sense to pack columns of similar data types together. In addition, I have found it useful to put integer columns at the beginning of a table. In many cases this can yield some extra percentage points of speed - simply because the table is a bit smaller than if you don't make use of this little rule.

Get some great hints and tips about indexes in my post, "Find and Fix a Missing PostgreSQL Index".

UPDATE September 2023: This tool is now called pg_resetwal; see the PostgreSQL documentation about it here. The blog below will be preserved for reference. See this blog by Laurenz Albe about corrupting databases to read more about pg_resetwal.

PostgreSQL has proven to be one of the most reliable Open Source databases around. In the past we have dealt with customers who have achieved superior uptimes and who are really happy with PostgreSQL. But even if PostgreSQL does its job day after day after day, there are components which can simply fail and cause trouble. In short: hardware may crash, filesystems can fail, and so on. In 99% of all cases there is a way around total disaster: either you have a replica, which protects you from downtime, or you simply have a reasonably good backup to help out in case of disaster. But what if there is really no option left anymore? You have to make this database work again - no matter what (even if it is just a partial or an unreliable restore). Maybe pg_resetxlog is your last helping hand in this case.

Killing an instance

As you might know, the purpose of the transaction log (= xlog or WAL) is to restore order and consistency in case of a crash. If the xlog is broken, you might face disaster. This is exactly what we want 😉

So, to cause some disaster I sent a “kill -9” to a database system under load to stop it. In case of a restart this would lead to a recovery process and PostgreSQL would fix things on startup by replaying the transaction log. Let us disturb the replay process a little ...
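The following shell commands are only a sketch of what was done - the data directory and the WAL file name are placeholders (do not try this on a system you care about):

kill -9 `head -n 1 /data/db/postmaster.pid`        # hard-kill the postmaster under load
dd if=/dev/zero of=/data/db/pg_xlog/000000010000000000000003 bs=1M count=16    # overwrite a WAL segment with zeros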

What I have just done is really nasty: I have overwritten some transaction log files in the pg_xlog directory with zeros. This should instantly make sure that the recovery process ends up in hell. Let us check if my attempts to really kill things have succeeded:
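Attempting to start the instance again should now fail during WAL replay (same placeholder data directory as above):

pg_ctl -D /data/db start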

Wow, this is exactly what I intended to do. Nothing works anymore, so it is time to fix things and demonstrate the power of pg_resetxlog.

pg_resetxlog at work

pg_resetxlog is a C program shipped with PostgreSQL, so there is no need to install anything on top of PostgreSQL. It is simply there and ready for action in case it is needed. It is also quite powerful, as the following help page shows:
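The help page can be displayed as usual:

pg_resetxlog --help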

It is possible to nicely configure pg_resetxlog to make it do precisely what is needed (maybe by just setting xlog back a little). However, in this example we don't care too much - we just want to reset the xlog completely:
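A sketch of the call: the -f (force) option makes pg_resetxlog proceed even if it cannot determine valid WAL data; the data directory is again just a placeholder:

pg_resetxlog -f /data/db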

One line is enough to reset the xlog of PostgreSQL.

Let us see if this has worked:
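Starting the instance once more (placeholder data directory as before):

pg_ctl -D /data/db start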

Voila, the database was able to start up again. The autovacuum daemon has been launched nicely and the system is able to accept connections. A simple test reveals that all databases seem to be around:
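Listing the databases is a quick sanity check:

psql -l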

So, where is the problem?

Be cautious

pg_resetxlog is really the last line of defence. If you have to call it, you should expect some data to be lost. Maybe you will face slightly broken tables or corrupted indexes. You should not really trust this database anymore - however, you will have the chance to extract some data and restore at least something.

In any case: if you are forced to use pg_resetxlog, I strongly advise you not to continue working with the database instance in doubt. Take a backup, PROPERLY check the data, and start over with a new database instance.

With PostgreSQL 9.4 just around the corner and 8.4 just retired, I thought it was time to review those years of development and see how the size of the PostgreSQL code base has evolved over time. In the past couple of years (I can even say “decades” by now) a lot of development has happened, and all those changes have had a significant impact on the sheer volume of the code.

Out of plain curiosity I decided to start a little analysis regarding the code size of the PostgreSQL project over time.

To start the analysis I decided to count the lines of plain C code, all header files as well as grammar files (.y). Here is the shell command used on all versions:
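The exact command is not reproduced here, but something along these lines, run in an unpacked source tree, produces the numbers below:

find . -name '*.c' -o -name '*.h' -o -name '*.y' | xargs cat | wc -l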

The results are pretty interesting:
9.3.5: 1190501
9.2.9: 1146762
9.1.14: 1091400
9.0.18: 1018774
8.4.22: 954521

The interesting thing to see here is that the size of the code has grown fairly consistently over time. From one major release to the next we see an increase in source code that has remained more or less stable over the years.

Historically ...

Just out of curiosity: on the 29th of January 1997, PostgreSQL 6.0 was released - a long, long time ago. Guess what the size of the code was back then.

The answer: 185437 lines.

PostgreSQL has come a long way since then.

UPDATED JULY 19, 2023 - When I do training here at CYBERTEC, people often ask how to load data fast and efficiently. There are many ways to achieve this. One way not too many people know about is COPY (FREEZE). It makes sure that PostgreSQL can organize data nicely straight away by instantly writing hint bits (as explained in my previous post).

Sample data for our bulk load test

To demonstrate COPY (FREEZE), first generate some data:
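As a stand-in for the original data set, a plain text file with one integer per line is enough (file name and row count are made up for this sketch):

seq 1 100000000 > /tmp/data.txt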

We will load the data into this table:
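A matching single-column table (the name is made up):

CREATE TABLE t_sample (x int);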

Importing the data with COPY

To import the data on my test system (single SATA disk in a Linux box) I use a normal COPY:
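Using the sample table and file from above:

COPY t_sample FROM '/tmp/data.txt';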

As you can see, it takes around 10 minutes to do so.

Speeding up the load with COPY (FREEZE)

To speed things up, let's try COPY (FREEZE):

The important thing here is that this command can only operate inside the transaction that created or truncated the table. Otherwise, it doesn't work (I'll explain the reason later). So, let's try the following:
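With the sample table from above, the whole operation is wrapped into one transaction that also truncates the table:

BEGIN;
TRUNCATE t_sample;
COPY t_sample FROM '/tmp/data.txt' WITH (FREEZE);
COMMIT;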

The execution time goes down dramatically!

Explanation for the speed of COPY (FREEZE)

The main reason it's now more efficient is that PostgreSQL did not have to write transaction log data. That's because I configured wal_level = minimal. With this setting, you can't use WAL archiving or replication, and PostgreSQL only logs enough data for crash recovery. Because of that, PostgreSQL does not need to write WAL for data modifications that take place in the transaction that created or truncated the table. Skipping WAL writes speeds up data loading considerably! Changing wal_level requires a restart of the database, but it may be worth the pain if you need to load data quickly.
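For reference, the relevant postgresql.conf settings could look like this (on current PostgreSQL versions, wal_level = minimal additionally requires max_wal_senders = 0):

wal_level = minimal
max_wal_senders = 0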

But there's also another gain: The second major benefit will be visible when the data is read for the first time after the import. As already shown in my previous posts, PostgreSQL has to set hint bits during the first reading operation. This translates to a large number of writes later on down the road.

COPY (FREEZE) has been made to fix exactly this kind of problem - the data are already loaded in the frozen state, and there is no need to write hint bits any more. Frozen rows are unconditionally visible, and that is the reason why the table had to be created or truncated in the same transaction: otherwise, concurrent readers could see the rows before the transaction commits, which would violate transaction isolation.

Analytical applications can benefit greatly from the COPY (FREEZE) feature!

Finally...

I’ve recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.

Post UPDATED June 2023: PostgreSQL is a highly sophisticated relational database system capable of performing wonderful operations. But sophistication also means that there is a bit of complexity under the surface, which is not always well understood by users. One thing people usually don't know about is hint bits.

What are hint bits?

They are an internal optimization which speeds up visibility checks considerably. As you might know, PostgreSQL has to decide for each row whether it is visible to a given user or not. The best way to speed this process up is to avoid repeating that check - and that is why hint bits were invented.

When PostgreSQL finds a row that is visible to all transactions, it sets a bit in the row header indicating this fact. Next time someone finds this row, they can avoid comparing the row visibility information to their snapshot, because the hint bit already tells them that it must be visible.

Creating some demo data

To demonstrate how things work, let's create some demo data:
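A minimal single-column table is enough for the purpose (the name is made up for this sketch):

CREATE TABLE t_hint (x int);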

A simple Perl script can create the data nicely. In this example, half a billion rows are used (big enough to demonstrate the point and yet small enough to do the test in reasonable time):
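A hedged reconstruction of such a one-liner, printing one number per line:

perl -e 'for (1..500000000) { print "$_\n" }'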

To ensure that autovacuum does not spoil our day, it is turned off for this test:
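One simple way to do that is to switch it off in postgresql.conf and reload the configuration:

autovacuum = off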

Then data is loaded into PostgreSQL. The easiest way to do that is to use “COPY … FROM PROGRAM” and to simply read things in through a pipe. It works like this:
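Putting the pieces together with the sample table and the Perl one-liner from above (note that COPY ... FROM PROGRAM requires appropriate privileges - superuser or pg_execute_server_program membership):

COPY t_hint FROM PROGRAM 'perl -e ''for (1..500000000) { print "$_\n" }''';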

On our system...

(some Intel i5 and a 750 GB SATA disk) it takes around 10 minutes to load the data. Note that default PostgreSQL parameters were used (no checkpoint optimizations). “top” leaves us with output as shown in the next listing:

You can clearly see that generating the numbers actually causes a decent share of the CPU load. This is actually good news, because it means that the PostgreSQL side is cheap relative to the data creation part.

Hint bits: first contact …

Up until now, nothing special could be observed. Let's try to run a SELECT:
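Any query that has to read the whole table will do - for example a simple count over the sample table from above:

SELECT count(*) FROM t_hint;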

The interesting part here is not the result of the query. The really interesting part is hidden nicely below the surface.

Here is vmstat:
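The captured output is not reproduced here; the numbers were gathered with something like:

vmstat 2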

“bi” (= blocks in) and “bo” (= blocks out) are the two columns of interest here.

At first, things look pretty normal.

We start to read at roughly 100 MB / second, a reasonable value for a desktop PC featuring a single SATA drive. But then, things start to go south: at some point, PostgreSQL starts to write data out to disk. Why is that? Well, when a row is read, PostgreSQL has no idea whether it is visible to everyone without performing the full visibility check all over again. Therefore PostgreSQL marks a row as “visible to everyone” when it is read for the first time. This is exactly where the I/O comes from - rows being marked as “visible to everybody”. The advantage is that subsequent visibility checks are sped up tremendously.

When the query is executed again, the I/O pattern is totally different:

As you can see, there is no more writing going on - just a clean 100+ MB / second. Exactly what we want to see.

The read is a lot faster now:

Wow, the time needed to perform this one has gone down dramatically! And more importantly, the execution time is pretty stable:

The theory behind it is that reads happen more frequently than writes, and therefore investing in hint bits makes an awful lot of sense, because it seriously improves run-times.

Turning to VACUUM

But there is a bit more. What if we VACUUM the table?
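Using the sample table from above:

VACUUM t_hint;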

Here is vmstat again:

Again, some writing is going on. We are seriously bound by the disk here, as the following couple of lines taken from “top” indicate:

VACUUM takes quite a while as well:

The beauty now is that we not only know whether a single row is visible or not - VACUUM can also figure out whether an entire block is visible or not.

Again, we see an improvement in speed - but this time, it's not very big (just two seconds):

The main benefit really comes from those bits attached to the individual rows - and this benefit is massive.

Finally...

If you liked this article, check out the next article in this series: Speed up PostgreSQL data loading with COPY (FREEZE)

For further information on how to speed up performance in PostgreSQL, see this post about CLUSTER and HOT updates, or see Laurenz Albe's first article on HOT updates.


