When I am doing training here at CYBERTEC people often ask, how to load data fast and efficiently. Of course, there are many ways to achieve this. One way not too many know about is VACUUM FREEZE. It makes sure that PostgreSQL can organize data nicely straight away by instantly writing those hint bits (as explained in my previous post).

To demonstrate VACUUM FREEZE we generate some data first:

[hs@jacqueline blog]$ cat numbers.pl
#!/usr/bin/perl

for ($count = 1; $count <= 500000000; $count++)
{
            print "$count   $count\n";
}

Importing data

To import the data on my test system (single SATA disk in a Linux box) I am using a normal COPY:

test=# COPY t_test FROM '/data/numbers.txt';
COPY 500000000
Time: 627509.061 ms

As you can see it takes around 10 minutes to do so.

To speed things up we can try VACUUM FREEZE:

test=# COPY t_test FROM '/data/numbers.txt' FREEZE;
ERROR:  cannot perform FREEZE because the table was not created or truncated in the current subtransaction

The important thing here is that this command can only operate inside a transaction block. Otherwise it does not work. So, let us try this one:

test=# BEGIN;
BEGIN

test=# CREATE TABLE t_test (a int, b int);
CREATE TABLE

test=# COPY t_test FROM '/data/numbers.txt' FREEZE;
COPY 500000000

The time to do that goes down dramatically:

Time: 304082.907 ms

The main reason here is that PostgreSQL does not have to write transaction log (unless you are using replication) – this reduces the time to import significantly.

But there is more to gain: The second major benefit will be visible when the data is read for the first time after the import. As already shown in my previous posts PostgreSQL has to set hint bits during the first reading operating. Usually this translates to a high amount of writes.

VACUUM FREEZE has been made to fix exactly this kind of problem – there is no need to write hint bits on first read anymore. Especially analytical applications can benefit greatly from this feature.