When I am doing training here at CYBERTEC people often ask, how to load data fast and efficiently. Of course, there are many ways to achieve this. One way not too many know about is VACUUM FREEZE. It makes sure that PostgreSQL can organize data nicely straight away by instantly writing those hint bits (as explained in my previous post).
To demonstrate VACUUM FREEZE we generate some data first:
[hs@jacqueline blog]$ cat numbers.pl #!/usr/bin/perl for ($count = 1; $count <= 500000000; $count++) { print "$count $count\n"; }
Importing data
To import the data on my test system (single SATA disk in a Linux box) I am using a normal COPY:
test=# COPY t_test FROM '/data/numbers.txt'; COPY 500000000 Time: 627509.061 ms
As you can see it takes around 10 minutes to do so.
To speed things up we can try VACUUM FREEZE:
test=# COPY t_test FROM '/data/numbers.txt' FREEZE; ERROR: cannot perform FREEZE because the table was not created or truncated in the current subtransaction
The important thing here is that this command can only operate inside a transaction block. Otherwise it does not work. So, let us try this one:
test=# BEGIN; BEGIN test=# CREATE TABLE t_test (a int, b int); CREATE TABLE test=# COPY t_test FROM '/data/numbers.txt' FREEZE; COPY 500000000
The time to do that goes down dramatically:
Time: 304082.907 ms
The main reason here is that PostgreSQL does not have to write transaction log (unless you are using replication) – this reduces the time to import significantly.
But there is more to gain: The second major benefit will be visible when the data is read for the first time after the import. As already shown in my previous posts PostgreSQL has to set hint bits during the first reading operating. Usually this translates to a high amount of writes.
VACUUM FREEZE has been made to fix exactly this kind of problem – there is no need to write hint bits on first read anymore. Especially analytical applications can benefit greatly from this feature.