Speed up PostgreSQL data loading with COPY (FREEZE)

08.2014 | Category: Performance | Tags: development

08.2014

UPDATED JULY 19, 2023 - When I do training here at CYBERTEC, people often ask how to load data fast and efficiently. There are many ways to achieve this. One way not too many know about is COPY (FREEZE). It makes sure that PostgreSQL can organize data nicely straight away by instantly writing hint bits (as explained in my previous post).

Table of Contents

Sample data for our bulk load test

To demonstrate COPY (FREEZE), first generate some data:

$ cat numbers.pl
#!/usr/bin/perl

for ($count = 1; $count <= 500000000; $count++)
{
	print '$countt$countn';
}

1

2

3

4

5

6

7

$ cat numbers.pl

#!/usr/bin/perl

for ($count = 1; $count <= 500000000; $count++)

{

print '$countt$countn';

}

We will load the data into this table:

CREATE TABLE t_test (
   a integer,
   b integer
);

1

2

3

4

CREATE TABLE t_test (

a integer,

b integer

);

Importing the data with `COPY`

To import the data on my test system (single SATA disk in a Linux box) I use a normal COPY:

test=# COPY t_test FROM '/data/numbers.txt';
COPY 500000000
Time: 627509.061 m

1

2

3

test=# COPY t_test FROM '/data/numbers.txt';

COPY 500000000

Time: 627509.061 m

As you can see, it takes around 10 minutes to do so.

Speeding up the load with `COPY (FREEZE)`

To speed things up, let's try COPY (FREEZE):

test=# COPY t_test FROM '/data/numbers.txt' (FREEZE);
ERROR:  cannot perform COPY FREEZE because the table was not created or truncated 
in the current subtransaction

1

2

3

test=# COPY t_test FROM '/data/numbers.txt' (FREEZE);

ERROR: cannot perform COPY FREEZE because the table was not created or truncated

in the current subtransaction

The important thing here is that this command can only operate inside the transaction that created or truncated the table. Otherwise, it doesn't work (I'll explain the reason later). So, let's try the following:

test=# BEGIN;
BEGIN

test=*# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 2/A7D6310
(1 row)

test=*# TRUNCATE t_test;
TRUNCATE

test=*# timing
Timing is on.

test=*# COPY t_test FROM '/data/numbers.txt' FREEZE;
COPY 500000000
Time: 304082.907 ms

test=*# timing
Timing is off.

test=*# SELECT pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 2/A7DC000
(1 row)

test=*# COMMIT;
COMMIT

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

21

22

23

24

25

26

27

28

29

30

test=# BEGIN;

BEGIN

test=*# SELECT pg_current_wal_lsn();

pg_current_wal_lsn

--------------------

2/A7D6310

(1 row)

test=*# TRUNCATE t_test;

TRUNCATE

test=*# timing

Timing is on.

test=*# COPY t_test FROM '/data/numbers.txt' FREEZE;

COPY 500000000

Time: 304082.907 ms

test=*# timing

Timing is off.

test=*# SELECT pg_current_wal_lsn();

pg_current_wal_lsn

--------------------

2/A7DC000

(1 row)

test=*# COMMIT;

COMMIT

The execution time goes down dramatically!

Explanation for the speed of `COPY (FREEZE)`

The main reason it's now more efficient is that PostgreSQL did not have to write transaction log data. That's because I configured wal_level = minimal. With this setting, you can't use WAL archiving or replication, and PostgreSQL only logs enough data for crash recovery. Because of that, PostgreSQL does not need to write WAL for data modifications that take place in the transaction that created or truncated the table. Skipping WAL writes speeds up data loading considerably! Changing wal_level requires a restart of the database, but it may be worth the pain if you need to load data quickly.

But there's also another gain: The second major benefit will be visible when the data is read for the first time after the import. As already shown in my previous posts, PostgreSQL has to set hint bits during the first reading operation. This translates to a large number of writes later on down the road.

COPY (FREEZE) has been made to fix exactly this kind of problem - the data are already loaded in the frozen state, and there is no need to write hint bits any more. Frozen rows are unconditionally visible, and that is the reason why the table had to be created or truncated in the same transaction: otherwise, concurrent readers could see the rows before the transaction commits, which would violate transaction isolation.

Analytical applications can benefit greatly
from the COPY (FREEZE) feature!

Finally...

I’ve recently seen some really broad tables (hundreds of columns) in a somewhat inefficiently structured database. Our PostgreSQL support customer complained about strange runtime behavior which could not be easily explained. To help other PostgreSQL users in this same situation, I decided to reveal the secrets of a fairly common performance problem many people don’t understand: Column order and column access.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

One response to “Speed up PostgreSQL data loading with COPY (FREEZE)”

Craig Ringer (2ndQuadrant) says:

August 11, 2014 at 2:22 am

Note that there are some non-performance consequences of page size, too:

* A single tuple can't span multiple pages (though TOAST lets you store big individual fields out-of-line), so with 4k pages you're limited to tuples roughly half as wide; and

* Index entries may not span multiple pages, even those for TOASTed values, so the existing limitation on the size of indexed text fields is made smaller with a smaller page size, so it's more likely you'll hit it in production.

For that reason, I'd be pretty reluctant to set page size smaller than 4k.

Also, DBs with different page sizes aren't on-disk compatible as the page size is a compile-time parameter. This means that any change must involve a dump and reload, and it means you have to use custom-compiled Pg instances for all replicas, etc.

I don't think that's worth a 5-10% boost.

Reply

Speed up PostgreSQL data loading with COPY (FREEZE)

Sample data for our bulk load test

Importing the data with `COPY`

Speeding up the load with `COPY (FREEZE)`

Explanation for the speed of `COPY (FREEZE)`

Finally...

One response to “Speed up PostgreSQL data loading with COPY (FREEZE)”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Speed up PostgreSQL data loading with COPY (FREEZE)

Sample data for our bulk load test

Importing the data with COPY

Speeding up the load with COPY (FREEZE)

Explanation for the speed of COPY (FREEZE)

Finally...

One response to “Speed up PostgreSQL data loading with COPY (FREEZE)”

Leave a Reply Cancel reply

Hans-Jürgen Schönig

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts

Importing the data with `COPY`

Speeding up the load with `COPY (FREEZE)`

Explanation for the speed of `COPY (FREEZE)`