What is pg_dump compression?
pg_dump is a PostgreSQL utility for backing up a local or remote PostgreSQL database. It creates a logical backup file that contains either plain SQL commands for recreating the database, or a binary file that can be restored with the pg_restore utility. The binary backup file can be used to restore the database from scratch, or to restore only selected parts of the database.
Binary files produced by pg_dump can be compressed to reduce their size. This is done by passing the
--format=custom flag to pg_dump, which causes it to produce a compressed backup file. The file can then be decompressed with pg_restore later. To indicate the level of compression, you should provide option
-Z, followed by a number which indicates the compression level (0-9, where 0 is no compression and 9 is the maximum compression level.) The compression algorithm used in pg_dump is the same as used in
What is new for pg_dump compression in PostgreSQL 16?
Georgios Kokolatos implemented a patch which allows you to specify not only the level of compression to be used but also a method for compression.
The patch was reviewed and committed by Michael Paquier. The commit message is:
Compression specifications are currently used by pg_basebackup and pg_receivewal, and are able to let the user control in an extended way the method and level of compression used. As an effect of this commit, pg_dump's -Z/--compress is now able to use more than just an integer, as of the grammar "method[:detail]". The method can be either "none" or "gzip", and can optionally take a detail string. If the detail string is only an integer, it defines the compression level. A comma-separated list of keywords can also be used method allows for more options, the only keyword supported now is "level". The change is backward-compatible, hence specifying only an integer leads to no compression for a level of 0 and gzip compression when the level is greater than 0. Most of the code changes are straight-forward, as pg_dump was relying on an integer tracking the compression level to check for gzip or no compression. These are changed to use a compression specification and the algorithm stored in it. As of this change, note that the dump format is not bumped because there is no need yet to track the compression algorithm in the TOC entries. Hence, we still rely on the compression level to make the difference when reading them. This will be mandatory once a new compression method is added, though. In order to keep the code simpler when parsing the compression specification, the code is changed so as pg_dump now fails hard when using gzip on -Z/--compress without its support compiled, rather than enforcing no compression without the user knowing about it except through a warning. Like before this commit, archive and custom formats are compressed by default when the code is compiled with gzip, and left uncompressed without gzip.
As you can see, only two method specifiers are implemented here: none and gzip. But the importance of this change is that it prepares the infrastructure for adding all other compression methods available at the time, like gzip, lz4, zstd.
Let’s try it out!
~$ pg_dump --version pg_dump (PostgreSQL) 16devel ~$ pgbench --initialize --scale=100 dropping old tables... NOTICE: table "pgbench_accounts" does not exist, skipping NOTICE: table "pgbench_branches" does not exist, skipping NOTICE: table "pgbench_history" does not exist, skipping NOTICE: table "pgbench_tellers" does not exist, skipping creating tables... generating data (client-side)... 10000000 of 10000000 tuples (100%) done (elapsed 18.34 s, remaining 0.00 s) vacuuming... creating primary keys... done in 23.30 s (drop tables 0.00 s, create tables 0.00 s, client-side generate 18.49 s, vacuum 0.32 s, primary keys 4.48 s). ~$ psql --command="select pg_size_pretty(pg_database_size('postgres'))" pg_size_pretty ---------------- 1503 MB (1 row) ~$ pg_dump --format=custom --compress=9 > 1_compress_9.gz ~$ pg_dump --format=custom --compress=0 > 2_compress_0.gz ~$ pg_dump --format=custom --compress=none:9 > 3_none_9.gz pg_dump: error: invalid compression specification: compression algorithm "none" does not accept a compression level ~$ pg_dump --format=custom --compress=gzip:9 > 4_gzip_9.gz ~$ pg_dump --format=custom --compress=gzip:0 > 5_gzip_0.gz pg_dump: error: invalid compression specification: compression algorithm "gzip" expects a compression level between 1 and 9 (default at -1) ~$ pg_dump --format=custom --compress=gzip:1 > 6_gzip_1.gz ~$ ls -l --block-size=M total 1061M -rw-rw-r-- 1 postgres postgres 27M Dec 2 12:12 1_compress_9.gz -rw-rw-r-- 1 postgres postgres 981M Dec 2 12:12 2_compress_0.gz -rw-rw-r-- 1 postgres postgres 0M Dec 2 12:12 3_none_9.gz -rw-rw-r-- 1 postgres postgres 27M Dec 2 12:13 4_gzip_9.gz -rw-rw-r-- 1 postgres postgres 0M Dec 2 12:13 5_gzip_0.gz -rw-rw-r-- 1 postgres postgres 28M Dec 2 12:13 6_gzip_1.gz
The results show that the non-zero compression level raises an error when the compression algorithm is none. The results also show that a compression level other than zero is required when the compression algorithm is gzip.
-Z/--compress in PostgreSQL 16 will support more than just an integer. It can be used to specify the method and level of compression used. The default is still gzip with a level of 6. As I already said, pg_dump is sometimes used to update and/or upgrade the database. In case you want to understand the difference between an update and an upgrade, check out this blog post by Hans-Jürgen Schönig. Or check our other related publications about updating and upgrading.