I recently had an interesting support case that shows how the cause of a problem can sometimes be where you would least suspect it.

About table bloat

After an UPDATE or DELETE, PostgreSQL keeps old versions of a table row around. This way, concurrent sessions that want to read the row don’t have to wait. But eventually this “garbage” will have to be cleaned up. That is the task of the autovacuum daemon.

Usually you don’t have to worry about that, but sometimes something goes wrong. Then old row versions don’t get deleted, and the table keeps growing. Apart from the wasted storage space, this will also slow down sequential scans and – to some extent – index scans.

To get rid of the bloat, you can use VACUUM (FULL) and other tools like pg_squeeze. But it is important to find and fix the cause of the table bloat so that it does not reappear.
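
As an illustration, you could first measure how much of a table is dead space with the pgstattuple extension and then rewrite the table – a minimal sketch (pg_attribute is used here only as an example, and VACUUM (FULL) takes an exclusive lock, so run it in a maintenance window):

-- requires the pgstattuple contrib extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- show total size, dead tuple space and free space for the table
SELECT table_len, dead_tuple_len, free_space
FROM pgstattuple('pg_attribute');

-- rewrite the table to get rid of the bloat (takes an exclusive lock)
VACUUM (FULL) pg_attribute;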

The problem

I got called by a customer who experienced table bloat in the pg_attribute system catalog table that contains the table column metadata.

This can happen if table columns get modified or dropped frequently. Most of the time, these are temporary tables which are automatically dropped when the session or transaction ends.
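
For illustration, every temporary table like the following adds rows to pg_attribute, and those rows become dead as soon as the table is dropped (the table definition is just a made-up example):

-- each column of this temporary table gets a row in pg_attribute;
-- once the table is dropped at commit time, those rows are dead tuples
-- that autovacuum has to clean up
CREATE TEMPORARY TABLE tmp_report (
    id   integer,
    data text
) ON COMMIT DROP;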

The customer made heavy use of temporary tables. But they had several Linux machines with databases running the same workload, and only some of those machines had the problem.

Searching the cause

I went through the list of common causes for table bloat:

  • Database transactions that remain open (state “idle in transaction”).
    These will keep autovacuum from cleaning up row versions that became obsolete after the start of the transaction (a query to spot such sessions, and example autovacuum settings, are shown after this list).
  • A rate of data modification that is so high that autovacuum cannot keep up.
    In this case, the correct answer is to make autovacuum more aggressive.
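
To check for these two causes, you can look at pg_stat_activity and, if necessary, tune the autovacuum parameters – a minimal sketch, assuming PostgreSQL 9.4 or later for ALTER SYSTEM (the parameter values are only examples):

-- sessions that block cleanup by staying "idle in transaction"
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_start;

-- if autovacuum simply cannot keep up, make it more aggressive
ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 1000;
ALTER SYSTEM SET autovacuum_naptime = '10s';
SELECT pg_reload_conf();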

Neither was the case; the second cause could be ruled out anyway, because it would have led to bloat on all machines and not only on some.

Then I had a look at the usage statistics for the affected table:

dbname=> SELECT * FROM pg_stat_sys_tables
dbname->          WHERE relname = 'pg_attribute';

-[ RECORD 1 ]-------+-------------
relid               | 1249
schemaname          | pg_catalog
relname             | pg_attribute
seq_scan            | 167081
seq_tup_read        | 484738
idx_scan            | 1506941
idx_tup_fetch       | 4163837
n_tup_ins           | 0
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 0
n_dead_tup          | 0
n_mod_since_analyze | 0
last_vacuum         |
last_autovacuum     |
last_analyze        |
last_autoanalyze    |
vacuum_count        | 0
autovacuum_count    | 0
analyze_count       | 0
autoanalyze_count   | 0

This confirms that autovacuum has never run. But more interesting, we also get a hint as to why it didn’t run:

PostgreSQL thinks that the number of dead tuples (row versions that could be removed) is 0, so it didn’t even try to remove them.
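
That matters because autovacuum only processes a table once the number of dead tuples exceeds a threshold derived from two parameters – roughly autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. With n_dead_tup stuck at 0, that condition can never be met. You can check the current settings like this:

SELECT current_setting('autovacuum_vacuum_threshold')    AS vacuum_threshold,
       current_setting('autovacuum_vacuum_scale_factor') AS vacuum_scale_factor;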

The statistics collector

A suspicion became certainty after I found the following message in the server logs:

using stale statistics instead of current ones because stats collector is
not responding

The statistics collector is the PostgreSQL background process that collects usage statistics.

Whenever they have done something, PostgreSQL backends send statistics about their activity. These statistics updates are sent through a UDP socket on localhost that is created at PostgreSQL startup time. The statistics collector reads from the socket and aggregates the collected statistics.

Closing in on the problem

The statistics collector was running:

918     1 /usr/pgsql-10/bin/postmaster -D /var/lib/pgsql/10/data/
947   918 postgres: logger process   
964   918 postgres: checkpointer process   
965   918 postgres: writer process   
966   918 postgres: wal writer process   
967   918 postgres: autovacuum launcher process   
968   918 postgres: stats collector process   
969   918 postgres: bgworker: logical replication launcher

To see what the statistics collector was doing and to spot any problems it had, I traced its execution:

# strace -p 968
strace: Process 968 attached
epoll_pwait(3,

The statistics collector was waiting for messages on the UDP socket, but no messages were coming through!

I had a look at the UDP socket:

# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State       PID/Program name    
udp6       0      0 ::1:59517       ::1:59517       ESTABLISHED 918/postmaster

Nothing suspicious so far.

But when I tried the same thing on a machine that didn’t have the problem, I got something different:

# netstat -u -n -p
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address   Foreign Address State       PID/Program name    
udp        0      0 127.0.0.1:46031 127.0.0.1:46031 ESTABLISHED 9303/postmaster

It turned out that on all systems that experienced table bloat, the statistics collector socket was created on the IPv6 address for localhost, while all working systems were using the IPv4 address!

But all machines had IPv6 disabled for the loopback interface:

# ifconfig lo
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 6897  bytes 2372420 (2.2 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 6897  bytes 2372420 (2.2 MiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
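
One quick way to verify whether IPv6 really is disabled on such a machine is to look at the corresponding sysctl settings (assuming IPv6 is disabled the usual way, via the disable_ipv6 kernel parameters):

# check whether IPv6 has been disabled for the loopback interface (1 = disabled)
sysctl net.ipv6.conf.lo.disable_ipv6
sysctl net.ipv6.conf.all.disable_ipv6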

Nailing the bug to the wall

PostgreSQL uses the POSIX function getaddrinfo(3) to resolve localhost.
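
To see what getaddrinfo(3) hands back for localhost on a given machine, you can query the resolver from the shell (the order of the addresses depends on /etc/hosts and /etc/gai.conf):

# list the addresses the resolver returns for "localhost";
# if ::1 comes first, PostgreSQL will try the IPv6 socket first
getent ahosts localhost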

Since PostgreSQL knows quite well how important working statistics collection is, it loops through all the addresses returned by that call, creates a UDP socket and tests it, until it has a socket that works.

So we know that IPv6 must have worked when PostgreSQL was started!

Further analysis revealed that IPv6 was disabled during the boot sequence, but there was a race condition:
Sometimes IPv6 would be disabled before PostgreSQL started, sometimes afterwards. And these latter machines were the ones where the statistics collector stopped working and tables got bloated!

After changing the boot sequence to always disable IPv6 before starting PostgreSQL, the problem was fixed.
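
How exactly you enforce that ordering depends on the init system; on a systemd machine, one possible sketch is a drop-in that orders the PostgreSQL service after a (hypothetical) unit called disable-ipv6.service that applies the sysctl settings – both the unit name and the service name below are assumptions:

# /etc/systemd/system/postgresql-10.service.d/ipv6-order.conf
# hypothetical drop-in: start PostgreSQL only after IPv6 has been disabled
[Unit]
After=disable-ipv6.service
Requires=disable-ipv6.service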

Conclusion

This shows (again) how the cause of a computer problem can be in an entirely different place than you would suspect at first glance.

It also shows why a good database administrator needs to know the operating system well.

There are a couple of different ways to implement database encryption – commonly on the operating system, filesystem, file or column level, leaving aside transport-level encryption, which has been supported for 15 years. Each of those approaches counters a different threat model, and one can easily imagine that for databases, which were originally not designed with encryption in mind, it is not exactly easy to first agree on a certain way of doing things – and then there are sure to be a lot of technical pitfalls along the way. But today, after a lot of work in co-operation with a client, we’re really glad to announce that our PostgreSQL instance-level encryption patch mentioned some months ago is now ready for download and use!

How does it work?

The high-level idea behind the patch is to store all the files making up a PostgreSQL cluster on disk in encrypted format (data-at-rest encryption) and then decrypt blocks as they are read from disk into shared buffers. As soon as a block is written out from shared buffers to disk again, it is encrypted automatically. This requires, firstly, that the database is initialized (initdb) with encryption in mind and, secondly, that during startup the server has access to the encryption key, which can be provided in two ways – through an environment variable or through the new “pgcrypto.keysetup_command” parameter. Note that the entire instance is encrypted, so in some sense this implementation is comparable with filesystem or partition level encryption.

Sample setup

1. Build the Postgres code with the patch applied; also install the “contrib” modules.

2. Initialize the cluster by setting the encryption key and then calling initdb


read -sp "Postgres passphrase: " PGENCRYPTIONKEY
export PGENCRYPTIONKEY=$PGENCRYPTIONKEY
initdb --data-encryption pgcrypto --data-checksums -D cryptotest

3. Start the server

3.1. Set the PGENCRYPTIONKEY environment variable before starting the server (in our case it’s already set, but it won’t be after opening a new console)


export PGENCRYPTIONKEY=topsecret
pg_ctl -D cryptotest start

3.2. Optionally, instead of feeding the encryption key from an environment variable, one could implement a custom and more secure key reading procedure via the new “pgcrypto.keysetup_command” postgresql.conf parameter. This parameter will be executed as a Bash command by the Postgres server process, and it needs to return a string in the format “encryptionkey=…” with a key part of 64 hex characters. That way, pg_ctl can be used as usual.
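
A minimal sketch of such a setup, assuming a hypothetical helper script /usr/local/bin/pg_fetch_key.sh that reads the key from a root-only file (both the script and the file path are made up for illustration):

# postgresql.conf
pgcrypto.keysetup_command = '/usr/local/bin/pg_fetch_key.sh'

# /usr/local/bin/pg_fetch_key.sh
#!/bin/bash
# print the key in the format the patch expects:
# "encryptionkey=" followed by 64 hex characters
echo "encryptionkey=$(cat /etc/postgresql/cryptotest.key)"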

Details

For those more technically interested – the encryption algorithm used is 128-bit AES in XTS mode, sometimes also called XTS-AES. It’s a block cipher with a “tweak” for extra security and basically military-grade standard stuff, so you shouldn’t worry about the mechanics too much, but rather about how to keep your encryption key safe and how to safely feed it to Postgres when the server starts up. The key needs to be provided to the server at every startup, and when it doesn’t match (a known value kept encrypted in the control file – visible under “Data encryption fingerprint” when calling pg_controldata – will be decrypted and compared), the server will refuse to start.
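
To see the stored fingerprint for an encrypted cluster, you can inspect the control file with pg_controldata (the data directory name below is the one from the sample setup):

pg_controldata cryptotest | grep 'Data encryption fingerprint'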

More or less everything will be encrypted – heap files (tables, indexes, sequences), the xlog (as it also contains data), the clog and temporary files generated during execution of larger queries.

The main encryption/decryption code itself is placed in the existing “pgcrypto” module, extending it with functions like “pgcrypto_encrypt_block” and “pgcrypto_decrypt_block” and some other setup functions, but the storage manager and more than a dozen other files also had to be changed.

Expectations on performance

Naturally, one agrees to a compromise on performance when going for encryption, as there are no free lunches. Here things are a bit fuzzy – you can expect a very considerable performance hit if your workload is I/O-oriented (logging etc.). On the other hand, on typical server hardware, when the active dataset stays more or less in shared buffers, the performance penalty will be very minor and hardly noticeable.

To get a better picture I ran 5 different types of “pgbench” workloads (average of 3 runs of 10 minutes each, 4 concurrent clients) and the results can be seen below. To conjure up a “typical workload” I’m using pgbench in “--skip-some-updates” mode, which gives us basically a 1:3 read-write ratio.
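
For reference, the individual runs could look roughly like this (a sketch, not the exact commands used – the database name is a placeholder):

# bulk insert: initialize pgbench tables at scale 200 (~2.6 GB of data)
pgbench -i -s 200 benchdb

# read-only run: 4 clients, 10 minutes, SELECT-only workload
pgbench -S -c 4 -T 600 benchdb

# read-write run with some updates skipped (roughly 1:3 read-write ratio)
pgbench -N -c 4 -T 600 benchdb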

Hardware: AWS i2.xlarge (4 vCPUs, 2.5 GHz, Intel Xeon E5-2670v2, 30.5 GiB memory, 1 x 800 GB SSD XFS-formatted)

Workload                                                 | Without encryption | With encryption | Penalty
Bulk insert (pgbench init on scale 200, ~2.6GB of data)  | 32s                | 75s             | 134%
Read only from shared_buffers                            | 21657 TPS          | 21462 TPS       | 0.9%
Read-write (1:3 ratio) fitting into shared buffers       | 3600 TPS           | 3154 TPS        | 12%
Read-only not fitting into shared buffers                | 19876 TPS          | 12328 TPS       | 38%
Read-write (1:3 ratio) not fitting into shared buffers   | 3418 TPS           | 2685 TPS        | 21%

As we can see, the picture is pretty non-linear, with the performance difference jumping from 1% to 134% for different tests – an average of 16% for a typical business application, and an average penalty of less than 10% when things fit into shared buffers. This all means that encryption performance is very workload-specific: it is sensitive to the amount of pages moved between shared buffers and disk (cache ratio) as well as to the pure CPU power available for the number crunching. So it’s quite difficult to conclude anything final here, besides stating that the performance decrease for typical use cases is very acceptable in my opinion, with some room for future optimizations (starting to use AES-NI, i.e. hardware-accelerated encryption instructions supported by all modern CPUs).

All in all, Postgres now has a new and viable encryption option available, and in some environments it could well be the simplest approach for securing your data. As always, we would appreciate your feedback and questions! Download here.

In the past couple of years replication has been made easier and easier. Setting up streaming replication is pretty easy these days. When doing training I am getting a lot of positive feedback and people seem to like the way entire database instances can be cloned.

In many cases DBAs use a straightforward approach to creating base backups: pg_basebackup.

Potential showstoppers on the way

While this is simple, there can be a small issue which pops up from time to time: what if the I/O capacity of the two servers is low while the network bandwidth is large? In that case a simple base backup can lead to high disk wait on both servers. The problem is that pg_basebackup running at full speed can easily cause response times to go up, because the disk is just too busy reading all the data.

Some customers have reported that this kind of problem even feels like downtime.

The solution to the problem is to control the speed of pg_basebackup. By reducing the backup speed to a reasonable level the disk on the master has enough time to serve normal requests.

The -r flag of pg_basebackup can be used for exactly that:

-r, --max-rate=RATE    maximum transfer rate to transfer data directory
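
A hedged example of how that could look in practice – host name, user and target directory here are just placeholders:

# limit the base backup to 50 MB/s so the master's disks stay responsive
pg_basebackup -h master.example.com -U replication \
    -D /var/lib/pgsql/10/data \
    -r 50M -X stream -P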

Controlling the speed of your base backups can be a good idea if I/O bandwidth is an issue on your systems. It makes sure that your systems stay responsive all the time – even during a backup.

PostgreSQL administration and performance tuning COMPACT

Duration: 3 days
Level: Intermediate
Price: contact us today
Audience: This course is especially suitable for database administrators (Linux / Windows / Solaris / Mac OS X) and sysadmins with less time who cannot attend the 5-day training.
Date: 14th – 16th October 2013 (in German; English on request)

 

Day 1:

Installing PostgreSQL

  • Creating database instances
  • Creating databases
  • Encoding issues

Using indexes

  • Simple btree indexes
  • The PostgreSQL optimizer
  • Full Text search
  • Fuzzy search

Day 2:

Understanding transactions

  • Concurrency issues
  • Using savepoints
  • PostgreSQL locking behavior
  • Transaction isolation

MVCC and space management

  • Understanding MVCC
  • PostgreSQL disk layout
  • VACUUM: Reclaiming space
  • Autovacuum

PostgreSQL monitoring

  • Understanding PostgreSQL system tables
  • PostgreSQL system statistics
  • Creating logfiles

Windowing and analytics

  • Analyzing data
  • Using analytics

Day 3:

PostgreSQL performance tuning

  • Tweaking checkpoints
  • Configuring shared_buffers
  • Additional memory parameters

Backup and recovery

  • Running backups
  • Configuring Point-In-Time-Recovery

Utilizing replication

  • Streaming replication
  • Synchronous and asynchronous replication
  • High-Availability considerations

PostgreSQL stored procedures

  • Writing SQL functions
  • Using PL/pgSQL
  • Working with triggers