There is a couple of different ways to implement database encryption – commonly on operating system, filesystem, file or column level, leaving out transport level encryption, which is supported since 15 years. Each of those approaches counters a different threat model, and one can easily imagine that in the case of databases, where the systems were originally not designed with encryption in mind, it is not exactly easy to agree on a certain way of doing things – and then there would be a lot of technical pitfalls on the way. But today, after a lot of work in co-operation with a client, we’re really glad to announce that our PostgreSQL instance-level encryption patch mentioned some months ago is now ready for download and use!
How does it work?
The high-level idea behind the patch is to store all the files, making up a PostgreSQL cluster on disk in encrypted format (data-at-rest encryption) and then decrypt blocks, as they are read from disk into shared buffers. As soon as a block is written out to the disk from shared buffers again, it will be encrypted automatically. This requires firstly that the database is initialized (initdb) with encryption in mind and secondly that during startup, the server needs access to the encryption-key, which can be provided in two ways – through an environment variable or through a new “pgcrypto.keysetup_command” parameter. Note that the entire instance is encrypted, so in some sense this implementation is comparable with filesystem or partition level encryption.
Sample setup
1. Build the Postgres code with the patch applied, install also “contrib”.
2. Initialize the cluster by setting the encryption key and then calling initdb
read -sp "Postgres passphrase: " PGENCRYPTIONKEY export PGENCRYPTIONKEY=$PGENCRYPTIONKEY initdb –data-encryption pgcrypto --data-checksums -D cryptotest
3. Starting the server
3.1. Set the PGENCRYPTIONKEY environment variable before starting the server (in our case it’s already there but not so after opening a new console)
export PGENCRYPTIONKEY=topsecret pg_ctl -D cryptotest start
3.2. Optionally, instead of feeding the encryption key from environment variable one could implement a custom and more secure key reading procedure via the new “pgcrypto.keysetup_command” postgresql.conf parameter. This parameter will be executed as a Bash command by the Postgres server process and it needs to return a string in format of “encryptionkey=…” with a key part of 64 hex characters. In this way pg_ctl can be used as normally.
Details
For those who are more technically interested – encryption algorithm used is 128-bit AES in XTS mode, sometimes called also XTS-AES. It’s a block cipher with a “tweak” for extra security and basically military-grade standard stuff so you shouldn’t worry about the mechanics too much but rather how to keep your encryption-key safe and how to safely feed it to Postgres when the server starts up. The key needs to be provided to the server during every startup and when it doesn’t match (a known value kept encrypted in the control file – visible under “Data encryption fingerprint” when calling out pg_controldata – will be decrypted and compared) the server will refuse to start.
More or less everything will be encrypted – heap files (tables, indexes, sequences), xlog (as they also contain data), clog, temporary files being generated during execution of a larger query.
The main encryption/decryption code itself is placed into the existing “pgcrypto” module, extending it with functions like “pgcrypto_encrypt_block” and “pgcrypto_decrypt_block” and adding some other setup functions but also the storage manager and more than a dozen other files had to be changed.
Expectations on performance
Naturally, one agrees to compromise on performance when going for encryption, as there are no free lunches. Here things are a bit fuzzy – one can expect a very considerable performance hit if your workload is IO-oriented (logging etc). On the other hand, on typical server hardware, when the active dataset stays more or less in shared buffers, the performance penalty will be very minor and not noticeable.
To get a better picture, I ran 5 different types of “pgbench” workloads (avg. of 3 runs of 10min each, 4 concurrent clients) and the results can be seen below. To conjure up a “typical workload” I’m using here pgbench in “–skip-updates” mode, giving us basically 1:3 read-write ratio.
Hardware: AWS i2.xlarge (4 vCPUs, 2.5 GHz, Intel Xeon E5-2670v2, 30.5 GiB memory, 1 x 800 GB SSD XFS-formatted)
Workload | Without encryption | With encryption | Penalty |
Bulk insert (pgbench init on scale 200, ~2.6GB of data) | 32s | 75s | 134% |
Read only from shared_buffers | 21657 TPS | 21462 TPS | 0.9% |
Read-write (1:3 ratio) fitting into shared buffers | 3600 TPS | 3154 TPS | 12% |
Read-only not fitting into shared buffers | 19876 TPS | 12328 TPS | 38% |
Read-write (1:3 ratio) not fitting into shared buffers | 3418 TPS | 2685 TPS | 21% |
As we can see, the picture is pretty non-linear here, with performance-difference jumping from 1% to 134% for different tests, with an average of 16% for a typical business application and with an average penalty <10% when things fit into shared buffers. This all means that encryption performance is very workload specific and sensitive against the amount of pages moved between shared buffers and disk (cache ratio), as well as pure CPU power for the number crunching, so it’s quite difficult to conclude something final here besides stating that performance decreases for typical use cases is very acceptable in my opinion, with some room for future optimizations (starting to use AES-NI i.e. hardware accelerated encryption instructions supported by all modern CPUs).
All in all, Postgres now has a new and viable encryption option available, and in some environments it could well be the simplest approach for securing your data. As always, we would appreciate your feedback and questions! Download here.