November 2016

By Kaarel Moppel - Look at the processes list on a server where Postgres is running, and you'll see two writer processes among other background and client processes.

PgSQL
krl@e5470:~$ ps -efH | grep postgres
 
postgres   662     1  0 11:47 ?        00:00:01   /usr/lib/postgresql/9.5/bin/postgres -D 
/var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
 
...
 
postgres   666   662  0 11:47 ?        00:00:00     postgres: writer process
 
postgres   667   662  0 11:47 ?        00:00:00     postgres: wal writer process
 
...

PgSQL

krl@e5470:~$ ps -efH | grep postgres

postgres 662 1 0 11:47 ? 00:00:01 /usr/lib/postgresql/9.5/bin/postgres -D

/var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf

...

postgres 666 662 0 11:47 ? 00:00:00 postgres: writer process

postgres 667 662 0 11:47 ? 00:00:00 postgres: wal writer process

...

One process is more specifically named wal writer. It's still confusing that there are 2 similarly named writers. What do they do? From my experiences from training, this similar naming always causes a lot of confusion. So I thought it wouldn’t hurt to lay out the working principles of these processes again in simple terms. I'll describe them, together with the pertinent server configuration parameters.

The writer a.k.a. background writer process

Note that we assume that the high level concept of “checkpoints” together with the checkpointer process and its parameters are already familiar to you (as they have far more effect, compared to the writers). When not, I’d recommend digging into the Postgres documentation here.

So to the writer. The introductory sentence in the documentation tells us:

There is a separate server process called the background writer, whose function is to issue writes of "dirty" (new or modified) shared buffers. It writes shared buffers so server processes handling user queries seldom or never need to wait for a write to occur. However, the background writer does cause a net overall increase in I/O load, because while a repeatedly-dirtied page might otherwise be written only once per checkpoint interval, the background writer might write it several times as it is dirtied in the same interval. ...

In short - the writer moves some of the changed data (dirty buffers) already to the disk in the background, so that checkpoint process, happening at regular intervals, would have less work to do. All of this with the point that in the end user/application queries wouldn’t need to suffer too much when checkpointer kicks in with its heavy IO requirements, when there are lots of buffers to be processed or checkpoint_completion_target is set too small. All this is relevant of course only when we’re running a relatively busy database - for idling databases it wouldn’t be a problem at all.

A small gotcha

But did you also catch the possible pitfall from the wording? Basically it warns you about the following – repeatedly dirtied page might be written to disk multiple times during one checkpoint interval, causing unnecessary extra IO load! So here, one should thoroughly think about the database workload first, before tuning any relevant parameters to offload the checkpointer. Typical problem scenario would be for example where you have some “counter” like tables that get incremented for every page view or event, resulting in writer re-writing the same page many-many times during one checkpoint cycle. Due to this possible pitfall Postgres default parameters for the writer are also pretty non-intrusive. So my advice is to generally let them stay as they are, but nevertheless a short “translation” for the parameters visible from documentation:

bgwriter_delay [200ms by default , 10ms – 10s possible] – time to sleep after cleaning up a maximum of bgwriter_lru_maxpages (see below) dirty pages.

bgwriter_lru_maxpages [100 pages by default, 0 – 1000 possible] – maximum amount of pages cleaned per one writer activity round.

bgwriter_lru_multiplier [2.0 by default, 0-10.0 possible] – multiplier ratio determining how many pages should be cleaned for every incoming dirty page, based on counts from last delay periods. Everything > 1.0 means we try to clean more pages than actually dirtied, up to the “maxpages” limit.

From those default settings, we can also calculate the maximum amount of bytes per second that the writer can clean. 200ms delay translates to 5 times per second, multiplied with 100 8KB pages gives us ~ 4MB. Pretty conservative indeed! But due to good reasoning as mentioned previously.

krl@postgres=# select pg_size_pretty(100 * 5 * 8192::numeric);

pg_size_pretty

────────────────

4000 kB

(1 row)

krl@postgres=# select pg_size_pretty(100 * 5 * 8192::numeric);

pg_size_pretty

────────────────

4000 kB

(1 row)

The “wal writer” process

Now what’s the deal here? This is a simpler story actually – first, wal writer has only meaning when “synchronous_commit” is set to “off”, with default being “on”, so most setups can already ignore it. More about “synchronous_commit” and it’s different values can be read from one of our recent blogposts here.

But in short - with “synchronous_commit=off” (generally used as a performance crutch) Postgres commits are managed by this wal writer process. They are actually not flushed directly to the transaction log (a.k.a. WAL or XLog), but rather just sent to the operating system and fsync (i.e. real commit, guaranteeing durability) is requested only after the “wal_writer_delay” period has passed from the last real commit. The default value is here 200ms again, same as “bgwriter_delay”. One could increase this parameter if needed, but please take into account the friendly warning from the docs, that in worst case the delay could triple. Only set it the number of seconds/milliseconds (1ms – 10s range available) that you’re willing to lose data in case of a crash. (You needn't fear of data corruption here, though.) This is what setting “synchronous_commit=off” means. Hope it helps!

In version 9.6 PostgreSQL introduced parallel queries. The ability to use more than just one CPU core per query is a giant leap forward and has made PostgreSQL an even more desirable database. With parallel queries many workloads can be sped up considerably. However, in this article, I want to point out a small missing feature, which bugged one of our support clients recently: lack of support for SERIALIZABLE. UPDATE Sept. 2023: This restriction was lifted as of v12, see this note in the documentation.

Consider the following example:

test=# CREATE TABLE t_demo AS
SELECT         id
FROM           generate_series(1, 10000000) AS id;

SELECT 10000000

test=# CREATE TABLE t_demo AS

SELECT id

FROM generate_series(1, 10000000) AS id;

SELECT 10000000

The sample table contains 10 million rows.

To tell the optimizer about the content of the table, running ANALYZE seems like a good idea:

test=# ANALYZE t_demo;
ANALYZE

1 2	test=# ANALYZE t_demo; ANALYZE

Then users can tell PostgreSQL about the maximum number of cores allowed for the query we are about to run:

test=# SET max_parallel_workers_per_gather TO 4;
SET

1 2	test=# SET max_parallel_workers_per_gather TO 4; SET

After that we can already run a simple aggregation:

test=# explain SELECT count(*) FROM t_demo;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=76498.57..76498.58 rows=1 width=8)
  ->  Gather  (cost=76498.15..76498.56 rows=4 width=8)
      Workers Planned: 4
      ->  Partial Aggregate  (cost=75498.15..75498.16 rows=1 width=8)
      ->  Parallel Seq Scan on t_demo  (cost=0.00..69248.12 rows=2500012 width=0)
(5 rows)

test=# explain SELECT count(*) FROM t_demo;

QUERY PLAN

-------------------------------------------------------------------------------------------

Finalize Aggregate (cost=76498.57..76498.58 rows=1 width=8)

-> Gather (cost=76498.15..76498.56 rows=4 width=8)

Workers Planned: 4

-> Partial Aggregate (cost=75498.15..75498.16 rows=1 width=8)

-> Parallel Seq Scan on t_demo (cost=0.00..69248.12 rows=2500012 width=0)

(5 rows)

PostgreSQL will execute a so called parallel sequential scan using 4 CPU cores (= worker processes). The system will scale nicely and significantly better execution times can be observed.

SERIALIZABLE and parallelism

We have seen a couple of people use SERIALIZABLE transactions for analytical requests. While this might not be a good idea for other reasons, there are also implications when it comes to the use of parallel queries. At this point (as of PostgreSQL 9.6) there is some code in PostgreSQL which is not yet fully parallel safe. Therefore parallel queries are not yet supported if you happen to use SERIALIZABLE. If you are affected by this, consider using REPEATABLE READ.

Here is what happens:

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
test=# explain SELECT count(*) FROM t_demo;
                               QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=169248.60..169248.61 rows=1 width=8)
  ->  Seq Scan on t_demo  (cost=0.00..144248.48 rows=10000048 width=0)
(2 rows)

test=# COMMIT;
COMMIT

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

BEGIN

test=# explain SELECT count(*) FROM t_demo;

QUERY PLAN

------------------------------------------------------------------------

Aggregate (cost=169248.60..169248.61 rows=1 width=8)

-> Seq Scan on t_demo (cost=0.00..144248.48 rows=10000048 width=0)

(2 rows)

test=# COMMIT;

COMMIT

As you can see, only a single core is used.

The code inside the query planner (planner.c) is very clear about this:

We can't use parallelism in serializable mode because the predicate locking code is not parallel-aware. It's not catastrophic if someone tries to run a parallel plan in serializable mode; it just won't get any workers and will run serially. But it seems like a good heuristic to assume that the same serialization level will be in effect at plan time and execution time, so don't generate a parallel plan if we're in serializable mode.

Future versions of PostgreSQL will surely relax this at some point so that all isolation levels can benefit from parallel queries.

UPDATE:

This limitation was removed as of PostgreSQL v12: see the documentation on "Allow parallelized queries when in SERIALIZABLE isolation mode (Thomas Munro)" for more information.

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Facebook or LinkedIn.

By Kaarel Moppel - During the last training session, a curious participant asked if there’s a way to easily expose the PostgreSQL database logs to users - and indeed, there’s a pretty neat way for SQL-based access! So this time, I'll give you a quick demo on that. The approach, taking advantage of the File Foreign Data Wrapper extension, is actually even brought out in the official docus, but still not too well-known, although mentioned as an “obvious use case” 😉 I must say that this approach is best suited for development setups, as under normal circumstances you would most probably want to keep the lid on your logs.

Setup steps

1.

First you need to change the server configuration (postgresql.conf) and enable CSV logging as described in detail here. This might result in some overhead on busy systems, as compared to ‘sysout’, as all the “columns” or info that Postgres has on the logged event is logged, especially problematic with log_statement = 'all'.

# set in postgresql.conf + restart/reload server (chaning logging_collector needs restart)
log_destination = 'csvlog'
logging_collector = on
log_filename = 'postgresql.log'	# NB! File will grow indefinitely and needs external truncating
					# Also note that file extension needs to stay .log

# set in postgresql.conf + restart/reload server (chaning logging_collector needs restart)

log_destination = 'csvlog'

logging_collector = on

log_filename = 'postgresql.log' # NB! File will grow indefinitely and needs external truncating

# Also note that file extension needs to stay .log

2.

Install the “file_fdw” extension (“contrib” package needed) and create a foreign file server and a foreign table, linking to our above configured log file name.

CREATE EXTENSION file_fdw;

CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (
  log_time timestamp(3) with time zone,
  user_name text,
  database_name text,
  process_id integer,
  connection_from text,
  session_id text,
  session_line_num bigint,
  command_tag text,
  session_start_time timestamp with time zone,
  virtual_transaction_id text,
  transaction_id bigint,
  error_severity text,
  sql_state_code text,
  message text,
  detail text,
  hint text,
  internal_query text,
  internal_query_pos integer,
  context text,
  query text,
  query_pos integer,
  location text,
  application_name text
) SERVER pglog
OPTIONS ( filename 'pg_log/postgresql.csv', format 'csv' );

CREATE EXTENSION file_fdw;

CREATE SERVER pglog FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE pglog (

log_time timestamp(3) with time zone,

user_name text,

database_name text,

process_id integer,

connection_from text,

session_id text,

session_line_num bigint,

command_tag text,

session_start_time timestamp with time zone,

virtual_transaction_id text,

transaction_id bigint,

error_severity text,

sql_state_code text,

message text,

detail text,

hint text,

internal_query text,

internal_query_pos integer,

context text,

query text,

query_pos integer,

location text,

application_name text

) SERVER pglog

OPTIONS ( filename 'pg_log/postgresql.csv', format 'csv' );

3.

Grant access as needed, or if you want that every user can see only his/her own entries, bring views into play, with security_barrier set when security matters. For 9.5+ servers one could even use the flashy Row Level Security mechanisms to set up some more obscure row visibility rules. The downside is that you need to set up a parent-child relationship then, as RLS cannot work with the “virtual” table directly.

GRANT SELECT ON pglog TO public;

# or the view based way

CREATE OR REPLACE view v_pglog WITH (security_barrier=true) AS SELECT * FROM pglog WHERE 
user_name = current_user;
GRANT SELECT ON v_pglog TO public;

GRANT SELECT ON pglog TO public;

# or the view based way

CREATE OR REPLACE view v_pglog WITH (security_barrier=true) AS SELECT * FROM pglog WHERE

user_name = current_user;

GRANT SELECT ON v_pglog TO public;

4.

And another additional idea - a handy way to expose and physically keep around (automatic truncation) only 7 days of logs is to define 7 child tables for a master one. Process would then look something like that:

# set in postgresql.conf + reload conf
log_filename = 'postgresql-%a.csv'	# Keep 7d of logs in files ‘postgresql-Mon.csv’ etc.

CREATE TABLE pglog (
  log_time timestamp(3) with time zone,
  ...
  application_name text
);

CREATE FOREIGN TABLE pglog_mon (
    …
    )  SERVER pglog
    OPTIONS ( filename 'pg_log/postgresql-Mon.csv', format 'csv' );
ALTER TABLE  pglog_mon INHERIT  pglog;

# rinse and repeat for Tue...Sun

# set in postgresql.conf + reload conf

log_filename = 'postgresql-%a.csv' # Keep 7d of logs in files ‘postgresql-Mon.csv’ etc.

CREATE TABLE pglog (

log_time timestamp(3) with time zone,

...

application_name text

);

CREATE FOREIGN TABLE pglog_mon (

…

) SERVER pglog

OPTIONS ( filename 'pg_log/postgresql-Mon.csv', format 'csv' );

ALTER TABLE pglog_mon INHERIT pglog;

# rinse and repeat for Tue...Sun

Not a "one size fits all" solution

The only problem with the approach I've laid out is that it might not be a perfect fit if you need relatively frequent monitoring queries on the logs, since queries need to read through all of the logfiles every single time.

We can see this via EXPLAIN:

krl@postgres=# explain select * from pglog where error_severity in 
('ERROR', 'FATAL') order by log_time desc limit 1;

                                 QUERY PLAN                                  
─────────────────────────────────────────────────────────────────────────────
 Limit  (cost=13587.31..13587.31 rows=1 width=556)
   ->  Sort  (cost=13587.31..13589.27 rows=782 width=556)
         Sort Key: log_time DESC
         ->  Foreign Scan on pglog  (cost=0.00..13583.40 rows=782 width=556)
               Filter: (error_severity = ANY ('{ERROR,FATAL}'::text[]))
               Foreign File: pg_log/postgresql.csv
               Foreign File Size: 45645532
(7 rows)

krl@postgres=# explain select * from pglog where error_severity in

('ERROR', 'FATAL') order by log_time desc limit 1;

QUERY PLAN

─────────────────────────────────────────────────────────────────────────────

Limit (cost=13587.31..13587.31 rows=1 width=556)

-> Sort (cost=13587.31..13589.27 rows=782 width=556)

Sort Key: log_time DESC

-> Foreign Scan on pglog (cost=0.00..13583.40 rows=782 width=556)

Filter: (error_severity = ANY ('{ERROR,FATAL}'::text[]))

Foreign File: pg_log/postgresql.csv

Foreign File Size: 45645532

(7 rows)

In such cases, a typical approach would be to write some kind of simple logs importing Python cronjob that scans and parses the CSV logfiles and inserts entries into an actual table (typically on a dedicated logging database), where the “log_time” column could be indexed for better performance. Or another direction (if you’re not super worried about privacy) would be to use a 3rd party SaaS provider like Loggly or Scalyr, which have log exporting means available.

There are a few different ways to implement database encryption - commonly on the operating system, filesystem, file or column level. That leaves out transport level encryption, which has been supported since 15 years. Each of these approaches counters a different threat model, and one can easily imagine that in the case of databases, where the systems were originally not designed with encryption in mind, it is not exactly easy to agree on a certain way of doing things - and then there would be a lot of technical pitfalls on the way. But today, after a lot of work in co-operation with a client, we’re really glad to announce that our PostgreSQL instance-level encryption patch mentioned some months ago is now ready for download and use!

How does instance level encryption work?

The high-level idea behind the patch is to store all the files which make up a PostgreSQL cluster on disk in encrypted format (data-at-rest encryption) and then decrypt blocks, as they are read from disk into shared buffers. As soon as a block is written out to the disk from shared buffers again, it will be encrypted automatically. This requires: firstly, that the database is initialized (initdb) with encryption in mind, and secondly, that during startup, the server needs access to the encryption-key. The encryption key can be provided in two ways – through an environment variable or through a new “pgcrypto.keysetup_command” parameter. Note that the entire instance is encrypted, so in some sense this implementation is comparable with filesystem or partition-level encryption.

Sample setup for instance level encryption

1. Build the Postgres code with the patch applied, also, install “contrib”.

2. Initialize the cluster by setting the encryption key and then calling initdb

read -sp "Postgres passphrase: " PGENCRYPTIONKEY
 
export PGENCRYPTIONKEY=$PGENCRYPTIONKEY
 
initdb –data-encryption pgcrypto --data-checksums -D cryptotest

read -sp "Postgres passphrase: " PGENCRYPTIONKEY

export PGENCRYPTIONKEY=$PGENCRYPTIONKEY

initdb –data-encryption pgcrypto --data-checksums -D cryptotest

3. Starting the server

3.1. Set the PGENCRYPTIONKEY environment variable before starting the server (in our case it’s already there but not so after opening a new console)

export PGENCRYPTIONKEY=topsecret pg_ctl -D cryptotest start

1	export PGENCRYPTIONKEY=topsecret pg_ctl -D cryptotest start

3.2. Optionally, instead of feeding the encryption key from the environment variable, one could implement a custom and more secure key reading procedure via the new “pgcrypto.keysetup_command” postgresql.conf parameter. This parameter will be executed as a Bash command by the Postgres server process. It needs to return a string in the format of “encryptionkey=...” with a key part of 64 hex characters. In this way, pg_ctl can be used as it normally would.

Details

For those who are more technically interested - the encryption algorithm used is 128-bit AES in XTS mode, sometimes called also XTS-AES. It’s a block cipher with a “tweak” for extra security and basically military-grade standard stuff so you shouldn’t worry about the mechanics too much but rather how to keep your encryption-key safe and how to safely feed it to Postgres when the server starts up. The key needs to be provided to the server during every startup and when it doesn’t match (a known value kept encrypted in the control file - visible under “Data encryption fingerprint” when calling out pg_controldata - will be decrypted and compared) the server will refuse to start.

More or less everything will be encrypted – heap files (tables, indexes, sequences), xlog (as they also contain data), clog, temporary files being generated during execution of a larger query.

The main encryption/decryption code itself is placed into the existing “pgcrypto” module, extending it with functions like “pgcrypto_encrypt_block” and “pgcrypto_decrypt_block” and adding some other setup functions but also the storage manager and more than a dozen other files had to be changed.

Expectations for performance

Naturally, one agrees to compromise on performance when going for encryption, as there are no free lunches. Here things are a bit fuzzy - one can expect a very considerable performance hit if your workload is IO-oriented (logging etc). On the other hand, on typical server hardware, when the active dataset stays more or less in shared buffers, the performance penalty will be very minor and not noticeable.

To get a better picture, I ran 5 different types of “pgbench” workloads (avg. of 3 runs of 10min each, 4 concurrent clients) and the results can be seen below. To conjure up a “typical workload” I’m using here pgbench in “--skip-updates” mode, giving us basically 1:3 read-write ratio.

Hardware: AWS i2.xlarge (4 vCPUs, 2.5 GHz, Intel Xeon E5-2670v2, 30.5 GiB memory, 1 x 800 GB SSD XFS-formatted)

Workload	Without encryption	With encryption	Penalty
Bulk insert (pgbench init on scale 200, ~2.6GB of data)	32s	75s	134%
Read only from shared_buffers	21657 TPS	21462 TPS	0.9%
Read-write (1:3 ratio) fitting into shared buffers	3600 TPS	3154 TPS	12%
Read-only not fitting into shared buffers	19876 TPS	12328 TPS	38%
Read-write (1:3 ratio) not fitting into shared buffers	3418 TPS	2685 TPS	21%

Performance difference

As we can see, the picture is pretty non-linear here, with the performance difference jumping from 1% to 134% for different tests, with an average of 16% for a typical business application and with an average penalty <10% when things fit into shared buffers. This all means that encryption performance is very workload specific and sensitive against the amount of pages moved between shared buffers and disk (cache ratio), as well as pure CPU power for the number crunching, so it’s quite difficult to conclude something final here besides stating that performance decreases for typical use cases is very acceptable in my opinion, with some room for future optimizations (starting to use AES-NI i.e. hardware accelerated encryption instructions supported by all modern CPUs).

All in all, Postgres now has a new and viable encryption option available, and in some environments it could well be the simplest approach for securing your data. As always, we would appreciate your feedback and questions! Download here.

In case you need any assistance, please feel free to contact us.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.