© Laurenz Albe 2023
This is a short war story from a customer problem. It serves as a warning that there are special considerations when running software in a Docker container.
The problem description
The customer is running PostgreSQL in Docker containers. They are not using the “official” image, but their own.
Sometimes, under conditions of high load, PostgreSQL crashes with
LOG: server process (PID 84799) was terminated by signal 13: Broken pipe LOG: terminating any other active server processes
This causes PostgreSQL to undergo crash recovery, during which the service is not available.
Why crash recovery?
SIGPIPE (signal 13 on Linux) is a rather harmless signal: the kernel sends that signal to a process that tries to write to a pipe if the process at the other end of the pipe no longer exists. Crash recovery seems like a somewhat excessive reaction to that. If you look at the log entry, the message level is
LOG and not
PANIC (an error condition that PostgreSQL cannot recover from).
The reason for this excessive reaction is that PostgreSQL does not expect a child process to die from signal 13. A careful scrutiny of the PostgreSQL code shows that all PostgreSQL processes ignore
SIGPIPE. So if one of these processes dies from that signal, something must be seriously out of order.
The role of the postmaster process
In PostgreSQL, the postmaster process (the parent of all server processes) listens for incoming connections and starts new server processes. It takes good care of its children: it respawns background processes that terminated in a controlled fashion, and it watches out for children that died from “unnatural causes”. Any such event is alarming, because all PostgreSQL processes use shared buffers, the shared memory segment that contains the authoritative copy of the table data. If a server process runs amok, it can scribble over these shared data and corrupt the database. Also, something could interrupt a server process in the middle of a “critical section” and leave the database in an inconsistent state. To prevent that from happening, the postmaster treats any irregular process termination as a sign of danger. Since shared buffers might be affected, the safe course is to interrupt processing and to restore consistency by performing crash recovery from the latest checkpoint.
(If this behavior strikes you as oversensitive, and you are less worried about data integrity, you might prefer more cavalier database systems like Oracle, where a server crash – euphemistically called ORA-00600 – does not trigger such a reaction.)
Hunting the rogue process in the Docker container
To understand and fix the problem, it was important to know which server process died from signal 13. All we knew is the process ID from the error message. We searched the log files for messages by this process, which is easy if you log the process ID with each entry. However, that process never left any trace in the log, even when we cranked up
An added difficulty was that the error condition could not be reproduced on demand. All that we could do is to increase the load on the system by starting a backup, in the hope that the problem would manifest.
The next idea was to take regular “
ps” snapshots in the hope to catch the offending process red-handed. The process remained elusive. Finally, the customer increased the frequency of those snapshots to one per second, and in the end we got a mug shot of our adversary.
The process turned out not to be a server process at all. Rather, it was a
psql process that gets started inside the container to run a monitoring query on the database. Now
psql is a client program that does not ignore
SIGPIPE, so that mystery is solved. But how can
psql be a PostgreSQL server process?
ps snapshot that helped solve the Docker problem
The snapshot in question like this:
PID STIME USER COMMAND 1 Mar13 postgres /usr/lib/postgresql/14/bin/postgres -D /postgresqldata/14 332 Mar13 postgres postgres: logger 84507 11:09 postgres postgres: checkpointer 84508 11:09 postgres postgres: background writer 84509 11:09 postgres postgres: walwriter 84510 11:09 postgres postgres: autovacuum launcher 84511 11:09 postgres postgres: archiver 84512 11:09 postgres postgres: stats collector 84513 11:09 postgres postgres: logical replication launcher 84532 11:09 postgres postgres: logrep_user mydb 10.0.0.42(36270) idle 84557 11:09 postgres postgres: postgres postgres 10.0.0.232(37434) idle 84756 11:10 postgres postgres: appuser mydb 10.0.0.12(38600) idle 84773 11:10 postgres postgres: appuser mydb 10.0.0.12(38610) idle 84799 11:10 root /usr/lib/postgresql/14/bin/psql -U postgres -t -c SELECT 1=1
The last line is the offending process, which is about to receive signal 13. This is very clearly not a server process; among other things, it is owned by the
root user instead of
postgres. Unfortunately, the snapshot does not include the parent process ID. However, since the postmaster (in the first line) recognized the rogue process as its child, it must be the parent.
Unplanned adoption in a Docker container
The key observation is that the process ID of the postmaster is 1. In Unix, process 1 is a special process: it is the first user land process that the kernel starts. This process then starts other processes to bring the system up. It is the ancestor of all other processes, and every other process has a parent process. There is another special property of process 1: if the parent process of a process dies, the kernel automatically assigns process 1 as parent to the orphaned process. Process 1 has to “adopt” all orphans.
Normally, process 1 is a special
init executable specifically designed for this purpose. But in a Docker container, process 1 is the process that you executed to start the container. As you can see, that was the postmaster. The postmaster handles one of the tasks of the
init process admirably: it waits for its children and collects the exit status when one of them dies. This keeps zombie processes from lingering for any length of time. However, the postmaster is less suited to handle another
init task: remain stoic if one of its children dies horribly. That is what caused our problem.
How can we avoid this problem?
Once we understand the problem, the solution is simple: don’t start the container with PostgreSQL. Rather, start a different process, which in turn starts the postmaster. Either write your own or use an existing solution like dumb-init. The official PostgreSQL docker image does it right.
This problem also could not have occurred if the
psql process hadn’t been started inside the container. It is good practice to consider a container running a service as a closed unit: you shouldn’t start jobs or interactive sessions inside the container. I can understand the appeal of using the container’s
psql executable to avoid having to install the PostgreSQL client anywhere else, but it is a shortcut that you shouldn’t take.
It turned out that the cause of our problem was that the postmaster served as process 1 in the Docker container. The
psql process that ran a monitoring query died from a
SIGPIPE under high load. The postmaster, which had inadvertently inherited that process, noticed this unusual process termination and underwent crash recovery to stay on the safe side.
While running a program in a Docker container is not very different from running it outside in most respects, there are some differences that you have to be aware of if you want your systems to run stably.
You can read more about PostgreSQL and Docker in this article, “Running Postgres in Docker Why and How”.