
Docker and sudden death for PostgreSQL

05.2023
PostgreSQL complaining about a child process it got from Docker
© Laurenz Albe 2023

 

This is a short war story from a customer problem. It serves as a warning that there are special considerations when running software in a Docker container.

The problem description

The customer is running PostgreSQL in Docker containers. They are not using the “official” image, but their own.

Sometimes, under conditions of high load, PostgreSQL crashes with a log message reporting that a server process was terminated by signal 13.

This causes PostgreSQL to undergo crash recovery, during which the service is not available.

Why crash recovery?

SIGPIPE (signal 13 on Linux) is a rather harmless signal: the kernel sends that signal to a process that tries to write to a pipe if the process at the other end of the pipe no longer exists. Crash recovery seems like a somewhat excessive reaction to that. If you look at the log entry, the message level is LOG and not PANIC (an error condition that PostgreSQL cannot recover from).
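To see SIGPIPE in action, here is a small sketch (assuming a Unix system with the coreutils `yes` program): it starts `yes`, which writes to a pipe in an endless loop, then closes the read end of that pipe, so the very next write kills `yes` with signal 13.

```python
import signal
import subprocess

# Start "yes", which writes "y" to its stdout pipe endlessly.
proc = subprocess.Popen(["yes"], stdout=subprocess.PIPE)

proc.stdout.read(1)    # make sure "yes" is up and writing
proc.stdout.close()    # close the read end of the pipe

# With no reader left, the next write by "yes" triggers SIGPIPE.
# subprocess reports death by signal N as a return code of -N.
returncode = proc.wait()
print(returncode == -signal.SIGPIPE)   # True: killed by signal 13
```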

The reason for this excessive reaction is that PostgreSQL does not expect a child process to die from signal 13. Careful scrutiny of the PostgreSQL code shows that all PostgreSQL processes ignore SIGPIPE. So if one of these processes dies from that signal, something must be seriously out of order.
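A sketch of the protection the server processes rely on: with SIGPIPE ignored, a write to a pipe that has no reader fails with the error EPIPE (surfaced in Python as BrokenPipeError) instead of killing the process.

```python
import os
import signal

# Like a PostgreSQL server process, ignore SIGPIPE.
signal.signal(signal.SIGPIPE, signal.SIG_IGN)

r, w = os.pipe()
os.close(r)             # nobody will ever read from this pipe

try:
    os.write(w, b"x")
    survived = False
except BrokenPipeError:  # the write fails with EPIPE instead of SIGPIPE
    survived = True

print(survived)          # True: the process lives on
```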

The role of the postmaster process

In PostgreSQL, the postmaster process (the parent of all server processes) listens for incoming connections and starts new server processes. It takes good care of its children: it respawns background processes that terminated in a controlled fashion, and it watches out for children that died from “unnatural causes”. Any such event is alarming, because all PostgreSQL processes use shared buffers, the shared memory segment that contains the authoritative copy of the table data. If a server process runs amok, it can scribble over these shared data and corrupt the database. Also, something could interrupt a server process in the middle of a “critical section” and leave the database in an inconsistent state. To prevent that from happening, the postmaster treats any irregular process termination as a sign of danger. Since shared buffers might be affected, the safe course is to interrupt processing and to restore consistency by performing crash recovery from the latest checkpoint.

(If this behavior strikes you as oversensitive, and you are less worried about data integrity, you might prefer more cavalier database systems like Oracle, where a server crash – euphemistically called ORA-00600 – does not trigger such a reaction.)

Hunting the rogue process in the Docker container

To understand and fix the problem, it was important to know which server process died from signal 13. All we knew was the process ID from the error message. We searched the log files for messages from this process, which is easy if you log the process ID with each entry. However, that process never left any trace in the log, even when we cranked up log_min_messages to debug3.

An added difficulty was that the error condition could not be reproduced on demand. All we could do was increase the load on the system by starting a backup, in the hope that the problem would manifest itself.

The next idea was to take regular “ps” snapshots in the hope of catching the offending process red-handed. Still the process remained elusive. Finally, the customer increased the frequency of those snapshots to one per second, and in the end we got a mug shot of our adversary.

The process turned out not to be a server process at all. Rather, it was a psql process that was started inside the container to run a monitoring query against the database. Now psql is a client program that does not ignore SIGPIPE, so that mystery is solved. But how can psql be a PostgreSQL server process?

The ps snapshot that helped solve the Docker problem

The snapshot in question looked like this:

The last line is the offending process, which is about to receive signal 13. This is very clearly not a server process; among other things, it is owned by the root user instead of postgres. Unfortunately, the snapshot does not include the parent process ID. However, since the postmaster (in the first line) recognized the rogue process as its child, it must be the parent.

Unplanned adoption in a Docker container

The key observation is that the process ID of the postmaster is 1. In Unix, process 1 is a special process: it is the first userland process that the kernel starts. This process then starts other processes to bring the system up. It is the ancestor of all other processes, and every other process has a parent process. There is another special property of process 1: if the parent process of a process dies, the kernel automatically assigns process 1 as parent to the orphaned process. Process 1 has to “adopt” all orphans.
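The adoption mechanism is easy to demonstrate (a sketch for Linux; note that on systems that designate a “child subreaper”, e.g. under some systemd setups, the orphan may be handed to the subreaper instead of process 1):

```python
import os
import time

# Fork a child, which forks a grandchild and exits immediately,
# orphaning the grandchild. The grandchild reports its new parent
# process ID back through a pipe.
r, w = os.pipe()
child = os.fork()
if child == 0:
    if os.fork() == 0:            # grandchild
        time.sleep(0.2)           # give the kernel time to reparent us
        os.write(w, str(os.getppid()).encode())
        os._exit(0)
    os._exit(0)                   # child exits, orphaning the grandchild

os.close(w)
os.waitpid(child, 0)              # reap the child
new_parent = int(os.read(r, 32))
print(new_parent)                 # typically 1, or a subreaper's PID
```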

Normally, process 1 is a special init executable specifically designed for this purpose. But in a Docker container, process 1 is the process that you executed to start the container. As you can see, that was the postmaster. The postmaster handles one of the tasks of the init process admirably: it waits for its children and collects the exit status when one of them dies. This keeps zombie processes from lingering for any length of time. However, the postmaster is less suited to handle another init task: remain stoic if one of its children dies horribly. That is what caused our problem.

How can we avoid this problem?

Once we understand the problem, the solution is simple: don't make PostgreSQL the first process in the container. Rather, start the container with a different process, which in turn starts the postmaster. Either write your own init program or use an existing solution like dumb-init. The official PostgreSQL Docker image does it right.
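To illustrate the idea (as a sketch only, not a substitute for a battle-tested tool like dumb-init), a minimal init process has to do two things: forward termination signals to the service it started, and calmly reap any children, including adopted orphans, without reacting to how they died. The function name here is made up for the sketch.

```python
import os
import signal
import subprocess
import sys

def tiny_init(argv):
    """Minimal PID-1 sketch: run the service, forward signals, reap orphans."""
    service = subprocess.Popen(argv)

    # Forward SIGTERM/SIGINT to the service instead of dying ourselves.
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, lambda s, _frame: service.send_signal(s))

    while True:
        try:
            # os.wait() reaps the service AND any adopted orphans.
            pid, status = os.wait()
        except ChildProcessError:
            return 0                  # no children left
        if pid == service.pid:
            # Pass the service's exit status on. Note: a child dying
            # from a signal is NOT treated as a reason to panic.
            return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    sys.exit(tiny_init(sys.argv[1:]))
```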

This problem also could not have occurred if the psql process hadn't been started inside the container. It is good practice to consider a container running a service as a closed unit: you shouldn't start jobs or interactive sessions inside the container. I can understand the appeal of using the container's psql executable to avoid having to install the PostgreSQL client anywhere else, but it is a shortcut that you shouldn't take.

Conclusion

It turned out that the cause of our problem was that the postmaster served as process 1 in the Docker container. The psql process that ran a monitoring query died from a SIGPIPE under high load. The postmaster, which had inadvertently inherited that process, noticed this unusual process termination and underwent crash recovery to stay on the safe side.

While running a program in a Docker container is not very different from running it outside in most respects, there are some differences that you have to be aware of if you want your systems to run stably.

You can read more about PostgreSQL and Docker in the article "Running Postgres in Docker: Why and How".

Comments

Ludovico Caldara:

If [...] you are less worried about data integrity, you might prefer more cavalier database systems like Oracle

Reading this sentence, one might think that Oracle is more prone to data inconsistency. That is not the case.
Oracle has mechanisms to ensure that memory changes left by crashed sessions are cleaned up successfully before other sessions can touch the same data.
May I suggest removing the sentence, or changing it to something more technically correct?
Feel free to contact me if you have questions about it (I work for Oracle).

laurenz:

I admit that I have somewhat hard feelings against Oracle when it comes to data integrity. Cases in point are their willingness to accept corrupt strings if client encoding = server encoding, or "invalid views", but this is of course a matter of taste. So I may have been slightly unfair.
However, I doubt your assertion: if a server process encounters a bad pointer (software bug, faulty memory, memory overrun caused by the user, ...), it cannot be excluded that it writes over a section of shared memory (SGA) that it has no business modifying. You may claim otherwise, but as long as I cannot see the Oracle source code, we cannot really settle that question.

Ludovico Caldara:

It looks like you are OK with spreading uncertainty based on what "cannot be excluded". If that's your intention, I can't do anything against it. At least I tried 🙂

laurenz:

Fair enough. In a half-hearted attempt at being fair, let me state clearly that I do not think that it is very likely that a crashing Oracle server process will corrupt your database.
