Do you want to kill a database connection? Or maybe you want to use kill -9
? On your PostgreSQL database server? Well, there are things you should be aware of before you do that. The question we want to answer in this blog therefore is: How do I kill PostgreSQL processes properly?
Table of Contents
Before we discuss process killing, it makes sense to first take a look at a simple PostgreSQL instance / cluster on the operating system level:
1 2 3 4 5 6 7 |
1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db 1200475 ? Ss 0:00 _ postgres: logger 1200476 ? Ss 0:00 _ postgres: checkpointer 1200477 ? Ss 0:00 _ postgres: background writer 1200479 ? Ss 0:00 _ postgres: walwriter 1200480 ? Ss 0:00 _ postgres: autovacuum launcher 1200481 ? Ss 0:00 _ postgres: logical replication launcher |
What we see here is the postmaster process and its children which together form the PostgreSQL core system infrastructure. Most processes in this infrastructure can be killed nicely without any problems:
[sql gutter="false"]
[hs@sample ~]$ kill 1200480
What we have just killed is the autovacuum launcher. That's not a problem at all, because the postmaster will instantly launch a new one: we can judge by the new process ID, which can be seen in the process table.
PostgreSQL will basically ensure that this thing is always running:
1 2 3 4 5 6 7 |
1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db 1200475 ? Ss 0:00 _ postgres: logger 1200476 ? Ss 0:00 _ postgres: checkpointer 1200477 ? Ss 0:00 _ postgres: background writer 1200479 ? Ss 0:00 _ postgres: walwriter 1200481 ? Ss 0:00 _ postgres: logical replication launcher 1200524 ? Ss 0:00 _ postgres: autovacuum launcher |
To understand this better, we have to understand what kill
does. It does not actually kill anything, it sends a signal to an operating system process. The name "kill" is still appropriate, since any signal that the target process does not explicitly handle will lead to its immediate termination.
Keep in mind that we have used a "normal" kill here (which sends a SIGTERM
signal). However, not all kills are equal. A "normal" kill can be caught and handled by the application. What has basically happened is that the autovacuum launcher process has caught the SIGTERM
signal and terminated itself properly. "Properly" means that all shared resources are left in a consistent, clean state and that no leftovers could make their way through the system to cause issues at some random place. However, a normal kill is not what kill -9
does. The problem is: "-9
" sends the SIGKILL
signal which the target process cannot catch and handle; it is simply brutally destroyed by the operating system, potentially leaving shared memory and other resources in an inconsistent state. This is an extremely important difference which must be kept in mind when killing processes.
-9
” wayBut what happens if we brutally kill a database connection? The last process is a standard connection to PostgreSQL. User “hs
” is connected to the postgres
database via a local IPv6 connection. The following listing has the details:
1 2 3 4 5 6 7 8 |
1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db 1200475 ? Ss 0:00 _ postgres: logger 1200476 ? Ss 0:00 _ postgres: checkpointer 1200477 ? Ss 0:00 _ postgres: background writer 1200479 ? Ss 0:00 _ postgres: walwriter 1200481 ? Ss 0:00 _ postgres: logical replication launcher 1200524 ? Ss 0:00 _ postgres: autovacuum launcher 1200623 ? Ss 0:00 _ postgres: hs postgres ::1(40498) idle |
The user connection is brutally eliminated, but look carefully and see what has happened:
1 2 3 4 5 6 7 8 9 |
[hs@sample ~]$ kill -9 1200623 [hs@sample ~]$ ps axf | grep post 1200474 ? Ss 0:00 /usr/pgsql-15/bin/postgres -D sample_db 1200475 ? Ss 0:00 _ postgres: logger 1200628 ? Ss 0:00 _ postgres: checkpointer 1200629 ? Ss 0:00 _ postgres: background writer 1200630 ? Ss 0:00 _ postgres: walwriter 1200631 ? Ss 0:00 _ postgres: autovacuum launcher 1200632 ? Ss 0:00 _ postgres: logical replication launcher |
Previously, I mentioned that a “-9
” sends a non-catchable signal which implies that the process cannot react to the kill. Therefore, the process might terminate at any point, which in turn means that we cannot rely on the fact that it left the database in a consistent state. The process might have left us in any state. Operations might have been terminated during a change in shared memory, or on disk, or anywhere else. That means that we cannot trust the content of the shared memory anymore. Some random blocks somewhere inside shared buffers might be corrupted, or something else might have happened which could lead to potential corruption further down the road.
The way PostgreSQL copes with this situation is to immediately drop all connections, clear out memory and initiate WAL recovery from the latest checkpoint. Basically, PostgreSQL does the same thing as it does during a “normal database crash”.
The main advantage is that you might lose a couple of database connections and you won’t face database corruption. However, if you “kill -9” on a busy system, you will potentially eliminate thousands of database connections unintentionally, and that might have unforeseen consequences. Therefore, it's usually not a good idea to run kill -9
in the first place. If you do, it's important to understand what's actually happening behind the scenes and to understand the implications.
See also:
In case you need any assistance, please feel free to contact us.
+43 (0) 2622 93022-0
office@cybertec.at
You are currently viewing a placeholder content from Facebook. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More InformationYou are currently viewing a placeholder content from X. To access the actual content, click the button below. Please note that doing so will share data with third-party providers.
More Information
Leave a Reply