© Laurenz Albe 2020
In my daily work, I see many databases with a bad setting for
max_connections. There is little advice out there for setting this parameter correctly, even though it is vital for the health of a database. So I decided to write this up.
According to the documentation,
max_connections determines the maximum number of concurrent connections to the database server. But don’t forget that
superuser_reserved_connections of these connections are for superusers only (so that superusers can connect even if all other connection slots are blocked). With the default setting of 100 and 3 for these parameters, there are 97 connections open for application purposes.
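One way to check the values in effect on your own server is to query them from SQL; the arithmetic mirrors the example above:

```sql
-- Show the current settings
SHOW max_connections;
SHOW superuser_reserved_connections;

-- Or compute the slots left for application connections directly
SELECT current_setting('max_connections')::int
     - current_setting('superuser_reserved_connections')::int
       AS slots_for_applications;
```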
Since PostgreSQL reserves shared memory for all possible connections at server start, you have to restart PostgreSQL after changing max_connections.
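For example, you could change the setting with ALTER SYSTEM (the value 200 here is just an illustration, not a recommendation):

```sql
-- Written to postgresql.auto.conf; takes effect only after a restart
ALTER SYSTEM SET max_connections = 200;
```

After that, restart the server with pg_ctl restart or your operating system's service manager; a plain reload is not enough for this parameter.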
Why do people choose high values for max_connections?
There are several reasons:
- Since you need a restart to increase
max_connections, people want to be “on the safe side”.
- People got the error
ERROR: remaining connection slots are reserved for non-replication superuser connections
and raised the limit so that they don’t get the error anymore.
- The application developers convinced the DBA that they need many database connections for best performance.
It seems like an easy enough solution to increase the number of connections.
What are the problems of high values for max_connections?
Setting max_connections to a high value can have severe consequences:
Risk of overloading the database
As long as all but three of your 500 database sessions are idle, not much harm is done. Perhaps taking the snapshot at the beginning of each query is a little slower, but you probably won’t notice that.
But there is nothing that can prevent 100 of the connections from becoming active at the same time. Perhaps a particular event occurred (everybody wants to buy a ticket the minute after sales started). Perhaps a rogue statement kept a lock too long and processes ready to run “piled up” behind the lock.
If that happens, your CPU and/or I/O subsystem will be overloaded. The CPU will be busy switching between the processes or waiting for I/O from the storage subsystem, and none of your database sessions will make much progress. The whole system can “melt down” and become unresponsive, so that all you can do is reboot.
Obviously, you don’t want to get into that situation, and it is not going to improve application performance either.
Insufficient resources for each database connection
There is a limited amount of RAM in your machine. If you allow more connections, you can allot less RAM to each connection. Otherwise, you run the danger of running out of memory. The private memory available for each operation during query execution is limited by
work_mem. So you will have to set this parameter low if you use a high value for max_connections.
work_mem has a direct influence on query performance: sorting will be faster if it can use enough RAM, or PostgreSQL may prefer a faster hash join or hash aggregate and avoid a sort altogether.
So setting max_connections high will make queries perform more slowly than they could, unless you want to risk running out of memory.
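You can observe this trade-off yourself by comparing plans at different work_mem settings; big_table and some_column below are placeholders for one of your own tables:

```sql
-- With a small work_mem, the sort may have to spill to disk
SET work_mem = '1MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY some_column;

-- With more memory per operation, the sort can run entirely in RAM,
-- or the planner may choose a hash-based plan instead
SET work_mem = '256MB';
EXPLAIN (ANALYZE) SELECT * FROM big_table ORDER BY some_column;
```

Look for "Sort Method: external merge" versus "Sort Method: quicksort" in the output.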
An upper limit for max_connections
You want to utilize your resources without overloading the machine. So your setting should satisfy
max_connections < max(num_cores, parallel_io_limit) / (session_busy_ratio * avg_parallelism)
- num_cores is the number of cores available
- parallel_io_limit is the number of concurrent I/O requests your storage subsystem can handle
- session_busy_ratio is the fraction of time that the connection is active executing a statement in the database
- avg_parallelism is the average number of backend processes working on a single query.
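As a sketch, the formula can be written as a small Python function; the numbers in the examples are assumptions for illustration, not recommendations:

```python
def max_connections_upper_limit(
    num_cores: int,
    parallel_io_limit: int,
    session_busy_ratio: float,
    avg_parallelism: float,
) -> float:
    """Upper bound from the formula above:
    max_connections < max(num_cores, parallel_io_limit)
                      / (session_busy_ratio * avg_parallelism)
    """
    return max(num_cores, parallel_io_limit) / (session_busy_ratio * avg_parallelism)

# OLTP-style workload: 16 cores, storage that handles 200 concurrent
# I/O requests, short statements (sessions busy 10% of the time),
# no parallel query
print(max_connections_upper_limit(16, 200, 0.1, 1))  # 2000.0

# Analytical workload: sessions busy all the time, 3 processes per query
print(max_connections_upper_limit(16, 200, 1.0, 3))  # about 66.7
```

Note how the same hardware supports very different limits depending on the workload.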
The first two numbers are easy to obtain, and you usually have a good idea if you will use a lot of parallel query or not. But
session_busy_ratio can be tricky to estimate. If your workload consists of big analytical queries,
session_busy_ratio can be up to 1, and
avg_parallelism will be one more than
max_parallel_workers_per_gather. If your workload consists of many short statements,
session_busy_ratio can be close to 0, and
avg_parallelism is 1.
Often you will have to use a load test to determine the best value of
max_connections by experiment. The aim is to keep the database busy, but not overloaded.
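One way to get a rough estimate of session_busy_ratio during such a load test is to sample pg_stat_activity repeatedly; this query assumes PostgreSQL 10 or later, which has the backend_type column:

```sql
-- Point-in-time estimate: fraction of client backends currently
-- executing a statement (sample this many times and average)
SELECT count(*) FILTER (WHERE state = 'active')::float / count(*)
       AS session_busy_ratio
FROM pg_stat_activity
WHERE backend_type = 'client backend';
```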
For some workloads, this formula or an experiment would lead to a high setting for
max_connections. But you still shouldn’t do that, as stated above: the more connections you have, the bigger the danger of many connections suddenly becoming active and overloading the machine. The better solution in this case is to use a connection pool to increase session_busy_ratio.
Using a connection pool
A connection pool is a piece of software that keeps a number of persistent database connections open. It uses these connections to handle database requests from the front-end. There are two kinds of connection pools:
- Connection pools built into the application or the application server. Rather than opening a database connection, application processes request a connection from the pool. When they are done, they return it to the pool rather than closing the connection.
- External connection pools like pgBouncer. Such a connection pool looks like a database server to the front end. SQL statements from the application are executed over a limited number of backend connections to the database.
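As an illustration, a minimal pgbouncer.ini could look like this; the host names, paths and numbers are assumptions, not recommendations:

```ini
[databases]
; clients connect to "mydb" on pgBouncer, which forwards to the real server
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; transaction pooling: a server connection is held only
; for the duration of a transaction
pool_mode = transaction
; many client connections share few server connections
max_client_conn = 1000
default_pool_size = 20
```

Here up to 1000 application connections share only 20 database connections, so max_connections on the PostgreSQL server can stay low.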
Connection pools provide an artificial bottleneck by limiting the number of active database sessions. Using them increases the
session_busy_ratio. This way, you can get by with a much lower setting for
max_connections without unduly limiting your application.
It is important for the health and performance of your application not to set
max_connections too high.
If you need to be able to handle many database connections due to architecture limitations (the application server has no connection pool, or there are too many application servers), use a connection pooler like pgBouncer. Then you can give each connection the resources it needs for good performance.