Estimating connection pool size with PostgreSQL database statistics

02.2021 | Category: How To | Tags: administration, statistics

02.2021

Category: How To

Sizing a connection pool the wrong way — © Laurenz Albe 2021

Table of Contents

PostgreSQL v14 has new connection statistics in pg_stat_database. In this article, I want to explore one application for them: estimating the correct size for a connection pool.

New connection statistics in v14

Commit 960869da080 introduced some new statistics to pg_stat_database:

session_time: total time spent by sessions in the database
active_time: time spent executing SQL statements in the database
idle_in_transaction_time: time spent idle in a transaction in the database
sessions: total number of sessions established to the database
sessions_abandoned: number of sessions to the database that were terminated by lost client connections
sessions_fatal: number of sessions to the database terminated by fatal errors
sessions_killed: number of sessions to the database killed by an operator

Some applications are obvious: for example, you may want to keep the number of fatal errors or operator interventions low, or you may want to fix your application to properly close database connections.

But I want to show you another, less obvious, application for these statistics.

Connection pooling modes

In the following, I assume that the application connects to the database with a single database user. That means that there will be one connection pool per database.

If you use session level connection pooling, a client gets a session from the pool for the whole duration of the database session. In this case, sizing the connection pool is simple: the pool has to be large enough to accommodate the maximum number of concurrent sessions. The number of connections to a database is available in numbackends in pg_stat_database, and many monitoring systems capture this value. Session level pooling is primarily useful if the client's database connections are short.

However, it is normally more efficient to use transaction level connection pooling. In this mode, a client gets a session only for the duration of a database transaction. The advantage is that multiple clients can share the same pooled session. Sharing reduces the required number of connections in the pool. Having few database connections is a good thing, because it reduces contention inside the database and limits the danger of overloading the CPU or storage subsystem.

Of course, transaction level pooling makes it more difficult to determine the correct size of the connection pool.

The ideal connection pool size

The ideal size for a connection pool is

large enough that no client has to wait for a connection during normal load
as small as possible within the above limit

In my article about max_connections, I established the following upper limit for the number of connections:

connections < min(num_cores, parallel_io_limit) /
              (session_busy_ratio * avg_parallelism)

1 2	connections < min(num_cores, parallel_io_limit) / (session_busy_ratio * avg_parallelism)

Where

num_cores is the number of cores available
parallel_io_limit is the number of concurrent I/O requests your storage subsystem can handle
session_busy_ratio is the fraction of time that the connection is active executing a statement in the database
avg_parallelism is the average number of backend processes working on a single query.

Using the new statistics to find an upper limit for the connection pool size

Now all these numbers are easy to determine – that is, all except for session_busy_ratio.

With the new database statistics, that task becomes trivial:

SELECT datname,
       active_time /
       (active_time + idle_in_transaction_time) AS session_busy_ratio
FROM pg_stat_database
WHERE active_time > 0;

1

2

3

4

5

SELECT datname,

active_time /

(active_time + idle_in_transaction_time) AS session_busy_ratio

FROM pg_stat_database

WHERE active_time > 0;

Conclusion

The new database statistics in PostgreSQL v14 make it easier to get an estimate for the safe upper limit for the size of a connection pool. To learn more about connection pools and authentication, see my post on pgbouncer.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

3 responses to “Estimating connection pool size with PostgreSQL database statistics”

Fabrice says:

November 9, 2022 at 6:13 pm

I I followed your talk on connections pooling at pgconfEU in Berlin (1000 applications user). There is a point in a slide that I did not understand: many open session with idle status on postgresql instance could slow down the “snapshot” created for each SQL statement. How could you explain that?

Reply
- laurenz says:
  
  November 9, 2022 at 10:40 pm
  
  Because it has to iterate through the array of processes. See the transam README for a description. For the implementation (including enlightening comments) see GetSnapshotData().
  
  Reply
  - Fabrice says:
    
    November 10, 2022 at 7:57 am
    
    Thanks Laurenz for answering
    
    Reply

Estimating connection pool size with PostgreSQL database statistics

New connection statistics in v14

Connection pooling modes

The ideal connection pool size

Using the new statistics to find an upper limit for the connection pool size

Conclusion

3 responses to “Estimating connection pool size with PostgreSQL database statistics”

Leave a Reply Cancel reply

Laurenz Albe

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts