Connection pooling intro - PgBouncer and pgpool-II

10.2016 | Category: Performance | Tags: connections, pgbouncer

10.2016

When doing PostgreSQL consulting the other day, the talk went to the topic of connection pools - namely what approaches and products are commonly used and perform well? The topic is pretty wide in itself but mostly well-known for old-timers. Nevertheless it is worth a small write up on basic concepts and a small comparison of the two most common "near to Postgres" products that you should know about - PGBouncer and pgpool-II.

Table of Contents

First a basic intro - connection pools are middleware that speak the database protocol and cache database connections so that clients could spare the time used to negotiate the connection, do authentication and set client defaults (encoding, work_mem) when opening a new connection and also to relieve the database server from storing too much client state in memory. Thus applications connect to the pool, thinking it's the database.

Common approaches for deploying pools:

Integrated - via in-process, language native libraries e.g. HikariCP for Java or Psycopg for Python
Application co-located - pooling server/product is located on the same node as application. Both approaches make it harder to limit the total amount of connections on the DB.
Independent - pool is on a separate machine. Most flexible, allows transparent switching of the underlying DB.
DB co-located - pool runs on the same machine as DB. This would hurt high-availability normally, client would notice the DB going away
Mixed solutions - one could for example cascade above approaches and also throw in HAProxy to ensure better availability

The usual suspects - PgBouncer and pgpool-II

When talking about separate pooling servers in Postgres context, two products stand out: PgBouncer and pgpool-II. Both products are written in C and seem to be actively maintained, with pgpool-II seeing more action as it also has a higher number of ambitious features. Besides source code, packages for common Linux distros are available and deployment it pretty simple: customize the configuration files and start the daemon. Redirect your clients to the pools instead of the real DB (both use non-standard ports by default though - 6432 and 9999 accordingly). Based on the available online documentation (PgBouncer in current version 1.7.2 and pgpool-II in version 3.5.4, the latter one having sadly some outdated parts) I compiled the following outline of the features so that you can decide on the suitability for your needs yourself.

pgpool-II features

connection pooling
queuing of incoming connections
load balancing of SELECT statements
* embedded Postgres query parser enabling advanced things e.g. Oracle-style hints like '/*NO LOAD BALANCE*/' possible
* different weights for different servers
* function white/blacklisting (as functions are called via SELECT)
* configurable replication delay threshold, if exceeded master will be used
statement level replication by sending the same query to many (master) servers
* compares number of affected rows
* default timestamp column handling when doing inserts
* CURRENT_TIMESTAMP, CURRENT_DATE, now() will be replaced with constants
* a mechanism (table locking) to ensure same ID's for INSERTs on tables with SERIALs
HA features
* auto failover (configurable callback scripts)
* graceful adding/removing replicas
* watchdog with quorum support for failing over Pgpool itself (moving of virtual IP)
* one click/command provisioning of replicas (sample scripts provided)
query caching
* in-memory or "memcached" relayed
* DML and time based invalidation schemes
elaborate pool management options
* basic pool status via normal SHOW commands
* pcp_* command line utils for starting, stopping, etc
* pgpool_adm extension for pool management via SQL
* pgpoolAdmin web interface
SSL support
optional authentication and access filtering (pg_hba.conf format) layer
online config reload for most settings

pgpool-II gotchas

session-based pooling only
pooling for one cluster only
no multi-statement queries
using pg_terminate_backend() to stop a backend will trigger a failover!
no multi-byte encoding translations, client must know server encoding

PgBouncer features

lightweight (event based architecture) connection pooling
3 pooling modes
* session (default)
* transaction
* statement
graceful connection re-direction to a new node (for non SSL connections, *nix only)
can pool multiple clusters/databases
pausing the pool and queuing of incoming connections for example to restart a database without clients noticing
simple management interface when connecting to the special "pgbouncer" database
* aggregated statistics
SSL support
optional authentication and access filtering (pg_hba.conf format) layer
online config reload for most settings

PgBouncer gotchas

no automation
non-obvious configuration of real connection limits to the underlying database (max_client_conn, default_pool_size, max_db_connections, max_user_connections, min_pool_size, reserve_pool_size)
connect_query (executed before connection given to client, e.g. setting of encoding, work_mem etc) errors are ignored

Testing performance

From sceptical reasons, to test the claims of my colleague Ants that PgBouncer is significantly faster than pgpool-II. I also decided to run a quick set of tests with all the components running on my laptop. Test setup - a small 13MB "pgbench" in-memory dataset in "--select-only" mode to get fast responses, as we want to test here only connection overhead. Pools were configured without SSL and so that the tested amount of 8 concurrent connections would always be kept cached by the pools and no connection re-establishing would take place during test. For PgBouncer default "session pooling" was used.

pgbench -i -s 1 bench	# init the bench schema ~13MB
for port in 5432 6432 9999 ; do
  for i in {1..3} ; do
	pgbench  --select-only --connect -T300 -c8 -j2 -p $port bench
  done
done

1

2

3

4

5

6

pgbench -i -s 1 bench # init the bench schema ~13MB

for port in 5432 6432 9999 ; do

for i in {1..3} ; do

pgbench --select-only --connect -T300 -c8 -j2 -p $port bench

done

Side note - before I could really fire off with testing I ran into a distro-specific problem where connections started to fail after some time, and it required changing some kernel parameters. More info here.

Results (as always, given with a YMMV disclaimer) were such:

no pooling 356 avg. TPS
Pgpool-II 3939 avg. TPS (10x general improvement)
PgBouncer 6626 avg. TPS (17x general improvement, 75% improvement over Pgpool)

Summary

Both well-known and battle-tested products, PgBouncer and pgpool-II, provide a good way to grab that low-hanging fruit for performance (very noticeable difference when doing very short and simple transactions) and also to add some flexibility to your setup by hiding the database from direct access, making it easier to do minor maintenance. For most usecases (no replicas or using external HA solutions) PgBouncer would be my pick, due to its lightweight architecture and superior performance.

In case you have any questions, feel free to contact us.

4 responses to “Connection pooling intro - PgBouncer and pgpool-II”

brainwashedGreekOld says:

November 30, 2016 at 7:37 pm

Great writeup my friend!

Reply
jvleminc says:

February 1, 2018 at 3:30 pm

Cheers for this article.

Reply
harisai hari says:

March 12, 2019 at 4:34 am

nice explanation, can i have difference between pgpool and pg bouncer,
can i use any one in my environment ?

Reply
- Pavlo Golub says:
  
  March 12, 2019 at 7:12 am
  
  You should compare features listed in this article and choose product suits for your needs.
  
  Reply

Connection pooling intro - PgBouncer and pgpool-II

The usual suspects - PgBouncer and pgpool-II

pgpool-II features

PgBouncer features

Testing performance

Summary

4 responses to “Connection pooling intro - PgBouncer and pgpool-II”

Leave a Reply Cancel reply

CYBERTEC Guest

Blog Tags

NEWSLETTER

Articles by our PostgreSQL Experts