By Kaarel Moppel - Auto-rebuild bloated tables with pg_squeeze: One of the few areas where PostgreSQL's out-of-the-box functionality is not 100% satisfying is the “bloat problem”. Until now, combating bloat, or just trying to ensure that your table data is physically ordered according to some column(s) (a.k.a. clustering), required accepting some inconvenient compromises: extended periods of full table locking (no read or write activity) with the built-in VACUUM FULL or CLUSTER commands, or third-party tooling, usually meaning “pg_repack”. “pg_repack” offers good benefits like a much smaller full-lock window and ordering by specific columns, but needs a bit of fiddling around - installing the extension, identifying bloated tables, running its command line client - and for larger tables it can also temporarily grow disk usage unnecessarily, as it uses triggers to store the modifications made to tables during the pre-building phase.
To alleviate the situation, on behalf of the Cybertec development team I'm really glad to announce a new bloat-painkiller called “pg_squeeze”! I myself, with my stereotypically calm Nordic temper, don't usually get too excited about a piece of software, but this time, as a day-to-day PostgreSQL user, I must say I'm really impressed - an absolutely great piece of work! I also wonder why nothing like it came about earlier.
pg_squeeze is a PostgreSQL extension implementing a background worker process (one per database) that periodically monitors tables defined by the user, and when it detects a table crossing the “bloat threshold”, kicks in and rebuilds that table automatically! Rebuilding happens concurrently in the background with minimal storage and computational overhead, because it uses Postgres' built-in replication slots together with logical decoding to extract from the XLOG any table changes happening during the rebuild. The bloat threshold is configurable, and the bloat ratio calculation is based on the free space map or, under certain conditions, on concepts of the “pgstattuple” extension. Additionally, a minimum table size can be set, with smaller tables being ignored. A further requirement for a table to be considered for rebuilding is that it must have a primary key or unique constraint defined.
# Download and install the extension
git clone …
export PGCONFIG=/usr/bin/pg_config   # point it to your desired Postgres installation
make && sudo make install

cat <<-EOF >> testcluster/postgresql.conf
wal_level = logical
max_replication_slots = 1
shared_preload_libraries = 'pg_squeeze'
EOF

pg_ctl -D testcluster restart

psql -c "CREATE EXTENSION pg_squeeze"
psql -c "INSERT INTO squeeze.tables (tabschema, tabname, first_check) VALUES ('public', 'foo', now());"
psql -c "SELECT squeeze.start_worker()"   # PS! not needed when we define the list of "squeezed" databases in postgresql.conf
In addition to the above-mentioned option to list the databases and tables taking part in the auto-rebuild, the following “rebuild decision” aspects can be configured for every table by adjusting values in the “squeeze.tables” table. NB! This table and the start/stop_worker() calls are meant to be the only “user interface” of the extension; the other tables/functions are meant for internal use (although it is possible to call them).
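For illustration, a hedged sketch of tuning one table - the column names (free_space_extra, min_size, clustering_index) are taken from the pg_squeeze README and may differ between versions, and the index name is made up:

-- a sketch, not gospel: column names per the pg_squeeze README of the version at hand
UPDATE squeeze.tables
   SET free_space_extra = 30,                 -- consider >=30% extra free space as bloat
       min_size = 100,                        -- ignore the table while it is below ~100 MB
       clustering_index = 'foo_created_idx'   -- hypothetical index to physically order rows by
 WHERE tabschema = 'public' AND tabname = 'foo';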
Additionally, the following can be configured on the global (database) level:
Also, because logical decoding is used, only newer versions of PostgreSQL, starting from version 9.4, can be taken into consideration.
Currently “pg_squeeze” supports the latest PostgreSQL 16; see the GitHub page for the latest info.
If you want the background worker to start automatically during startup of the whole PostgreSQL cluster, add entries like this to “postgresql.conf”, selecting appropriate databases and a role for the worker. More explanations on that are available from the README.
squeeze.worker_autostart = 'mydb1 mydb2'
squeeze.worker_role = postgres
Grab the code here and try it out! Questions and feedback welcome.
In case you need any assistance, please feel free to contact us.
Last week a new PostgreSQL major version, numbered 9.6, was released, and I already covered my favorite parts of the official release notes from a DBA's point of view in a blogpost here. Now I would like to look at the same list of changes from a different angle, transforming in my mind into a database developer 🙂 Some features of shared interest are of course re-listed, but it's mostly new stuff, so put on your dev-hat and enjoy.
Needs enabling via the “max_parallel_workers_per_gather” parameter, which can luckily be done on the user level, thus making per-query parallelization possible. One thing to note here is that the total number of worker processes is limited by the “max_worker_processes” parameter, so this might need increasing (default is 8) on good hardware when running parallel queries from lots of concurrent sessions.
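As a quick illustration (the table name is of course a placeholder), parallelism can be toggled per session:

-- session-level: allow up to 4 parallel workers per Gather node
SET max_parallel_workers_per_gather = 4;
-- big_table is hypothetical; a large aggregate like this can now show a
-- "Gather" node with parallel workers in the plan
EXPLAIN ANALYZE SELECT count(*) FROM big_table;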
Transparent sharding ahoy! That was already possible with 9.5 (foreign table inheritance + constraint exclusion), but now things also perform well for non-simplistic use cases. Great stuff.
Allows efficient execution of remote queries involving extension operators. Again, great for sharding scenarios.
Together with the new "remote_apply" setting for the "synchronous_commit" parameter user have power to create “mirrored” multi-machine clusters. Awesome feature!
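A minimal sketch of the primary's configuration, assuming a single standby whose application_name is "standby1" (a placeholder):

# postgresql.conf on the primary
synchronous_standby_names = 'standby1'
synchronous_commit = remote_apply   # COMMIT returns only after the standby has also applied the change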
A pilot feature for generic index access methods, this new index type brings Bloom filters to Postgres. Basically, a Bloom index can tell you definitively that your entry “is not” there, while “is there” is lossy and can give false positives. More info on the algorithm here. It supports only equality queries and the “int” and “text” datatypes, but is a lot more efficient than a B-tree for multi-column query conditions. Contrib module.
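A small sketch following the contrib module's documentation (table and column names are made up; the WITH options tune the signature length and the bits generated per column):

CREATE EXTENSION bloom;
-- 80-bit signatures, 2 bits set per indexed column (illustrative values)
CREATE INDEX t_bloom_idx ON t USING bloom (c1, c2, c3)
    WITH (length = 80, col1 = 2, col2 = 2, col3 = 2);
-- equality conditions on any subset of the indexed columns can use the index
SELECT * FROM t WHERE c1 = 1 AND c3 = 42;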
Meaning the order of your “search words” is now respected, for example:
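-- the new <-> (followed by) operator and phraseto_tsquery() respect word order
SELECT to_tsvector('english', 'the cat sat on the mat')
       @@ phraseto_tsquery('english', 'cat sat');   -- true
SELECT to_tsvector('english', 'the cat sat on the mat')
       @@ phraseto_tsquery('english', 'sat cat');   -- false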
Previously only whole words could be fuzzy-compared with pg_trgm; now the new “word_similarity” function can also determine how well an input matches the best-matching part of another string. Contrib module.
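A quick taste (values approximate):

CREATE EXTENSION pg_trgm;
SELECT word_similarity('word', 'two words');   -- ~0.8: best match against a part of the string
SELECT similarity('word', 'two words');        -- ~0.36: the old whole-string comparison, for contrast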
This enables canceling running queries and terminating sessions of other database users. More built-in roles are to be expected in the future.
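Assuming this refers to the new "pg_signal_backend" built-in role, granting it might look like this ("dbmonitor" is a placeholder role):

-- non-superusers holding this role may signal other users' backends
GRANT pg_signal_backend TO dbmonitor;
-- dbmonitor can now run pg_cancel_backend(pid) and pg_terminate_backend(pid)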
The specified operations are carried out in the order in which the options are given, and then psql terminates.
This is useful after getting an unexpected error.
With those two pgbench changes, one can now build and test "close to real life" test cases faster and more easily.
Last week a new PostgreSQL major version, numbered 9.6, was released! The announcement, release notes and the official “What's new” overview can be found here, here and here - all highly recommended reading, so check them out. As always, a slew of blog-posts from excited members of the global Postgres community follows (check out Planet PostgreSQL here if not yet subscribed), each with a bit of a different angle. Now I would like to add my own impressions of the most interesting/relevant features, summarized for easy digestion.
As always, users who upgrade or initialize a fresh cluster will enjoy huge performance wins (avoiding scanning frozen pages unnecessarily during vacuum freeze, scalability on multi-CPU-socket servers, checkpoint writes in sorted order, index-only scans for partial indexes) out of the box, without doing - or being able to do - anything. Here, though, I would like to look at the things you won't get out of the box, where you actually need to take some steps to start benefiting from them. The list below highlights PostgreSQL 9.6 features compiled from a DBA's viewpoint. A similar article looking at the changes from a developer's point of view will follow this week.
First, here's a list of things that could most likely cause problems when migrating to PostgreSQL 9.6 from an older version. Before migrating one should of course test on a separate replica and go through the full list of possibly incompatible changes from the release notes.
"waiting" column has been replaced with "wait_event_type" and "wait_event".
Pretty simple stuff, "pg_upgrade" will give you an error "The old cluster contains roles starting with 'pg_'".
In case a ".psqlrc" file exists, this could cause your Cron scripts to generate some unwanted output (usually translating to emails), even with the "-q/--quiet" flag.
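For monitoring scripts that used the old "waiting" boolean, a minimal replacement sketch using the new columns could look like this:

-- list sessions that are currently waiting, using the new 9.6 columns
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;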
Needs enabling via the “max_parallel_workers_per_gather” parameter, which can luckily be done on the user level too, thus making per-query parallelization possible. One thing to note here is that the total number of worker processes is limited by the “max_worker_processes” parameter, so this might need increasing (default is 8) on good hardware when running parallel queries from lots of concurrent sessions.
Beyond the threshold, old data may be vacuumed away, and users will get a “snapshot too old” error when trying to read such old rows. Warning! From the documentation: “When this feature is enabled, freed space at the end of a relation cannot be released to the operating system”... so this is basically a double-edged sword, and it's not enabled by default.
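The parameter behind this, "old_snapshot_threshold", defaults to -1 (off); enabling it could look like so:

# postgresql.conf - requires a server restart; -1 (the default) keeps the feature off
old_snapshot_threshold = 1h   # snapshots older than ~1 hour may start getting the error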
Together with the new "remote_apply" setting for the "synchronous_commit" parameter user have power to create “mirrored” multi-machine clusters. Awesome feature!
From the release notes: “This allows, for example, rewinding a promoted standby back to some state of the old master's timeline”. Meaning you could promote a replica, do some migration testing, say, and then convert it back into a normal replica. Great!
Enables reading information equivalent to the “pg_controldata” utility via SQL. Previously one had to work around this via a custom PL/Python stored procedure or even a custom extension when wanting to expose the “database system identifier” for monitoring queries, for example.
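Now it's a simple query, e.g. for the system identifier:

-- one of the new control-data functions
SELECT system_identifier FROM pg_control_system();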
As the release notes formulate it well - "Historically users have obtained such information using a self-join on the pg_locks view. However, it is unreasonably tedious..." - this means one can hugely simplify monitoring scripts and ad-hoc troubleshooting, paired with some joins to pg_stat_activity.
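A typical troubleshooting query could now look like this:

-- who is blocked, and by whom - no pg_locks self-join needed
SELECT pid, pg_blocking_pids(pid) AS blocked_by, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;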
Useful to prevent forgotten transactions from holding locks or blocking vacuum cleanup for too long. Bye-bye, Cron scripts trying to do the same by regularly reading pg_stat_activity and terminating misbehaving transactions.
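Assuming this refers to the new "idle_in_transaction_session_timeout" parameter, a global setting could look like so:

-- sessions idling in a transaction longer than this get terminated
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();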
postgresql.conf needs adjusting when migrating, unless you are already using “logical”!
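Presumably this refers to 9.6 merging the old "archive" and "hot_standby" values of "wal_level" into a single new value, so old config files need a small edit:

# pre-9.6: wal_level = hot_standby (or archive)
wal_level = replica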
More built-in roles are to be expected in the future.
The specified operations are carried out in the order in which the options are given, and then psql terminates.
When doing PostgreSQL consulting the other day, the talk turned to the topic of connection pools - namely, what approaches and products are commonly used and perform well? The topic is pretty wide in itself, but mostly well-known to old-timers. Nevertheless, it's worth a small write-up on basic concepts plus a small comparison of the two most common "near to Postgres" products you should know about - PgBouncer and pgpool-II.
First, a basic intro - connection pools are middleware that speak the database protocol and cache database connections, so that clients can spare the time needed to negotiate the connection, do authentication and set client defaults (encoding, work_mem) when opening a new connection, and so that the database server is relieved from storing too much client state in memory. Applications thus connect to the pool, thinking it's the database.
Common approaches for deploying pools:
When talking about separate pooling servers in a Postgres context, two products stand out: PgBouncer and pgpool-II. Both products are written in C and seem to be actively maintained, with pgpool-II seeing more action, as it also has a higher number of ambitious features. Besides the source code, packages for common Linux distros are available, and deployment is pretty simple: customize the configuration files and start the daemon, then redirect your clients to the pool instead of the real DB (both use non-standard ports by default though - 6432 and 9999 respectively). Based on the available online documentation (PgBouncer in current version 1.7.2 and pgpool-II in version 3.5.4, the latter sadly having some outdated parts), I compiled the following outline of the features, so that you can decide on the suitability for your needs yourself.
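To illustrate the "pretty simple" deployment claim, a minimal PgBouncer configuration sketch (key names per the 1.7 docs; database name, paths and sizes are placeholders):

; pgbouncer.ini
[databases]
bench = host=127.0.0.1 port=5432 dbname=bench

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session        ; matches the mode used in the test below
max_client_conn = 100
default_pool_size = 20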
pgpool-II gotchas
PgBouncer gotchas
Out of scepticism - to test the claims of my colleague Ants that PgBouncer is significantly faster than pgpool-II - I decided to run a quick set of tests, with all the components running on my laptop. Test setup: a small 13MB "pgbench" in-memory dataset in "--select-only" mode to get fast responses, as we only want to test connection overhead here. The pools were configured without SSL and such that the tested amount of 8 concurrent connections would always be kept cached by the pools, with no connection re-establishing taking place during the test. For PgBouncer, the default "session pooling" mode was used.
pgbench -i -s 1 bench   # init the bench schema, ~13MB

for port in 5432 6432 9999 ; do
    for i in {1..3} ; do
        pgbench --select-only --connect -T300 -c8 -j2 -p $port bench
    done
done
Side note - before I could really fire off with testing I ran into a distro-specific problem where connections started to fail after some time, and it required changing some kernel parameters. More info here.
Results (as always, given with a YMMV disclaimer) were as follows:
Both well-known and battle-tested products, PgBouncer and pgpool-II provide a good way to grab that low-hanging performance fruit (a very noticeable difference when doing very short and simple transactions) and also to add some flexibility to your setup by hiding the database from direct access, making it easier to do minor maintenance. For most use cases (no replicas, or using external HA solutions) PgBouncer would be my pick, due to its lightweight architecture and superior performance.
In case you have any questions, feel free to contact us.