It’s been exactly half a year since the last major release of the pgwatch2 Open Source PostgreSQL monitoring tool, and I’m glad to announce that another huge set of useful features and improvements have found their way into the pgwatch2 code repository! The new version is incremented to v1.7 and, continuing the naming tradition, I’m also calling it “Feature Pack 6” as it’s mostly about new features mixed with some bugfixes, although it’s been surprisingly error-free for the last year. Note that listed below are only the most significant changes since last major version, v1.6, so please go through the GitHub changelog in detail if you’re upgrading from a previous version.
Highlights of pgwatch2 v.1.7.0 – log parsing, recommendations engine, real-time stats tracking and new dashboards
The biggest feature for me is the log parsing support. This means that when the pgwatch2 agent is deployed locally on the DB server (also called in “push” mode), it can “tail” the logs and send event counts to the metric store. And when is that useful? Well, mostly for all alerting and health-check purposes. Note that we’re not storing error message contents to avoid security and performance problems, but only the event counts, grouped by severity.
Other notable features are the “recommendations engine” and some infrastructure changes to treat metrics with “real-time” in their name a bit differently so that they could be executed very often (every second even) thus providing a near real-time glimpse into database activities, with the main practical usage of tracking live (currently executing) server processes conveniently there where all other metrics are viewed – in Grafana. Also included is possible locking info, utilizing “pg_stat_activity” and “pg_locks” data. But not only that – also for example live “explain plan” tracking for longer-lasting queries is now possible.
Call for feedback
And as always, please do let us know on GitHub if you’re still missing something in the tool or are experiencing any operational difficulties – any feedback would be highly appreciated!
Most important changes since v1.6
* New feature – server log parsing
Logs are continuously parsed for errors, warnings, etc and these counts are regularly sent to the metric store. Assumes that the pgwatch2 collector is set up locally and has access to the log file folder – paths are auto-detected when sufficient privileges are there. Note that out-of-the-box it only works in conjunction with the CSVLOG log format (which is actually the best format if you don’t plan to log absolutely all queries, events, etc as it can generate quite some volume then) – if you’re not using that you need to provide a customized regex in Golang syntax.
* New feature – recommendations engine
Some new metrics and a dashboards to list some common “best practice” violations like: too many superusers for possible threats, overly long-running queries, possible autovacuum issues, “untightened” security definer functions, “untightened” public schemas, unused indexes, too deeply nested views and also possibly missing indexes! The latter is based on the pg_qualstats extension which needs to be installed / configured independently on all target databases. But as the object of the recommendations / opinions is to allow that everyone to have their own view, the system is set up so that these recommendations can easily be extended / changed via SQL (as with normal metrics).
* New feature – real-time session activity monitoring via Grafana
Well, near-real-time of course, as active session data (based on pg_stat_activity with additional locking info) first needs to be sent to the metric store and only then it’s selected from Grafana as usual. But there are some optimizations on the storage level (unlogged tables, short partitions) so it should be fine with intervals in seconds. Note that as it still can be quite a “costly” feature, it’s not included in the default “preset configs” and needs to be enabled separately – according metric is called “stat_activity_realtime”. See below for a picture of the new dashboards.
* New “Postgres Version Overview” dashboard
Show the current PostgreSQL version for all monitored DBs, highlighting versions that are too old and out of the official support window. See below for a picture.
* New “Stat Statements SQL Search” dashboards
Shows execution statistics (runtime, calls, mean time) for queries matching some user-inputted SQL fragments. Note that this requires the “stat_statements” metric to be enabled and also remember that by default pgwatch converts queries to lower case and cuts off the end if it’s more than 16k chars. See below for a picture.
* Make pgwatch2 “superuser” aware
As of now,for some metrics there are defined “direct access SQL-s” that superusers / pg_monitor roles can use so that less helper / wrapper functions are required for example if you’re doing a local setup where the collector runs on the database server.
* Gatherer improvement – support password/cert authentication for Patroni and etcd
* Gatherer improvement – sync list of monitored DBs back to metric store regularly
This means that the list of configured to-be-monitored DBs data is also present together with normal metrics data so that one can easily create downtime checks without cross-checking with the config DB / YAML files. See the “Global health” dashboard for a sample implementation (for Postgres metric store only).
* Phase out Python 2 in favor of Python 3, since the former is already officially EOL
If Python 2 is still required users need to replace the “plpython3u” line with “plpythonu” in metric fetching helpers.
* Added a HELM chart to deploy pgwatch2 easily on Kubernetes
* New “ping” mode to quickly check connectivity to all DBs.
Use the gatherer “–ping” flag to see that all configured (to be monitored) DB-s are reachable. No metrics are collected, and the program exits after these connection checks.
* Usability – schema and dashboard rollout scripts
Rolling out the Config DB or Metrics DB schema or importing all pgwatch2 Grafana dashboards are now one-liners. Also the ready-made RPM / DEB / Tar packages now include the Grafana dashboards, so no internet connection is required after installing the packages.
* Usability – filter out pgwatch2 generated metric fetching queries in Stat Statements Top
This means it’s easier to detect expensive “real” queries, coming from the users. For mostly idle systems it can actually happen that monitoring queries are the top time consumers. This behaviour can be also changed via a dropdown on the dashboard.