With summer fading away, it’s time to get busy again. Over the last couple of weeks I’ve taken time to work on pgwatch2, our Open Source PostgreSQL monitoring tool. Partly on request from a couple of bigger organizations using the tool on a larger scale, I’ve added another batch of useful management / scaling features, along with some smaller enhancements from the GitHub queue. By the way, this is already the 4th “Feature Pack” in one and a half years, so with the features below implemented we now consider the software mature, with no important features missing. We’re also glad that quite a few people have given feedback recently, helping to improve the software even further and hopefully providing more value to the PostgreSQL community. Read on for a more detailed overview of the most important features in this v1.4.0 update.
Getting friendly with Ansible & Co
Similar to the last update, we have tried to make pgwatch2 easier to deploy on a larger scale. There is nothing new on the containerization front this time, but we’ve added support for repeatable, configuration-based deployments. This means one can put config files with connect strings, metric selections / intervals and the metric definitions themselves into a version control / configuration management / application deployment system, and deploy the metrics collector easily to each required DB node, pushing metrics directly to InfluxDB or Graphite. This also works better in firewalled environments.
The previously supported setup, a centrally managed metrics gatherer with a configuration database, works as before. For cases where the number of servers gets too large (hundreds and above) to be handled by one central gatherer without lag, one can now add a logical grouping label to the monitored hosts and then deploy separate gatherers for subsets of hosts based on that label. There are also other performance changes, like batching of metric storage requests and connection pooling, that help to increase throughput.
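To make the grouping idea more concrete, here is a minimal Python sketch of how a gatherer could pick its share of hosts based on a group label. It is only an illustration of the concept, not pgwatch2’s actual implementation or config schema: the host names, the `group` field and the `hosts_for_gatherer` helper are all assumptions made up for this example.

```python
# Hypothetical inventory of monitored hosts. The field names are
# illustrative only and do not reflect pgwatch2's real config format.
MONITORED_HOSTS = [
    {"name": "orders-db-1", "group": "critical"},
    {"name": "orders-db-2", "group": "critical"},
    {"name": "reporting-db", "group": "batch"},
    {"name": "sandbox-db", "group": "default"},
]


def hosts_for_gatherer(hosts, groups):
    """Return the names of the hosts that a gatherer serving `groups` should monitor."""
    wanted = set(groups)
    return [host["name"] for host in hosts if host["group"] in wanted]


if __name__ == "__main__":
    # One gatherer handles only the critical DBs, a second one takes the rest.
    print(hosts_for_gatherer(MONITORED_HOSTS, ["critical"]))
    print(hosts_for_gatherer(MONITORED_HOSTS, ["batch", "default"]))
```

Each gatherer instance only ever sees its own subset, so adding more hosts means adding another label and another gatherer instead of overloading a single one.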
Metrics / Dashboards
As usual, there are also a couple of new pre-defined metrics, most notably “psutil” based system statistics (CPU, RAM, disk information), as well as 2 “preset configs” (the “unprivileged” one for regular login users / developers might be the most useful) and new dashboards to go along with those metrics. As a reminder: one doesn’t need to use the provided dashboards “as is”; they can also serve as templates or as a source of inspiration for your own modifications.
Some other dashboards (e.g. DB overview) also got minor changes to make them more beginner-friendly.
Ad-hoc monitoring of a single DB
For quick, short-lived troubleshooting sessions, where you really don’t want to spend much time setting up something temporary, we’ve added a flag / env. variable to start monitoring based on a standard JDBC connect string input. This works especially well for superusers, as all needed “helper functions” will then be created automatically. NB! Unprivileged users might also want to add the PW2_ADHOC_CONFIG=unprivileged env. variable to the sample below and start with the corresponding “DB overview – Unprivileged” dashboard. See here for more.
docker run --rm -p 3000:3000 --name pw2 \
  -e PW2_ADHOC_CONN_STR="postgresql://[email protected]/pgwatch2" \
  cybertec/pgwatch2
Most important changes for v1.4.0
- File based mode
No central config DB strictly required anymore.
- Ad-hoc mode
“Single command launch” based on JDBC connect string for temporary monitoring sessions.
- A new “group” label for logical grouping / sharding
For cases where the number of monitored DBs grows too large for one gatherer daemon to handle, or where there are different criticality requirements.
- Continuous discovery of new DBs
The gatherer daemon can now periodically scan for new DBs on the cluster and start monitoring them automatically.
- Custom tags
Now users can add any fixed labels / tags (e.g. env/app names) to be stored for all gathered metric points on a specific DB.
- A stats/health interface for the gatherer daemon
Dumps out JSON on metrics gathering progress.
- New dashboard – DB overview Developer / Unprivileged
Uses data only from metrics that are visible to all Postgres users who are able to connect to a DB, with only “pg_stat_statements” additionally available.
- New dashboard – System Stats
Requires the Python “psutil” package and “PL/Python” on the monitored DB host. Provides detailed CPU / Memory / Disk information with the help of the corresponding helpers / metrics.
- New dashboard – Checkpointer/Bgwriter/Block IO Stats
To visualize checkpoint frequencies, background writer and block IO based on Postgres internal metrics.
- Gatherer daemon now supports < 1s gathering intervals
- Connection pooling on monitored DBs
Big improvement for very small gathering intervals.
- Batching of InfluxDB metric storage requests
Reduces metrics arrival lag many times over when latency to InfluxDB is considerable.
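To illustrate why batching helps when the metrics store is far away, here is a small Python sketch of the general technique. It is a simplified model and not the gatherer’s actual code: the `BatchingWriter` class, its `max_batch_size` parameter and the `send` callback are hypothetical names introduced for this example, and `send` stands in for whatever call would perform one network round trip to the metrics store.

```python
class BatchingWriter:
    """Buffers metric points and hands them to `send` in batches.

    `send` is a hypothetical callback that represents one round trip
    to the metrics store; it receives a list of points.
    """

    def __init__(self, send, max_batch_size=100):
        self._send = send
        self._max_batch_size = max_batch_size
        self._buffer = []

    def write(self, point):
        # Collect the point and flush once the batch is full.
        self._buffer.append(point)
        if len(self._buffer) >= self._max_batch_size:
            self.flush()

    def flush(self):
        # Send whatever is buffered as a single request, if anything.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send(batch)


if __name__ == "__main__":
    sent_batches = []
    writer = BatchingWriter(sent_batches.append, max_batch_size=3)
    for value in range(7):
        writer.write({"metric": "db_stats", "value": value})
    writer.flush()  # push out the remaining partial batch
    print(len(sent_batches), "requests instead of 7")
```

With a per-request latency of, say, 100 ms, sending points one by one pays that cost for every point, while a batched writer pays it once per batch. A real implementation would typically also flush on a timer so that points do not sit in the buffer for too long.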