
After almost half a year since the last update, I'm glad to announce that another set of useful features and improvements has found its way into pgwatch2 – our Open Source PostgreSQL monitoring tool of choice. The new version is 1.6, but continuing the naming tradition I'm also calling it "Feature Pack 5", as it's mostly a feature release (with a couple of bugfixes as well). So here is a quick overview of the changes in v1.6 – for the "full monty", and when updating from a previous version, please turn to the GitHub changelog link below.

Highlights – Prometheus and Patroni support

The choice of data storage "backends" has been widened again (the previous version added PostgreSQL) – it's now possible to leverage the very popular Prometheus TSDB to store metrics gathered by pgwatch2! One recommendation though: I'd use Prometheus only when it's already running in your organization, and would strongly consider the alternatives (Postgres, InfluxDB) when starting a fresh monitoring setup that only covers Postgres, as Prometheus cannot store string values, which is not optimal for Postgres monitoring. Also remember that we're now talking about the "pull" model – previously metrics were pushed to the datastore. To get started quickly there's also a suitable "predefined config" provided, plus a "DB Overview" dashboard similar to the ones for Postgres / InfluxDB.

The second big feature adds support for our favourite Postgres HA cluster manager – Patroni – and should especially appeal to people running Postgres in a "cloudy" manner. In short, you can provide connection information for the Distributed Consensus Store (etcd, Zookeeper, Consul) that's powering Patroni, and pgwatch2 will then periodically consult this DCS and fetch the list of active "to be monitored" instances from there. Remember – the standard way is to state explicitly on which IPs / hostnames your Postgres instances are available. For the time being, to keep things simple, access is only possible to DCSs that are not password-protected (which is the common case, as no real secrets are stored there).

Call for feedback

And as always, please do let us know on GitHub if you're still missing something in the tool or are experiencing difficulties – any feedback would be highly appreciated!

Project GitHub link – here.
Full changelog – here.

Most important changes for v1.6

Due to quite a few requests we decided to also support one of the most popular metrics gathering solutions – Prometheus. With --datastore=prometheus, pgwatch2 doesn't do any interval-based fetching but listens on port 9187 (changeable) for scrape requests configured and performed on the Prometheus side. Returned metrics belong to the "pgwatch2" namespace (basically a prefix), which is changeable via the --prometheus-namespace flag. Also important to note – in this mode the pgwatch2 agent should be run on all individual DB hosts, not centrally. While running it centrally is technically possible, it would counter the core idea of Prometheus and also make scrapes longer, risking timeouts. FYI – the functionality overlaps with the existing "postgres_exporter" project, but pgwatch2 provides more flexibility in metrics configuration, and all changes are "online".

pgwatch2 can now automatically track and monitor all members of an HA cluster via a DCS connect string. Use the new "Host config" column to provide the DCS info according to the help texts when using the Web UI, or the "host_config" map when using YAML. To track a single DB on all cluster members, set "DB type" to "patroni"; to track all DBs, use "patroni-continuous-discovery".

Previously all declared hosts in the "enabled" state were monitored; now, if we're only interested in the active master / primary, some storage / network bytes can be saved, and switchover detection will be automatic.

Powered by "goreleaser". Releases will now have DEB / RPM packages of the "gatherer daemon", plus the needed config / SQL files, available on the GitHub releases page.

This was an oversight and makes life a lot easier.

Previously one had to have all pieces (host, port, etc...) as separate fields.

Not all, but the most important settings are tracked when the "settings" metric is activated. Visualized on the "Change events" dashboard.

This enables cluster-level summarization of all DB-based metrics. See the --add-system-identifier / system-identifier-field params. PG 10+ only, due to the pg_control_system() requirement.

Previously one had to change all metrics for that. The feature is useful for plain informational purposes, or for advanced (corporate) use cases where one wants to "join" metrics gathered by some other monitoring system that works with real database names. See the --add-real-dbname / real-dbname-field params.

For most out-of-the-box dashboards it doesn't change much besides slightly shinier visuals, but it means that the old v5 dashboards won't get any more updates!

As there have been quite a few questions on alerting (which is basically out of scope), we've added something that people can take a look at and build on. Postgres backend only.

Most changes went into "Health-check" – it now shows more info: PG version, uptime, transaction wraparound, longest autovacuum. The "Change events" dash now also displays "server restarted / crashed" event annotations. Plus many other minor dashboard adjustments.

 

pgwatch2 is constantly being improved and new features are added. Learn more >>

 

Screenshot of the updated "Health check" dashboard in pgwatch2 v1.6

Screenshot of the new "Alert template" dashboard

Cluster encryption can be used if the DBA cannot, or does not want to, rely on the file system for confidentiality.

If cluster encryption is enabled, PostgreSQL encrypts data (both relations and write-ahead log) when writing to disk, and decrypts it when reading. The encryption is transparent, so the applications see no difference between the encrypted and unencrypted clusters.

PostgreSQL 9.6 with TDE on Ubuntu

In this blog, we go through the basic steps used to install PostgreSQL 9.6 with Transparent Data Encryption (TDE) on Ubuntu. You can download the patch for this here.

Create data directory for PostgreSQL:

Just for example, I am creating it at the default location:

 

sudo mkdir -p /usr/local/pgsql/data
sudo chown postgres:postgres /usr/local/pgsql/data

Install the libraries for readline, bison, flex, openssl, zlib and crypto:

 

sudo apt-get install libreadline8 libreadline-dev zlibc zlib1g-dev bison flex libssl-dev openssl

Run configure from the source code directory:

You can choose where all the files will be installed by passing --prefix. The default is /usr/local/pgsql, which I am using here as an example. Make sure you enable openssl by passing --with-openssl:

 

sudo ./configure --prefix=/usr/local/pgsql --with-openssl
sudo make

 

The above two commands should run without errors. Now we are ready to install:

 

sudo make install

 

We can now proceed to initialize the cluster. For that, let's switch to the postgres user:

 

sudo su - postgres

As a good practice, let's add the PostgreSQL binaries to PATH:

 

export PATH=$PATH:/usr/local/pgsql/bin

To create an encrypted cluster, pass the -K option to the initdb utility. For example:

 

initdb -D /usr/local/pgsql/data -K/usr/local/pgsql/keypass

 

Here /usr/local/pgsql/keypass is an executable file that returns either an encryption key or an encryption password with the appropriate prefix. In this case, we are passing an encryption_password of 8-16 characters via a simple executable file which outputs:

 

encryption_password=<8-16_passphrase>

$ chmod 755 /usr/local/pgsql/keypass
$ cat /usr/local/pgsql/keypass
echo encryption_password=UrOs2k11CHiGo

 

Internally, PostgreSQL always uses the encryption key. If the encryption key command returns a password, then a key will be generated from the password using a built-in key derivation function. Optionally, you can pass encryption_key as a hex-encoded 256-bit key from any key store.

 

$ cat /usr/local/pgsql/keypass
echo encryption_key=<sh /location/return-key.sh>

 

On completion, initdb stores the encryption key command in postgresql.conf. Thus, the user can control the cluster using pg_ctl without passing the encryption command again and again.

 

If encryption is enabled, full_page_writes must not be turned off; otherwise the server refuses to start. This is because encryption introduces differences between data within a page, and therefore a server crash during a disk write can result in more serious damage to the page than it would without encryption. The whole page needs to be retrieved from WAL in such a case to ensure reliable recovery.

vi postgresql.conf

full_page_writes = on

 

Once the PostgreSQL server is running, client applications should recognize no difference from an unencrypted cluster, except that the data_encryption configuration variable is set.
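A quick way to check this from any client session might look like the following (a minimal sketch; data_encryption is the parameter added by the TDE patch, not part of stock PostgreSQL):

-- show whether the cluster was initialized with encryption (TDE patch only)
SHOW data_encryption;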

Unlike pg_ctl, some of the server applications (for example pg_waldump) do need the -K option, because they are not able to process the postgresql.conf file.

 

Since WAL is encrypted, any replication solution based on log shipping assumes that all standby servers are encrypted using the same key as their primary server. Logical replication, on the other hand, allows replication between encrypted and unencrypted clusters, or between clusters encrypted with different keys.

 

To read more about the ongoing discussion on adding TDE in core PostgreSQL, please check here.

PostgreSQL 12 is just around the corner, and therefore we already want to present some of the new features we like. One important new feature gives users and devops the chance to control the behavior of the PostgreSQL optimizer. Prepared plans are always a major concern (especially people moving from Oracle seem to be the most concerned), and therefore it makes sense to discuss how plans are handled in PostgreSQL 12.

Firing up a PostgreSQL test database

To start I will create a simple table consisting of just two fields:
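Something along these lines will do (the table and column names are only examples):

CREATE TABLE t_test (id serial, name text);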

Then some data is loaded:
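A sketch of the data load, matching the distribution described below:

-- one million rows with the same name ...
INSERT INTO t_test (name)
    SELECT 'hans' FROM generate_series(1, 1000000);

-- ... and just two rows with a different one
INSERT INTO t_test (name) VALUES ('paul'), ('paul');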

Note that 1 million names are identical (“hans”) and just two people are called “paul”. The distribution of data is therefore quite special, which has a major impact as you will see later in this post.

To show how plans can change depending on the setting, an index on “name” is defined as shown in the next listing:
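A plain btree index is all that is needed here (the index name is arbitrary):

CREATE INDEX idx_name ON t_test (name);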

The PostgreSQL query optimizer at work

Let us run a simple query and see what happens:
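For example, a lookup on the frequent value (the exact query is only an illustration):

EXPLAIN SELECT * FROM t_test WHERE name = 'hans';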

In this case PostgreSQL decided to ignore the index and go for a sequential scan. It has even seen that the table is already quite large and opted for a parallel query. Still, what we see is a sequential scan – all data in the table has to be processed. Why is that? Remember: most people in the table have the same name. It is faster to read the entire table and filter out the few other rows than to read almost the entire index. The planner figures (correctly) that running a sequential scan will be faster.

What you can take away from this example is that an index is not used simply because it exists – PostgreSQL uses indexes only when they make sense. If we search for a less frequent value, PostgreSQL will decide to use the index and offer us the optimal plan shown in the next listing:
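For example (again only a sketch):

EXPLAIN SELECT * FROM t_test WHERE name = 'paul';
-- expected: an index scan on idx_name instead of a sequential scan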

Optimizer statistics: Fueling good performance

If you are looking for good performance, keeping an eye on optimizer statistics is definitely a good idea. The main question now is: Which data does the optimizer keep? pg_stats contains information about each column:
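For our example table, the relevant fields can be inspected with a query along these lines (t_test is the sketch table from above; the selected columns are the pg_stats fields discussed below):

SELECT attname, null_frac, avg_width, n_distinct,
       most_common_vals, most_common_freqs,
       histogram_bounds, correlation
FROM   pg_stats
WHERE  tablename = 't_test';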

PostgreSQL keeps track of the percentage of NULL entries in a column (null_frac), the average width of a column (avg_width), and the estimated number of distinct values (n_distinct – are all values different, or are they all the same?). Then PostgreSQL keeps a list of the most frequent entries as well as their likelihood. The histogram_bounds column contains the statistical distribution of the data. In our example you will only find entries in this field for the "id" column – there are only two distinct names, so keeping a histogram for "name" is basically pointless. The correlation column tells us about the physical order of rows on disk. This field can be pretty important, because it helps the optimizer to estimate the amount of I/O.

Preparing plans manually

If you send a query to PostgreSQL it is usually planned when the query is sent. However, if you explicitly want to prepare a query, you can make use of the PREPARE / EXECUTE commands. Here is how it works:
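A minimal sketch (the statement name "myplan" is only an example):

PREPARE myplan (text) AS
    SELECT * FROM t_test WHERE name = $1;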

As you can see, the following query will give us an index scan:
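Executing the prepared statement with the rare value (same sketch as above):

EXPLAIN EXECUTE myplan('paul');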

If we fall back to the more common value we will again get a parallel sequential scan:
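Using the frequent value again (same sketch):

EXPLAIN EXECUTE myplan('hans');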

Why is that the case? In PostgreSQL life is not so straightforward. Even if we prepare explicitly, we will still get a "fresh plan" a few times before a generic plan is created. What is a generic plan? A generic plan is one that is made without looking at the actual parameter values. The idea is to keep the plan and execute it multiple times, in the hope that overall performance goes up due to lower planning overhead. Up to PostgreSQL 11 this process has been a bit "obscure" to most people.

Here is how the "obscure" thing works in detail. There are two ways PostgreSQL can execute a prepared statement: with a "custom plan", created anew for each execution and taking the actual parameter values into account, or with a "generic plan", created once without looking at the parameter values and then reused.

By default, PostgreSQL chooses a "middle road": it will generate a "custom plan" during the first 5 executions of the prepared statement that takes the parameter values into account. From the sixth execution on, it will check whether the generic plan would have performed as well (by comparing the estimated execution costs of the custom and the generic plan). If it thinks that the generic plan would have done just as well, the prepared statement will always use the generic plan from that point on.

PostgreSQL 12 introduces a new variable (plan_cache_mode), which allows users to control this behavior more explicitly. Let us try the same thing again and enforce a generic plan:
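In SQL this could look as follows (a minimal sketch, reusing the prepared statement from above):

SET plan_cache_mode = 'force_generic_plan';

EXPLAIN EXECUTE myplan('paul');
-- the same (generic) plan is now used regardless of the parameter value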

What you see here is that the plan is constant and PostgreSQL does not attempt replanning. Planning time will be cut, BUT that does not necessarily mean that you always win. You might save some CPU cycles on optimizing the query, but it of course also means that the plan you are using is not necessarily optimal for your parameters.

plan_cache_mode: Valid parameters

If you want to play around with plan_cache_mode you can try the following values:

"auto", which is the default value, resembles the traditional behavior of letting
PostgreSQL choose whether to use a generic plan or not.
You might ask what good it could be to use a prepared statement with "force_custom_plan".
The main reason is that using prepared statements is the best way to prevent SQL injection
attacks, so it may be worth using them even if you don't save on planning time.
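As a quick illustration (again only a sketch):

SET plan_cache_mode = 'force_custom_plan';  -- replan with the actual values on every execution
EXPLAIN EXECUTE myplan('hans');

RESET plan_cache_mode;  -- back to the default "auto"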
If you want to learn more about PostgreSQL 12, consider checking out our blog post about optimizer support functions.

If you rely heavily on the PostgreSQL COPY command to load data into PostgreSQL quickly, PostgreSQL 12 offers a feature which is most likely very beneficial to you. Bulk loading is an important operation, and every improvement in this area is certainly going to help the many people out there who want to import data into PostgreSQL as fast as possible.

COPY: Loading and unloading data as fast as possible

When taking a closer look at the syntax of the COPY command in PostgreSQL 12 you will quickly see two things:

• \h will now point to the correct page in the documentation
• COPY now supports a WHERE condition

Here is the complete syntax overview:
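Abbreviated, the COPY FROM variant looks roughly like this in PostgreSQL 12 (the COPY TO variant is analogous, minus the WHERE clause):

COPY table_name [ ( column_name [, ...] ) ]
    FROM { 'filename' | PROGRAM 'command' | STDIN }
    [ [ WITH ] ( option [, ...] ) ]
    [ WHERE condition ]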

While having a link to the documentation around is certainly beneficial, the WHERE condition added in PostgreSQL 12 might be even more important. What is the purpose of this new feature? So far it was only possible to import a file completely. However, in some cases this has been a problem: more often than not, people only wanted to load a subset of the data and had to write a ton of code to filter it before the import, or once the data had already been written into the database.

COPY … WHERE: Applying filters while importing data

In PostgreSQL 12, data can easily be filtered while importing. The COPY command is pretty flexible and allows a lot of trickery. To show you how the new WHERE clause works, I have compiled a simple example:
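Something along these lines will do (the table name is only an example):

CREATE TABLE t_demo AS
    SELECT * FROM generate_series(1, 1000) AS id;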

First of all, 1000 rows are generated to make sure that we have some data to play with. Then we export the content of this table to a file:
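For example (the file path is arbitrary):

COPY t_demo TO '/tmp/demo.txt';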

Finally, we can try to import this data again:
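A sketch of the import into a second table, filtering on the fly (the target table and its column "x" are only examples, matching the note below):

CREATE TABLE t_import (x int);

-- only rows satisfying the WHERE clause end up in the table
COPY t_import FROM '/tmp/demo.txt' WHERE x < 5;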

As you can see, filtering data is pretty simple and very straightforward. One important thing to note here: I exported an "id" column and imported it as "x". Keep in mind that the text file does not know the data structure of our target table – you have to make sure that you filter on the column names of the table you want to import into.

Old gems revisited …

If you are new to PostgreSQL in general, I also want to present one of the older features, which I personally like a lot. COPY can send data to a UNIX pipe or read data from a pipe. Here is how it works:
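A sketch of how this can be combined with the new WHERE clause (gzip / gunzip and the file path are only examples):

-- compress the data while exporting ...
COPY t_demo TO PROGRAM 'gzip -c > /tmp/demo.txt.gz';

-- ... and uncompress plus filter it while importing again
COPY t_import FROM PROGRAM 'gunzip -c /tmp/demo.txt.gz' WHERE x < 5;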

In some cases, you might want to do more than just export the data. Here I decided to compress the data while exporting. Before the data is imported again, it is uncompressed and filtered once more. As you can see, it is pretty simple to combine these features in a flexible way.

If you want to learn more about PostgreSQL and loading data in general, check out our post about rules and triggers. If you want to learn more about COPY, check out the PostgreSQL documentation.
