December 2018

By Kaarel Moppel - PostgreSQL 11 new features - It’s been a bit over a month since the last Postgres major version was released (and also the 1st minor update is out) so it’s not exactly fresh out of the oven...and as usual there has been already a barrage of articles on the most prominent features. Which is great, as I can save some keyboard strokes on those. But there are of course some other little gems that didn’t get any spotlight (by the way release notes are about 13 pages long so a lot of stuff!)...and now luckily had some “aluminium tube time” to have a second look on some more interesting/useful little features and improvements. So here my findings (somewhat logically grouped).

General PostgreSQL 11 features

Add column pg_stat_activity.backend_type

This makes it possible to report the “connected/active users” correctly as just a “count(*)” could lie due to the “parallel query” features added already 2 versions ago.

Allow ALTER TABLE to add a column with a non-null default without doing a table rewrite

Adding new columns with DEFAULT values to large and active tables is a classical beginner mistake, basically halting operation. But no more – Postgres is now a lot more beginner-friendly!

Performance

Fill the unused portion of force-switched WAL segment files with zeros for improved compressibility

My favourite low-hanging fruit from this release...and I wonder why it took so long. Why is it cool? Well, one can now set the “archive_timeout” so low that RPO-s from 1 second (lower values currently not allowed) are doable without a replica, with plain WAL-shipping. A great alternative for semi-important systems where losing a couple of last records is not so tragic.

Improve performance of monotonically increasing index additions

The most common use case for "serial" ID columns got a 30-50% boost!

Allow bitmap scans to perform index-only scans when possible

Bitmap index scans (which are quite different from normal index scans) could get quite costly, so this is very good news.

Add support for large pages on Windows

Mostly known as "huge pages", this feature (at least on Linux) is quite recommended for machines with more than 128GB of RAM. Needs to be enabled on OS side also though.

Replication

Exclude unlogged tables, temporary tables, and pg_internal.init files from streaming base backups

This could be quite a time-saver for huge Data Warehouse type of databases where “unlogged” is used quite often for “staging” tables.

Allow checksums of heap pages to be verified during streaming base backup

And actually if checksums are enabled on the master this happens by default! Can be disabled by “--no-verify-checksums” flag...but you should not. And there was actually also a “sister”-feature added where pg_verify_checksums tool can now run on an offline cluster which could be useful in some cases.

Replicate TRUNCATE activity when using logical replication

A huge one for LR users. Previously there were no good workarounds except creating a DML trigger to catch truncations on the publisher side and to error out.

Partioning / "sharding"

Allow INSERT, UPDATE, and COPY on partitioned tables to properly route rows to foreign partitions

This is supported only by postgres_fdw foreign tables but a great addition when you’re doing “home-grown sharding”.

Allow postgres_fdw to push down aggregates to foreign tables that are partitions

Again great if you’re doing sharding - less data fetched equals faster queries, especially when network is a bit laggy.

Allow partition elimination during query execution

Previously, partition elimination only happened at planning time, meaning many joins and prepared queries could not use partition elimination and it resulted in some unnecessary scans.

Auxiliary tools

Add an approximately Zipfian-distributed random generator to pgbench

Quite a nice feature at least for me personally as I’m a big fan and user of pgbench for quick performance gauging. The feature allows tests to be closer to real-life use cases where we typically have a cold/hot data scenario and active data stays generally more or less in shared buffers. To use the feature one needs to change the default test script though, there's no flag.

Allow the random seed to be set in pgbench

More repeatable test runs, yay! I’ve made use of it already for example here.

Allow initdb to set group read access to the data directory

This is actually useful for 3rd party file-based backup software, such which previously needed “root” or “postgres” privileges, which are of course not good for the cluster’s operational security.

Reduce the number of files copied by pg_rewind

Good news for HA-tools that use pg_rewind automatically (Patroni for example). Helps to quickly restore cluster redundancy.

Allow extension pg_prewarm to restore the previous shared buffer contents on startup

The extension can now be used to automatically reduce the time of degraded performance that usually lasts for a couple of minutes after a restart. Shared buffers are then periodically (or only on shutdown) stored in files and fetched back into shared buffers after a (re)start by 2 background workers. For bigger shared buffers sizes you also want to make sure that the Postgres $DATADIR has enough disk space.

Add psql command gdesc to display the names and types of the columns in a query result

Well that was that for PostgreSQL 11. Please write in the comments section if you found some other small but cool additions for version 11 🙂

PostgreSQL 11 was a while ago. Find out about the latest new features of the current PostgreSQL version in our What's New blog spot.

After years of software development, some might still wonder: What is a NULL value? What does it really mean and what is its purpose? The general rule is: NULL basically means “undefined”. Many books state that NULL means “empty” but I think that is not the ideal way to see things: If you wallet is empty, your financial situation is perfectly defined (= you are broke). But, “undefined” is different. It means that we don’t know the value. If we don’t know how much cash you got, you might still be a millionaire. So using the word “unknown” to describe NULL in SQL is really better than to use word “empty”, which can be pretty misleading in my judgement.

NULL values in PostgreSQL: Basic rules

First of all: NULL is a super useful thing in SQL and people should be aware of the details associated with it. Before digging deeper into NULL it is necessary to take a look at the most basic rules. The following example shows a mistake commonly made by many developers:

test=# SELECT 10 = NULL;
 ?column?
----------

(1 row)

test=# SELECT 10 = NULL;

?column?

----------

(1 row)

Many people assume that the output of this query is actually “false”, which is not correct. The result is NULL. Why is that? Suppose you got 10 bucks in your left pocket and nobody knows how much cash you got in your right pocket. Is the amount of cash in your pockets the same? We don’t know. It might be very well so, but we simply don’t know. Thus the result of this query has to be NULL.

Let's try something else:

test=# SELECT NULL = NULL;
 ?column?
----------

(1 row)

test=# SELECT NULL = NULL;

?column?

----------

(1 row)

The same is true for this query. The result has to be NULL. We don’t know how much cash is in your left pocket and we got no idea how much there is in your right pocket. Is it identical? Again: We have absolutely no idea – the result is undefined.

To figure out if two values are actually NULL we have to use the following syntax:

test=# SELECT NULL IS NULL;
 ?column?
----------
 t
(1 row)

test=# SELECT NULL IS NULL;

?column?

----------

(1 row)

In this case the result is true because “IS” actually check if both value are indeed NULL.
Consequently the next query is going to return false:

test=# SELECT 10 IS NULL;
 ?column?
----------
 f
(1 row)

test=# SELECT 10 IS NULL;

?column?

----------

(1 row)

However, there is more to NULL than just simple operations. NULL is key and therefore it is important to also check some of the lesser known aspects and corner cases.

row() and NULL handling

Some of my readers might already have seen the row() function, which can be used to form a tuple on the fly. In general pretty much the same rules will apply in this case.

Consider the following example:

test=# SELECT row(NULL, NULL) = row(NULL, NULL);
 ?column?
----------

(1 row)

test=# SELECT row(NULL, NULL) = row(NULL, NULL);

?column?

----------

(1 row)

As expected the result is NULL because all values on both sides are “undefined” and therefore there is no way the output of this query can ever be true.

What is important to see is that a row can be compared to a single NULL value. In short: The entire tuple is considered to be NULL by PostgreSQL:

test=# SELECT row(NULL, NULL) IS NULL;
 ?column?
----------
 t
(1 row)

test=# SELECT row(NULL, NULL) IS NULL;

?column?

----------

(1 row)

This is not true for row(10, NULL) - in this case the query returns false. True is only returned if all fields are NULL. However, there is one thing which might as a surprise to some people. The “IS” keyword won't work if you are comparing the output of two “row” functions:

test=# SELECT row(NULL, NULL) IS row(NULL, NULL);
ERROR: syntax error at or near 'row'
LINE 1: SELECT row(NULL, NULL) IS row(NULL, NULL);

test=# SELECT row(NULL, NULL) IS row(NULL, NULL);

ERROR: syntax error at or near 'row'

LINE 1: SELECT row(NULL, NULL) IS row(NULL, NULL);

PostgreSQL will immediately issue a syntax error.

NULL handling in LIMIT clauses

Some time ago I saw some people using NULL in LIMIT / OFFSET clauses. That is somewhat scary but still and interesting issue to think about. Consider the following example:

test=# CREATE TABLE demo (id int);
CREATE TABLE
test=# INSERT INTO demo VALUES (1), (2), (3);
INSERT 0 3

test=# CREATE TABLE demo (id int);

CREATE TABLE

test=# INSERT INTO demo VALUES (1), (2), (3);

INSERT 0 3

The table simply contains 3 rows. Here is what LIMIT NULL does:

test=# SELECT * FROM demo LIMIT NULL;
 id
----
1
2
3
(3 rows)

test=# SELECT * FROM demo LIMIT NULL;

----

(3 rows)

As you can see the entire resultset will be returned. That makes sense because PostgreSQL does not really know when to stop returning rows. Thus, the query is equivalent to “SELECT * FROM demo”. In PostgreSQL there is also an ANSI SQL compliant way to limit the result of a query: FETCH FIRST … ROWS ONLY” is the “proper” way to limit the result of a query. In PostgreSQL 11 “FETCH FIRST ROWS ONLY” will also accept NULL and behave the same way as LIMIT NULL. Here is an example:

test=# SELECT * FROM demo FETCH FIRST NULL ROWS ONLY;
 id
----
1
2
3
(3 rows)

test=# SELECT * FROM demo FETCH FIRST NULL ROWS ONLY;

----

(3 rows)

Mind that this was not always the case. Old versions of PostgreSQL did not accept a NULL value here.

NULL handling in ORDER BY clauses

NULL values are especially tricky if you want to sort data. Usually NULL values appear at the end of a sorted list. The following listing shows an example:

test=# INSERT INTO demo VALUES (NULL);
INSERT 0 1
test=# SELECT * FROM demo ORDER BY id DESC;
 id
----

3
2
1
(4 rows)

test=# INSERT INTO demo VALUES (NULL);

INSERT 0 1

test=# SELECT * FROM demo ORDER BY id DESC;

----

(4 rows)

The point is: Suppose you want to sort products by price. The most expensive ones or most likely not the ones without a price. Therefore it is usually a good idea to out NULL values at the end of the list if you are ordering descending.

Here is how it works:

test=# SELECT * FROM demo ORDER BY id DESC NULLS LAST;
 id
----
3
2
1

(4 rows)

test=# SELECT * FROM demo ORDER BY id DESC NULLS LAST;

----

(4 rows)

Having the NULL values at the end is somewhat more intuitive and usually offers better user experience.

NULL and sum, count, avg, etc.

The way NULLs are handled are also important if you want to run a more analytical type of workload. In general rule is simple: Aggregate functions will simply ignore NULL values. The only exception to the rule is count(*). Here is an example:

test=# SELECT count(*), count(id) FROM demo;
 count | count
-------+-------
     4 |     3
(1 row)

test=# SELECT count(*), count(id) FROM demo;

count | count

-------+-------

4 | 3

(1 row)

count(*) will count ALL the rows - regardless of its content. count(column) will only count the not-NULL values inside a column, which is simply a different thing than just counting everything. Let us take a look at the next example:

test=# SELECT sum(id), avg(id) FROM demo;
 sum | avg
-----+--------------------
   6 | 2.0000000000000000
(1 row)

test=# SELECT sum(id), avg(id) FROM demo;

sum | avg

-----+--------------------

6 | 2.0000000000000000

(1 row)

As I have stated before: The aggregates do not count the NULL values, which means that the average of those 4 rows will be 2 and not 1.5.

The fact that count(*) counts all rows can create subtle bugs when used in an outer join. Consider the following example:

SELECT name, count(*)
FROM   person AS a LEFT JOIN house AS b
       ON a.id = b.person_id
GROUP BY name;

SELECT name, count(*)

FROM person AS a LEFT JOIN house AS b

ON a.id = b.person_id

GROUP BY name;

In this case every count will be at least 1 - even if the person in the list has no house. Keep in mind: The LEFT JOIN will add NULL values to the right side of the join. count(*) will count those NULL values and therefore even the poorest fellow will end up with at least one house. count(*) and outer joins are usually an alarm signal and should be handled with care.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

Prepared transactions are disabled in PostgreSQL by default, since the parameter max_prepared_transactions has the default value 0.

You don't need prepared transactions in most cases. However, they can cause nasty problems, so I think that everybody who runs a PostgreSQL database should understand them.

To illustrate these problems, I'll show you how to use prepared transactions to get a PostgreSQL into an inaccessible state.

What are prepared transactions?

Normally, a database transaction that spans multiple statements is ended with COMMIT or ROLLBACK. With prepared transactions, another step is added:

BEGIN or START TRANSACTION: starts a transaction as usual.
PREPARE TRANSACTION 'name': prepares the transaction for commit or rollback and assigns a name to it.
{ COMMIT | ROLLBACK } PREPARED 'name': commits or rolls back a previously prepared transaction.

The PREPARE TRANSACTION step performs all actions that may fail during COMMIT. That way, both COMMIT PREPARED and ROLLBACK PREPARED are guaranteed to succeed once a transaction is prepared. Moreover, PREPARE TRANSACTION persists the still open transaction, so that it will survive a crash or server restart.

Once a transaction is prepared, it is complete. Subsequent SQL statements belong to different transactions. You cannot do anything with a prepared transaction except COMMIT PREPARED and ROLLBACK PREPARED.

What is the use of prepared transactions?

Prepared transactions are used to implement “distributed transactions”.
Distributed transactions are transactions that affect more than one data source.
The protocol is as follows:

Start a transaction on all data sources involved in the transaction.
Modify data in the data sources. If there is a problem, ROLLBACK all involved transactions.
Once you are done, PREPARE all involved transactions.
If the PREPARE step fails in any of the transactions, issue ROLLBACK PREPARED everywhere.
If the PREPARE step succeeds everywhere, COMMIT PREPARED all involved transactions.

This so-called “two-phase commit protocol” guarantees that the distributed transaction either succeeds or is rolled back everywhere, leaving the whole system consistent.

To make that works reliably, you need a “distributed transaction manager”.
That is software that keeps track of all distributed transactions, persisting their state to survive crashes and other interruptions.
That way it can complete all interrupted distributed transactions as soon as operation is resumed.

Problems caused by prepared transactions

Normally, no transaction should be in the prepared state for longer than a split second. But software bugs and other disruptions can cause a transaction to remain in the prepared state for a longer time. This causes the problems associated with long running transactions in general:

locks that are held for a long time, blocking other sessions and increasing the risk of a deadlock
VACUUM cannot clean up dead tuples created after the start of the transaction

These problems are exacerbated by the fact that prepared transactions and their locks stay around even after the database server is restarted.

Implementation details

Preparing a transaction will write a WAL record, so the prepared transaction can be restored during crash recovery. This requires forcing the WAL to disk, just like a normal commit does.

During a checkpoint, the state of the prepared transaction is persisted in a file in the pg_twophase subdirectory of the data directory. The name of the file is the hexadecimal transaction ID.

On startup, all prepared transactions are restored from pg_twophase.

The file is deleted when the prepared transaction is committed or rolled back.

Getting rid of “orphaned” prepared transactions

You can examine all prepared transactions in the PostgreSQL database cluster using the view pg_prepared_xacts.

If a prepared transaction is “orphaned” because the transaction manager failed to close it, you will have to do that manually. Connect to the correct database and run COMMIT PREPARED or ROLLBACK PREPARED.

Locking up a database with prepared transactions

Warning: Don't try this on your production database!

As a database superuser, run the following:

BEGIN;

LOCK pg_catalog.pg_authid;

PREPARE TRANSACTION 'locked';

BEGIN;

LOCK pg_catalog.pg_authid;

PREPARE TRANSACTION 'locked';

Then disconnect from the database.

pg_authid, the table that contains the database users, is required to authenticate a database session. Since this table is locked by the prepared transaction, all future connection attempts will hang.

Restarting the database won't help, because the prepared transaction will be retained.

Before you read on to the next part that contains the solution, let me invite you to try and get out of this dilemma yourself.

Enter Houdini!

Your first reaction will probably be the same as mine: Start PostgreSQL in single user mode. Alas, no luck:

$ pg_ctl stop
waiting for server to shut down.... done
server stopped
$ postgres --single postgres
LOG:  recovering prepared transaction 571 from shared memory
^C
FATAL:  canceling statement due to user request

$ pg_ctl stop

waiting for server to shut down.... done

server stopped

$ postgres --single postgres

LOG: recovering prepared transaction 571 from shared memory

FATAL: canceling statement due to user request

Single user mode just hangs until I send it a SIGINT by pressing Ctrl+C, which shuts down the server.

But we can easily find a way to recover by reviewing the implementation details above:
571 in hexadecimal is 23b, so while PostgreSQL is shut down, we can remove the prepared transaction as follows:

$ rm $PGDATA/pg_twophase/0000023B

1	$ rm $PGDATA/pg_twophase/0000023B

This will essentially roll back the transaction, and its effects will be undone when PostgreSQL is restarted.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

By Kaarel Moppel: Some weeks ago I wrote about some common concepts / performance hacks related to how you can (relatively) easily scale to a terabyte cluster or more. Based on my experience visiting customers from various industries, 80% of them don't even reach that threshold. However, I wanted to write another post showing that a couple of terabytes are not the “end station” for Postgres, given you're ready to roll up your sleeves and get your "hands dirty". So let’s look at some additional Postgres-like projects for cases where you want to make use of your Postgres know-how and SQL skills over big amounts of data. Postgres is ready for multi-terabyte scaling.

Scaling PostgreSQL

Be warned, the road will now get bumpy: we usually need to change the applications and also the surrounding bits. We’re sharding, meaning data does not live on a single node anymore; SQL aggregates over all data can get quirky. Also, we’re mostly extending the rock-solid core PostgreSQL with 3rd-party extensions or using forks with constraining characteristics. You might have to re-define and re-import the data. You might also need to learn some new query constructs and forget some standard PostgreSQL ones. Generally, be prepared to pull out a bit of hair, if you’ve got any left:) But OK, here are some projects that you should know of.

Postgres extensions/derivatives for multi-terabyte scale-out

Sharding via PL/Proxy stored procedures

This kind of “old school” solution was created and battle-tested in Skype (huge user of Postgres by the way!) by scaling an important cluster to 32 nodes, so it obviously works pretty well. The main upside, on the other hand, is that all data and data access is sharded for you automatically after you pick a stored procedure parameter as a shard key. Plus, you can use all of the standard Postgres features...with the downside that, well, all data access needs to go over PL/pgSQL stored procedures. Most developers, I guess, are not so versed in that. In short PL/Proxy is similar to a kind of glue to get the stored procedure call to reach the correct shard so the performance penalty is minimal. Does not support Postgres 11 yet though ...

Sharding with Postgres-XL

Postgres-XL could perhaps be described as a “PostgreSQL-based” sharding framework. And it actually lives somewhat under the umbrella of the PostgreSQL Global Development Group. It lags a bit behind a Postgres major version or two though, and the setup is by far not as easy as “apt install postgresql” due to the distributed nature – with coordinators, transaction managers and data nodes in the picture. It can, however, help you to manage and run queries on tens of terabytes of data with relatively few restrictions! Of course there are some caveats (e.g. no triggers, constraints on FK-s) as with any PostgreSQL derivative. Also, you can’t also expect the same level of support from the community when countering technical problems.

Nevertheless, it’s actively maintained and a good choice if you want to stay “almost” Postgres with your 50TB+ of data. Biggest cluster I’ve heard of by the way has 130 TB of data on it, so this is worth checking out!

Sharding with Citus - another form of scaling

Citus is an extension to standard Postgres, in that sense, it's a bit of a lighter concept compared to the previous contenders. It's more “up to date”, but it's not a transparent drop-in replacement for all scaling needs. It adds “distributed table” features similar to Postgres-XL, with simpler architecture (only data and coordinator nodes). According to the documentation, it is especially well-suited for multi-tenant and "realtime analytics" use cases. It has some caveats like all the others – e.g. “shard-local” constraints, no subqueries in the WHERE clause, no window functions on sharded/partitioned tables. But defining the tables works as usual: you just needs some function calls to activate the distributed behavior. The project is also under very active development with a decent company behind it for those who require support, so this might be a good choice for your next 50 TB+ project.

Greenplum – a PostgreSQL fork for Data Warehousing

Greenplum – a massively parallel processing (MPP) database system might just be the oldest active Postgres fork alive. It started based on version 8.2, and was developed behind closed doors for a long time. It's been Open Source for a couple of years. It’s now making up for lost time, and trying to modernize itself to include features from the latest Postgres versions. The architecture seems quite a bit more complex compared to the above-mentioned alternatives. It needs thorough studying though. You’d also be giving up on some more advanced/recent SQL features, but I can imagine the architecture decisions are made with certain performance aspects in mind, so it might be a worthwhile trade-off. Also, behind the product stands a huge (publicly noted) consulting company named Pivotal, so again, it's a serious alternative.

Final words

To conclude – don’t be afraid to scale up with Postgres! There are quite a few options (I’m pretty sure I forgot some products) for every taste, also professional support is available.

Scaling in the documentation

One more remark to the Postgres community though - I think that making the “scaling” topic a bit more discoverable for newcomers would do a lot of good for general Postgres adoption and adding a few words to the official documentation might even be appropriate. Currently there's a bit on HA and replication here, but the word “scaling” is not even mentioned in this context.

In order to receive regular updates on important changes in PostgreSQL, subscribe to our newsletter, or follow us on Twitter, Facebook, or LinkedIn.

General PostgreSQL 11 features

Performance

Replication

Partioning / "sharding"

Auxiliary tools

NULL values in PostgreSQL: Basic rules

Let's try something else:

row() and NULL handling

Consider the following example:

NULL handling in LIMIT clauses

NULL handling in ORDER BY clauses

NULL and sum, count, avg, etc.

What are prepared transactions?

What is the use of prepared transactions?

Problems caused by prepared transactions

Implementation details

Getting rid of “orphaned” prepared transactions

Locking up a database with prepared transactions

Enter Houdini!

Scaling PostgreSQL

Postgres extensions/derivatives for multi-terabyte scale-out

Sharding via PL/Proxy stored procedures

Sharding with Postgres-XL

Sharding with Citus - another form of scaling

Greenplum – a PostgreSQL fork for Data Warehousing

Final words

Scaling in the documentation