After years of software development, some might still wonder: What is a NULL value? What does it really mean and what is its purpose? The general rule is: NULL basically means “undefined”. Many books state that NULL means “empty” but I think that is not the ideal way to see things: If you wallet is empty, your financial situation is perfectly defined (= you are broke). But, “undefined” is different. It means that we don’t know the value. If we don’t know how much cash you got, you might still be a millionaire. So using the word “unknown” to describe NULL in SQL is really better than to use word “empty”, which can be pretty misleading in my judgement.
NULL values in PostgreSQL: Basic rules
First of all: NULL is a super useful thing in SQL and people should be aware of the details associated with it. Before digging deeper into NULL it is necessary to take a look at the most basic rules. The following example shows a mistake commonly made by many developers:
test=# SELECT 10 = NULL; ?column? ---------- (1 row)
Many people assume that the output of this query is actually “false”, which is not correct. The result is NULL. Why is that? Suppose you got 10 bucks in your left pocket and nobody knows how much cash you got in your right pocket. Is the amount of cash in your pockets the same? We don’t know. It might be very well so, but we simply don’t know. Thus the result of this query has to be NULL.
Let’s try something else:
test=# SELECT NULL = NULL; ?column? ---------- (1 row)
The same is true for this query. The result has to be NULL. We don’t know how much cash is in your left pocket and we got no idea how much there is in your right pocket. Is it identical? Again: We have absolutely no idea – the result is undefined.
To figure out if two values are actually NULL we have to use the following syntax:
test=# SELECT NULL IS NULL; ?column? ---------- t (1 row)
In this case the result is true because “IS” actually check if both value are indeed NULL.
Consequently the next query is going to return false:
test=# SELECT 10 IS NULL; ?column? ---------- f (1 row)
However, there is more to NULL than just simple operations. NULL is key and therefore it is important to also check some of the lesser known aspects and corner cases.
row() and NULL handling
Some of my readers might already have seen the row() function, which can be used to form a tuple on the fly. In general pretty much the same rules will apply in this case.
Consider the following example:
test=# SELECT row(NULL, NULL) = row(NULL, NULL); ?column? ---------- (1 row)
As expected the result is NULL because all values on both sides are “undefined” and therefore there is no way the output of this query can ever be true.
What is important to see is that a row can be compared to a single NULL value. In short: The entire tuple is considered to be NULL by PostgreSQL:
test=# SELECT row(NULL, NULL) IS NULL; ?column? ---------- t (1 row)
This is not true for row(10, NULL) – in this case the query returns false. True is only returned if all fields are NULL. However, there is one thing which might as a surprise to some people. The “IS” keyword won’t work if you are comparing the output of two “row” functions:
test=# SELECT row(NULL, NULL) IS row(NULL, NULL); ERROR: syntax error at or near "row" LINE 1: SELECT row(NULL, NULL) IS row(NULL, NULL);
PostgreSQL will immediately issue a syntax error.
NULL handling in LIMIT clauses
Some time ago I saw some people using NULL in LIMIT / OFFSET clauses. That is somewhat scary but still and interesting issue to think about. Consider the following example:
test=# CREATE TABLE demo (id int); CREATE TABLE test=# INSERT INTO demo VALUES (1), (2), (3); INSERT 0 3
The table simply contains 3 rows. Here is what LIMIT NULL does:
test=# SELECT * FROM demo LIMIT NULL; id ---- 1 2 3 (3 rows)
As you can see the entire resultset will be returned. That makes sense because PostgreSQL does not really know when to stop returning rows. Thus, the query is equivalent to “SELECT * FROM demo”. In PostgreSQL there is also an ANSI SQL compliant way to limit the result of a query: FETCH FIRST … ROWS ONLY” is the “proper” way to limit the result of a query. In PostgreSQL 11 “FETCH FIRST ROWS ONLY” will also accept NULL and behave the same way as LIMIT NULL. Here is an example:
test=# SELECT * FROM demo FETCH FIRST NULL ROWS ONLY; id ---- 1 2 3 (3 rows)
Mind that this was not always the case. Old versions of PostgreSQL did not accept a NULL value here.
NULL handling in ORDER BY clauses
NULL values are especially tricky if you want to sort data. Usually NULL values appear at the end of a sorted list. The following listing shows an example:
test=# INSERT INTO demo VALUES (NULL); INSERT 0 1 test=# SELECT * FROM demo ORDER BY id DESC; id ---- 3 2 1 (4 rows)
The point is: Suppose you want to sort products by price. The most expensive ones or most likely not the ones without a price. Therefore it is usually a good idea to out NULL values at the end of the list if you are ordering descending.
Here is how it works:
test=# SELECT * FROM demo ORDER BY id DESC NULLS LAST; id ---- 3 2 1 (4 rows)
Having the NULL values at the end is somewhat more intuitive and usually offers better user experience.
NULL and sum, count, avg, etc.
The way NULLs are handled are also important if you want to run a more analytical type of workload. In general rule is simple: Aggregate functions will simply ignore NULL values. The only exception to the rule is count(*). Here is an example:
test=# SELECT count(*), count(id) FROM demo; count | count -------+------- 4 | 3 (1 row)
count(*) will count ALL the rows – regardless of its content. count(column) will only count the not-NULL values inside a column, which is simply a different thing than just counting everything. Let us take a look at the next example:
test=# SELECT sum(id), avg(id) FROM demo; sum | avg -----+-------------------- 6 | 2.0000000000000000 (1 row)
As I have stated before: The aggregates do not count the NULL values, which means that the average of those 4 rows will be 2 and not 1.5.
The fact that count(*) counts all rows can create subtle bugs when used in an outer join. Consider the following example:
SELECT name, count(*) FROM person AS a LEFT JOIN house AS b ON a.id = b.person_id GROUP BY name;
In this case every count will be at least 1 – even if the person in the list has no house. Keep in mind: The LEFT JOIN will add NULL values to the right side of the join. count(*) will count those NULL values and therefore even the poorest fellow will end up with at least one house. count(*) and outer joins are usually an alarm signal and should be handled with care.