Treat Nulls as Known Values
Episode #17 hosted by Dorian Karter
In this episode we will take a look at using is distinct from and it's counterpart is not distinct from to treat nulls as known values in our queries.
Postgres Version 9.5
Consider the following scenario - you have a table of users with a nullable field that tracks whether a user is signed up for the mailing list.
table users;
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 2 | Brian | Dunn | <null> | brian_d@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 4 | Dorian | Karter | True | dorian_k@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
| 6 | Jane | Hashrocket | <null> | jane_h@example.com |
+----+--------------+-------------+------------+----------------------+
In our example table some of the records have a subscribed
value of true
meaning they have agreed to subscribe to our mailing list, false
if they did
not.
If the users never completed a form that asks them if they want to subscribe
they will have their subscribed
column set to null
. In this case null
has
a special meaning for us and we would want to treat it as a known value.
If we wanted to get all the users who have not yet subscribed to our awesome mailing list, so that we can show them a prompt the next time they sign in, we would likely write a query like this one:
select * from users where subscribed <> true;
You may be surprised that the result of the query above will only yield the following records:
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
+----+--------------+-------------+------------+----------------------+
What's going on here?
By default the equality operators do not treat null as a known value and therefore assume it cannot be compared directly to actual values.
If you want to treat null values as known values you can query them explicitly like so:
select * from users where subscribed is null or subscribed <> true;
Which results in:
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 2 | Brian | Dunn | <null> | brian_d@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
| 6 | Jane | Hashrocket | <null> | jane_h@example.com |
+----+--------------+-------------+------------+----------------------+
This will work, but it is not scalable. When you are doing it once in a small query it may be OK but as queries get bigger this makes the query messy and harder to read.
Fortunately Postgres offers a more idiomatic way to check if a value does not
equal to something, including null values: is distinct from
and is not distinct
from
query parts.
We transform our example to use this syntax:
select * from users where subscribed is distinct from true;
And sure enough we get the correct results:
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 2 | Brian | Dunn | <null> | brian_d@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
| 6 | Jane | Hashrocket | <null> | jane_h@example.com |
+----+--------------+-------------+------------+----------------------+
Conversely, we can use the is not distinct from
syntax to get records where
the result is equal to the passed value:
select * from users where subscribed is not distinct from true;
This query will return all the rows containing subscribed with a value of true:
+------+--------------+-------------+----------------------+--------------+
| id | first_name | last_name | email | subscribed |
|------+--------------+-------------+----------------------+--------------|
| 4 | Dorian | Karter | dorian_k@example.com | True |
+------+--------------+-------------+----------------------+--------------+
This same syntax will also work for querying null:
select * from users where subscribed is not distinct from null;
+------+--------------+-------------+---------------------+--------------+
| id | first_name | last_name | email | subscribed |
|------+--------------+-------------+---------------------+--------------|
| 2 | Brian | Dunn | brian_d@example.com | <null> |
| 6 | Jane | Hashrocket | jane_h@example.com | <null> |
+------+--------------+-------------+---------------------+--------------+
Where this syntax becomes particularly useful is when you pass parameters to
your SQL query from your programming language of choice. For the sake of
example we will use prepared statements
which are a way of reusing SQL
statements in Postgres with different passed parameters, and are similar to
what your programming language is doing behind the scenes.
We will start by testing what will happen when using a normal equality operator:
prepare users_with_subscription_status(boolean) as select * from users where subscribed =
$1;
This query we just prepared will not work if we try to pass it null, it will also not treat null as a known value.
execute users_with_subscription_status(null);
Result:
+------+--------------+-------------+---------+--------------+
| id | first_name | last_name | email | subscribed |
|------+--------------+-------------+---------+--------------|
+------+--------------+-------------+---------+--------------+
SELECT 0
Now let's test the same query with the non-equal operator:
prepare users_without_subscription_status(boolean) as select * from users where
subscribed <> $1;
execute users_with_subscription_status(null);
Result:
+------+--------------+-------------+---------+--------------+
| id | first_name | last_name | email | subscribed |
|------+--------------+-------------+---------+--------------|
+------+--------------+-------------+---------+--------------+
SELECT 0
It would be really hard to rewrite these types of reusable queries with the
explicit test for null. Instead if we rewrite them with is distinct from
and
is not distinct from
it works flawlessly:
First let's discard all of our previous prepared statements:
discard all;
And then prepare them again with is distinct from
and is not distinct from
:
prepare users_with_subscription_status(boolean) as select * from users where
subscribed is not distinct from $1;
prepare users_without_subscription_status(boolean) as select * from users where
subscribed is distinct from $1;
These updated prepared statements now support both null and actual boolean values. Let's test them:
execute users_without_subscription_status(null);
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 4 | Dorian | Karter | True | dorian_k@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
+----+--------------+-------------+------------+----------------------+
execute users_without_subscription_status(true);
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 1 | Jack | Christensen | False | jack_c@example.com |
| 2 | Brian | Dunn | <null> | brian_d@example.com |
| 3 | Chris | Erin | False | chris_e@example.com |
| 5 | Joe | Hashrocket | False | joe_h@example.com |
| 6 | Jane | Hashrocket | <null> | jane_h@example.com |
+----+--------------+-------------+------------+----------------------+
execute users_with_subscription_status(true);
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 4 | Dorian | Karter | True | dorian_k@example.com |
+----+--------------+-------------+------------+----------------------+
execute users_with_subscription_status(null);
+----+--------------+-------------+------------+----------------------+
| id | first_name | last_name | subscribed | email |
|----+--------------+-------------+------------+----------------------|
| 2 | Brian | Dunn | <null> | brian_d@example.com |
| 6 | Jane | Hashrocket | <null> | jane_h@example.com |
+----+--------------+-------------+------------+----------------------+
That concludes our episode. Thanks for watching.
CREATE TABLE users (
id serial primary key,
first_name varchar,
last_name varchar,
email varchar not null,
subscribed boolean
);
insert into users (first_name, last_name, subscribed, email) values
('Jack', 'Christensen', false, 'jack_c@example.com'),
('Brian', 'Dunn', null, 'brian_d@example.com'),
('Chris', 'Erin', false, 'chris_e@example.com'),
('Dorian', 'Karter', true, 'dorian_k@example.com'),
('Joe', 'Hashrocket', false, 'joe_h@example.com'),
('Jane', 'Hashrocket', null, 'jane_h@example.com');