ILIKE with Case Insensitive Data
Episode #14 hosted by Josh Branchaud
In this episode we will look at the ILIKE operator and see why it may not be well suited for doing full case-insensitive string comparisons.
Postgres Version 9.5
In previous episodes, we looked at different ways of working with case-insensitive data. We wanted to query for user records by their email addresses in a case-insensitive way without sacrificing performance.
One approach was to use the lower()
function for comparisons in conjunction
with a functional index. Another approach was to pull in the citext
module.
Both of these approaches get the job done and they have pretty similar
performance characteristics. However, they required us to jump through some
hoops by either creating an additional index or pulling in an extension. You
may be wondering, wouldn't it be easier to just use the ilike
operator for
case-insensitive string comparisons?
Let's see.
We already have a users
table with 10,000 records.
\d users
select * from users limit 10;
First, let's perform an explain analyze
when querying for a user record by
email.
explain analyze select * from users where lower(email) = lower('person5000@example.com');
And now, let's do a similar query ditching the uses of the lower()
function and instead using the ilike
operator.
explain analyze select * from users where email ilike 'person5000@example.com';
The ilike
query is going to perform much more slowly. It forces Postgres
to do a full sequential scan. This is because it is not able to utilize our
functional b-tree index. It wouldn't be able to utilize the more basic
b-tree index either. The ilike
operator and its friends are intended for
pattern matching comparisons, so they are not able to take advantage of
b-tree indexes.
For these kinds of full case-insensitive string comparisons we are better off using the previously discussed approaches.
That said, you'd think we'd be able to do indexed queries when working with the pattern matching operators. The trigram module and a gist index can help us there, but that is a topic for another episode.
Until then, may your data be consistent and your queries performant.
create table users (
id serial primary key,
email varchar not null
);
create unique index users_unique_lower_email_idx on users (lower(email));
insert into users (email)
select 'person' || num || '@example.com'
from generate_series(1,10000) as num;