ILIKE with Case Insensitive Data
Episode #14 hosted by Josh Branchaud
In this episode we will look at the ILIKE operator and see why it may not be well suited for doing full case-insensitive string comparisons.
Postgres Version 9.5
In previous episodes, we looked at different ways of working with case-insensitive data. We wanted to query for user records by their email addresses in a case-insensitive way without sacrificing performance.
One approach was to use the
lower() function for comparisons in conjunction
with a functional index. Another approach was to pull in the
Both of these approaches get the job done and they have pretty similar
performance characteristics. However, they required us to jump through some
hoops by either creating an additional index or pulling in an extension. You
may be wondering, wouldn't it be easier to just use the
ilike operator for
case-insensitive string comparisons?
We already have a
users table with 10,000 records.
\d users select * from users limit 10;
First, let's perform an
explain analyze when querying for a user record by
explain analyze select * from users where lower(email) = lower('firstname.lastname@example.org');
And now, let's do a similar query ditching the uses of the
function and instead using the
explain analyze select * from users where email ilike 'email@example.com';
ilike query is going to perform much more slowly. It forces Postgres
to do a full sequential scan. This is because it is not able to utilize our
functional b-tree index. It wouldn't be able to utilize the more basic
b-tree index either. The
ilike operator and its friends are intended for
pattern matching comparisons, so they are not able to take advantage of
For these kinds of full case-insensitive string comparisons we are better off using the previously discussed approaches.
That said, you'd think we'd be able to do indexed queries when working with the pattern matching operators. The trigram module and a gist index can help us there, but that is a topic for another episode.
Until then, may your data be consistent and your queries performant.
create table users ( id serial primary key, email varchar not null ); create unique index users_unique_lower_email_idx on users (lower(email)); insert into users (email) select 'person' || num || '@example.com' from generate_series(1,10000) as num;