PG Casts

ILIKE with Case Insensitive Data

Episode #14 hosted by Josh Branchaud

In this episode we will look at the ILIKE operator and see why it may not be well suited for doing full case-insensitive string comparisons.

Postgres Version 9.5

Transcript

In previous episodes, we looked at different ways of working with case-insensitive data. We wanted to query for user records by their email addresses in a case-insensitive way without sacrificing performance.

One approach was to use the lower() function for comparisons in conjunction with a functional index. Another approach was to pull in the citext module.

Both of these approaches get the job done and they have pretty similar performance characteristics. However, they required us to jump through some hoops by either creating an additional index or pulling in an extension. You may be wondering, wouldn't it be easier to just use the ilike operator for case-insensitive string comparisons?

Let's see.

We already have a users table with 10,000 records.

\d users

select * from users limit 10;

First, let's perform an explain analyze when querying for a user record by email.

explain analyze select * from users where lower(email) = lower('person5000@example.com');

And now, let's do a similar query ditching the uses of the lower() function and instead using the ilike operator.

explain analyze select * from users where email ilike 'person5000@example.com';

The ilike query is going to perform much more slowly. It forces Postgres to do a full sequential scan. This is because it is not able to utilize our functional b-tree index. It wouldn't be able to utilize the more basic b-tree index either. The ilike operator and its friends are intended for pattern matching comparisons, so they are not able to take advantage of b-tree indexes.

For these kinds of full case-insensitive string comparisons we are better off using the previously discussed approaches.

That said, you'd think we'd be able to do indexed queries when working with the pattern matching operators. The trigram module and a gist index can help us there, but that is a topic for another episode.

Until then, may your data be consistent and your queries performant.

Resources

Setup

create table users (
  id serial primary key,
  email varchar not null
);
create unique index users_unique_lower_email_idx on users (lower(email));

insert into users (email)
select 'person' || num || '@example.com'
from generate_series(1,10000) as num;