Citext for Emails
Episode #13 hosted by Josh Branchaud
In this episode, we're going to be looking at how we can use the citext module when working with case insensitive data.
Postgres Version 9.5
In previous episodes, we explored the issues presented by case-insensitive data:
We had to jump through a number of hoops to make sure that we can efficiently query our tables as well as prevent duplicate records.
We learned how to efficiently query our tables and prevent duplicate
records, but there is an easier way -- the citext
module.
Let's look at how to use it.
First, here is the table that we left off with:
\d old.users
In this table we used the varchar
data type for our email column.
In our new table we will be replacing it with the citext
data type
provided by the citext
module. This data type works just like the text
data type, but during comparisons it internally calls the lower()
function.
Like I said, citext
is going to make things easier.
To get started,
create extension citext;
Then we create our new users
table:
create table users (
id serial primary key,
email citext not null unique
);
and we can take a quick look at the description:
\d users
Let's insert an initial record so that we have something to work with, pay attention to the casing here:
insert into users (email) values ('PERSON1@example.com');
Now we can see the case-insensitive comparison in action by querying for the record with different casing:
select * from users where email = 'person1@example.com';
We find the record and the casing from the original insertion is preserved.
And what if we try to insert a duplicate record with different casing:
insert into users (email) values ('Person1@example.com');
As expected, we get an error, the insertion was prevented by our uniqueness constraint.
Lastly, let's see how the performance compares.
We can insert some additional records to our new users
table:
insert into users (email)
select 'person' || num || '@example.com'
from generate_series(2,10000) as num;
Now we can look at the explain analyze
output for queries on each of the
tables:
explain analyze select * from old.users
where lower(email) = lower('person5000@example.com');
explain analyze select * from users
where email = 'person5000@example.com';
We can see that we get comparable performance when using the citext field with a unique index.
You may be wondering where the ilike
operator fits into all of this. We
will look at that in the following episode.
Until then, may your data be consistent and your queries performant.
create schema old;
create table old.users (
id serial primary key,
email varchar not null unique
);
create unique index users_unique_lower_email_idx on old.users (lower(email));
insert into old.users (email)
select 'person' || num || '@example.com'
from generate_series(1,10000) as num;
create schema old;
create table old.users (
id serial primary key,
email varchar not null unique
);
create unique index users_unique_lower_email_idx on old.users (lower(email));
insert into old.users (email)
select 'person' || num || '@example.com'
from generate_series(1,10000) as num;