PG Casts

Generating Fake Email Addresses

Episode #4 hosted by Josh Branchaud

In this episode, we will see how we can quickly generate a bunch of fake emails with nothing more than a fancy Postgres statement.

Postgres Version 9.5

Transcript

Sometimes you just need a bunch of fake data. Perhaps you want to compare the relative performance of various queries or maybe you need a big table to try out that new Postgres feature.

In this episode, we will see how we can quickly generate a bunch of fake emails with nothing more than a fancy Postgres statement.

We will work with a users table that has an email address field.

create table users (
  id serial primary key,
  email varchar not null unique
);

If we just need 2 or 3 records, then writing a couple insert statements should suffice. However, what if we need a lot of records, like 10,000 records?

To start, we need 10,000 of something. The generate_series() function will help us with that.

select generate_series(1,10000);

That gives us 10,000 integers. Now we need a way to turn those integers into emails. We can do that with some string concatenation.

select 'person' || num || '@example.com'
  from generate_series(1,10000) as num;

Great. We have 10,000 unique, fake emails. The next step is to insert them into our users table. For this we can use an insert statement with a select clause.

insert into users (email)
  select 'person' || num || '@example.com'
  from generate_series(1,10000) as num;

We can run the table command to see that everything was inserted as expected.

table users;

We can even take this all a bit further by adding some variation. How about some random email host names? We can do this with a little more string concatenation and a case statement nested in a subquery.

select
  'person' || num || '@' ||
  (case (random() * 2)::integer
    when 0 then 'gmail'
    when 1 then 'hotmail'
    when 2 then 'yahoo'
  end) || '.com'
from generate_series(1,10000) as num;

As each row from the generate_series is processed, we will get a new random number from 0 to 2. The case statement then produces one of the three host names based on that random number. This of course gets concatenated into the rest of the email string.

I leave the task of actually inserting these fancier emails into our table as an exercise for the viewer.

That's it for this episode. Thanks for watching.