New in Postgres 9.6: Parallel Queries
Episode #21 hosted by Jake Worth
In this episode, we'll be looking at a new feature in Postgres 9.6, parallel queries.
Postgres Version 9.6
Parallel queries provide parallel execution of sequential scans, joins, and aggregates.
To see this feature in action, let's set up a test database. And to make the performance gains obvious, we'll need a lot of data. I made a financial ledger with fifty million random dollar amounts and dates going back ten years; the setup is included at the bottom of the transcript.
Next, we'll write a read-only query that uses a sequential scan. The aggregate
functions are great candidates for this. We'll use sum
to add up the amounts
in our ledger, and tack on explain analyze
for benchmarking and explanation.
explain analyze select sum(amount) from ledger;
Reading the output, we can see that Postgres has chosen to run this query sequentially. Great, just what we need.
Parallel queries are not enabled by default. To turn them on, we need to
increase a config param called max_parallel_workers_per_gather
.
show max_parallel_workers_per_gather;
Let's raise it to four, which happens to be the number of cores on this workstation.
set max_parallel_workers_per_gather to 4;
Explaining the query again, we can see that Postgres is now choosing a parallel query. And it's about four times faster.
explain analyze sum(amount) from ledger;
To sum up, for sequential scans on Postgres 9.6, turn this feature on and enjoy the speed boost.
As I mentioned before, only sequential scans can be parallelized. Let's create a scenario where this feature is unavailable, by adding an index to the table. The indexed column changes things, as we shall see:
create index on ledger(date);
With the indexed column in the query, we don't get to use parallelism:
explain analyze sum(amount) from ledger where date = (current_date - 1);
To enable this feature permanently, add the parameter to your Postgres config file.
That's all for today's episode; thanks for watching.
create table ledger (
id serial primary key,
date date not null,
amount decimal(12,2) not null
);
insert into ledger (date, amount)
select current_date - (random() * 3650)::integer,
(random() * 1000000)::decimal(12,2) - 50000
from generate_series(1,50000000);