PG Casts

Window Functions

Episode #43 hosted by Mary Lee

Hey everyone, today we're going to explore the basics of window functions in Postgres.

Postgres Version 11

Transcript

A window function in Postgres is a function that performs a calculation across a set of rows that are somehow related to the current row.

To see this in action, let's consider our keyboards table.

\d keyboards

By reading from the table, we can see that we have a number of keyboards organized into brands.

select * from keyboards;

If we wanted to compare each keyboard's price to the average price for their brand, we could do this with a window function.

A window function starts with the function we're wanting to execute, which in our case is "avg" so we can find the average price, and is followed by an "over" clause.

The "over" clause in a window function allows us to use a "partition by" clause to specify how the rows are to be split up, or divided into partitions, for the calculation. To break the rows apart by brand, we would provide the "partition by" clause with our column name, "brand".

select *, avg(price) over (partition by brand) from keyboards;

If we provide no arguments to the over clause, the calculation will be executed within a single partition containing all of the rows in the table.

select *, avg(price) over () from keyboards;

Window functions also support an "order by" clause. The "order by" clause in a window function changes the behavior of what's called the window frame. When an "order by" clause is not present, the default window frame consists of all rows in the current partition. Once an "order by" clause is added, the window frame changes to consist of all rows from the start of the partition up to the current row, plus any rows that are equal to the current row according to the specified order.

An easy way to see this framing behavior is to use the sum function and order by price in our window function.

select *, sum(price) over (order by price) from keyboards;

Looking at the first three rows of results, we can see that the second and third rows have the same sum, and that sum is for the first three rows. This is because these two rows share the same price, so the rows fell into the same window frame and were added to the sum at the same time. We can see the same behavior anywhere the keyboards had matching prices.

It's also possible to use the "order by" clause in conjunction with the "partition by" clause. The only thing we have to do is make sure that the "partition by" clause always precedes the "order by".

select *, sum(price) over (partition by brand order by price) from keyboards;

Window functions are executed against the virtual table produced by a query's "from" clause and its filtering clauses, "where", "having", and "group by". This means that window functions cannot be part of the logic of any of those clauses; they are only permitted in the "select" list and "order by" clause.

We can see this in action with a quick example. We'll start by using a window function to determine the ranking for each product's price according to brand. We'll call this value brand_rank.

select *, rank() over (partition by brand order by price) as brand_rank 
from keyboards;

We can see from the output that the returned rows are ordered by brand then by their brand ranking. We could order by brand ranking only by passing our brand_rank alias to the order by clause at the end of the query.

select *, rank() over (partition by brand order by price) as brand_rank 
from keyboards
order by brand_rank;

However, if we try to use the brand rank as part of a where clause, Postgres throws an error, saying that the column doesn't exist. Because window functions execute after "where" clauses, our window function alias is not yet defined when we try use it in our last query.

select *, rank() over (partition by brand order by price) as brand_rank 
from keyboards
where brand_rank <= 3
order by brand_rank;

If we wanted to rank our products by price within their brand and only return the three least expensive products per brand, we could use a window function within a subselect.

select *
from (
  select *, rank() over (partition by brand order by price) as brand_price
  from keyboards
) as ranked_keyboards
where brand_price <= 3;

We can see from our output that our last query worked. This is because the subselect was executed first, allowing us to use the results of the window function in our "where" clause.

Thanks for watching!

Setup

-- Create keyboards table
create table keyboards (
  id serial primary key, 
  name varchar not null,
  brand varchar not null,
  price numeric (11, 2) not null
);

-- Add some keyboards
insert into keyboards (name, brand, price) values
  ('One 2', 'Ducky', 125.00),
  ('One 2 Mini', 'Ducky', 99.00),
  ('Shine 7', 'Ducky', 159.00),
  ('MIYA Pro', 'Ducky', 119.00),
  ('Year of the Pig', 'Ducky', 189.00),
  ('Race 3', 'Vortex', 139.00),
  ('ViBE', 'Vortex', 139.00),
  ('POK3R', 'Vortex', 119.00),
  ('Tab', 'Vortex', 119.00),
  ('Core', 'Vortex', 89.00),
  ('Yoda II', 'Tex', 299.00),
  ('Beetle', 'Tex', 99.00);