Welcome to this exploration of
Advanced SQL - Window FunctionsPart 2.
** To learn frame type - ROWS, RANGE, and GROUPS, refer to the
part 1 of this blog series
RECAP:
What are window functions?
With SQL:2003, the ISO SQL Standard introduced window functions, a new mode of row-based computation:
Input Output
Aggregate functions(sum, avg, count) group of rows → row (one per group)
window function row vicinity → row (one per row)
- Window functions operate on a set of rows ( commonly known as window or frame or row vicinity) and return a single value for each row.
- The term window describes the set of rows on which the functionoperates.
- A window function uses values from the rows in a window.
In the Part 1 of this blog series, we learnt how to create a frame using frame_types - ROWS, RANGE and GROUPS. Now we’ll see more about the other functionality that window functions provide.
1. EXCLUDE
<window_function> OVER (ORDER BY A₁,...,Aₙ [ RANGE frame EXCLUDE ])
In the above table, let's consider Q4 as the current row
Frame is BETWEEN 1 PRECEDING AND CURRENT ROW
EXCLUDE NO OTHERS -
EXCLUDE CURRENT ROW
EXCLUDE GROUP
EXCLUDE TIES
Query 1 : EXCLUDE example
SELECT w.row AS "current row",
w.a,
array_agg(w.row) OVER win AS "rows in frame",
array_agg(w.row) OVER win1 AS "EXCLUDE CURRENT ROW",
array_agg(w.row) OVER win2 AS "EXCLUDE GROUP",
array_agg(w.row) OVER win3 AS "EXCLUDE TIES"
FROM sample AS w
WINDOW win AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW),
win1 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE CURRENT ROW),
win2 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE GROUP),
win3 AS (ORDER BY w.a RANGE BETWEEN 1 PRECEDING AND CURRENT ROW EXCLUDE TIES)
ORDER BY w.a;
current row | a | rows in frame | EXCLUDE CURRENT ROW | EXCLUDE GROUP | EXCLUDE TIES ------------+---+---------------+---------------------+---------------+--------------- Q1 | 1 | {Q1} | | | {Q1} Q2 | 2 | {Q1,Q2} | {Q1} | {Q1} | {Q1,Q2} Q3 | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q4,Q5} | {Q2} | {Q2,Q3} Q4 | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q3,Q5} | {Q2} | {Q2,Q4} Q5 | 3 | {Q2,Q3,Q4,Q5} | {Q2,Q3,Q4} | {Q2} | {Q2,Q5} Q6 | 4 | {Q3,Q4,Q5,Q6} | {Q3,Q4,Q5} | {Q3,Q4,Q5} | {Q3,Q4,Q5,Q6} Q7 | 6 | {Q7,Q8} | {Q8} | | {Q7} Q8 | 6 | {Q7,Q8} | {Q7} | | {Q8} Q9 | 7 | {Q7,Q8,Q9} | {Q7,Q8} | {Q7,Q8} | {Q7,Q8,Q9}
Explanation -
Let us consider Q5 as our current row for the frame
RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
Rows in frame are Q2, Q3, Q4, Q5 - (Q3, Q4 and Q5 are 1 unit
as they have the same value and the frame type is RANGE)
EXCLUDE CURRENT ROW - Q2, Q3, Q4 (Q5 is removed as it is the
current row)
EXCLUDE GROUP - Q2 (Q3, Q4 and Q5 are removed as it a
group)
EXCLUDE TIES - Q2, Q5 (Q3 and Q4 are removed as it was tied
with Q5 as a peers)
Django Query 1 : EXCLUDE example
Django does not have explicit functions that perform EXCLUDE operations within window functions.
PARTITION BY
Optionally, we may partition the input table before rows are sorted and window frames are determined:
<window_function> OVER (PARTITION BY p1 ORDER BY A₁,...,Aₙ [ RANGE frame])
1. ROWS
- Frames never cross partitions.
- BETWEEN ⋯ PRECEDING AND ⋯ FOLLOWING
- respects partition boundaries.
Query 2 : PARTITION BY example
SELECT w.row AS "current row",
w.a,
w.b AS "partition",
COUNT(*) OVER win AS "frame size",
array_agg(w.row) OVER win AS "rows in frame",
SUM(w.a) OVER win AS "∑ a (so far)"
FROM sample AS w
WINDOW win AS (PARTITION BY w.b ORDER BY w.a ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY w.b, w.a;
OUTPUT-
current row | a | partition | frame size | rows in frame | Σ a (so far) ------------+---+-----------+------------+------------------+-------------- Q1 | 1 | false | 1 | {Q1} | 1 => 1 Q4 | 3 | false | 2 | {Q1,Q4} | 4 => 1 + 3 Q8 | 6 | false | 3 | {Q1,Q4,Q8} | 10 => 1 + 3 + 6 Q7 | 6 | false | 4 | {Q1,Q4,Q8,Q7} | 16 => 1 + 3 + 6 + 6 Q2 | 2 | true | 1 | {Q2} | 2 => 2 Q5 | 3 | true | 2 | {Q2,Q5} | 5 => 2 + 3 Q3 | 3 | true | 3 | {Q2,Q5,Q3} | 8 => 2 + 3 + 3 Q6 | 4 | true | 4 | {Q2,Q5,Q3,Q6} | 12 => 2 + 3 + 3 + 4 Q9 | 7 | true | 5 | {Q2,Q5,Q3,Q6,Q9} | 19 => 2 + 3 + 3 + 4 + 7
Explanation -
(PARTITION BY w.b
ORDER BY w.a ROWS BETWEEN 1
PRECEDING AND 2 FOLLOWING)
PARTITION BY w.b - table is split
into 2 partitions - true partition, false partition
ORDER BY w.a - Inside each
partition, data is ordered by 'a-int' column When Q2 is the current
row, It'll not consider Q8 as 1 preceding as it is in a different
partition.
Django Query 2 : PARTITION BY example
from django.db.models import Avg, F, RowRange, Window, Count, Sum
window = {
'partition_by': [F('b')],
'frame': RowRange(start=None, end=0), // Unbounded preceding and current row
'order_by': F('a').asc()
}
qs = Sample.objects.annotate(
frame_size = Window(
expression=Count('a'), **window
),
running_sum = Window(
expression=Sum('a'), **window
),
).order_by('b')
for i in qs:
print(i.row, i.a, i.b, i.frame_size, i.running_sum)
OUTPUT-
Q1 1 False 1 1
Q4 3 False 2 4
Q7 6 False 3 10
Q8 6 False 4 16
Q2 2 True 1 2
Q3 3 True 2 5
Q5 3 True 3 8
Q6 4 True 4 12
Q9 7 True 5 19
3. LAG/LEAD
Lag/ Lead functions can be used to get the row at the offset +- n relative to the current row.
LAG/LEAD ( e, n, d ) OVER ( [ PARTITION BY p1 ... pm ]
[ ORDER BY e1 ... en ] )
Scope of LAG/LEAD is the partition. That means members/records
outside the partition cannot be accessed.
e - field of row variable
n - offset value +/- 2
d - if offset value is out of range, then return d
If d is not specified, then it returns NULL
The frame of the current row is irrelevant for
LAG/LEAD.
Query 3 : LAG/LEAD example
SELECT w.row AS "current row",
w.a AS a,
w.b AS "partition",
LAG (w.row, 1, 'no row') OVER win AS "lag",
LEAD(w.row, 1, 'no row') OVER win AS "lead"
FROM sample AS w
WINDOW win AS (PARTITION BY w.b ORDER BY w.a)
ORDER BY w.b, w.a;
OUTPUT -
current row | a | partition | lag | lead ------------+---+-----------+--------+-------- Q1 | 1 | false | no row | Q4 Q4 | 3 | false | Q1 | Q8 Q8 | 6 | false | Q4 | Q7 Q7 | 6 | false | Q8 | no row Q2 | 2 | true | no row | Q5 Q5 | 3 | true | Q2 | Q3 Q3 | 3 | true | Q5 | Q6 Q6 | 4 | true | Q3 | Q9 Q9 | 7 | true | Q6 | no row
Explanation -
WIndow function is partitioned by 'b'. Therefore, the table is split
into 2 partitions - true partition, and false partition.
For current row Q7, Lag(1) is Q8 as it is the previous value.
Lead(1) is “no row” as it is the last record in the false partition.
Django Query 3 : LAG/LEAD example
from django.db.models.functions import Lag, Lead
window = {
'partition_by': [F('b')],
'frame': RowRange(start=None, end=0),
'order_by': F('a').asc()
}
qs = Sample.objects.annotate(
previous = Window(
expression=Lag('a', 1), **window
),
next_val = Window(
expression=Lead('a', 1), **window
),
).order_by('b')
for i in qs:
print(i.row, i.a, i.b, i.previous, i.next_val)
OUTPUT -
Q1 1 False None 3
Q4 3 False 1 6
Q7 6 False 3 6
Q8 6 False 6 None
Q2 2 True None 3
Q3 3 True 2 3
Q5 3 True 3 4
Q6 4 True 3 7
Q9 7 True 4 None
This is not the end of window functions. There are more interesting
and useful functions coming up in the next part of this window
function blog series.
… to be continued