Advanced SQL Part 3 | Window Functions (RANK, ROW_NUMBER, NTILE, FIRST_VALUE).

More from Author
Akash Kumar
Akash

Engineering Lead

6 min read

Welcome to this exploration of Advanced SQL - Window FunctionsPart 3.

** To learn frame type - ROWS, RANGE, and GROUPS, refer to the part 1 of this blog series

** To Discover more about EXCLUDE, PARTITION BY, LAG and LEAD, refer to the part 2 of this blog series

RECAP:

What are window functions?

With SQL:2003, the ISO SQL Standard introduced window functions, a new mode of row-based computation:

Copy Code
              
                                              Input            Output
  Aggregate functions(sum, avg, count)    group of rows →  row (one per group)
  window function                         row vicinity →  row (one per row)
              
          
  • Window functions operate on a set of rows ( commonly known as window or frame or row vicinity) and return a single value for each row.
  • The term window describes the set of rows on which the function operates.
  • A window function uses values from the rows in a window.

In the Part 1 of this blog series, we learnt how to create a frame using frame_types - ROWS, RANGE and GROUPS.

In the part 2 of this blog series, we learnt about different functions like EXCLUDE, PARTITION BY, LAG and LEAD

Now we'll see more about the other functionality that window functions provide.

1. FIRST_VALUE, LAST_VALUE, NTH_VALUE

Aggregates reduce all rows inside a frame to a single value.
Now for something different:

Positional access to individual rows inside a frame is provided by three window functions:
FIRST_VALUE(e)
LAST_VALUE(e)
NTH_VALUE(e,n)

NTH_VALUE(A,s): will return NULL if the index is not present

advance sql window p3

NOTE: FIRST_VALUE(e)NTH_VALUE(e,1).

Query 1 : FIRST_VALUE, LAST_VALUE, NTH_VALUE example

Copy Code
            
  SELECT w."row"                       AS "current row",
         array_agg(w."row")   OVER win AS "rows in frame",
         FIRST_VALUE(w."row") OVER win AS "first row",
         LAST_VALUE(w."row")  OVER win AS "last row",
         NTH_VALUE(w."row",2) OVER win AS "second row"
  FROM   sample AS w
  WINDOW win AS (ORDER BY w.a ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING)
  ORDER BY w.a, w.row;                
            
        

OUTPUT -

  current row |  rows in frame      | first row | last row | second row
  ------------+---------------------+-----------+----------+------------
  Q1          | {Q1,Q2,Q3}          | Q1        | Q3       | Q2
  Q2          | {Q1,Q2,Q3,Q4}       | Q1        | Q4       | Q2
  Q3          | {Q1,Q2,Q3,Q4,Q5}    | Q1        | Q5       | Q2
  Q4          | {Q2,Q3,Q4,Q5,Q6}    | Q2        | Q6       | Q3
  Q5          | {Q3,Q4,Q5,Q6,Q7}    | Q3        | Q7       | Q4
  Q6          | {Q4,Q5,Q6,Q7,Q8}    | Q4        | Q8       | Q5
  Q7          | {Q5,Q6,Q7,Q8,Q9}    | Q5        | Q9       | Q6
  Q8          | {Q6,Q7,Q8,Q9}       | Q6        | Q9       | Q7
  Q9          | {Q7,Q8,Q9}          | Q7        | Q9       | Q8
        

Explanation -

For the current row Q4
Frame type - ROWS

Window - ROWS BETWEEN 2PRECEDING AND 2 FOLLOWING -
{Q2,Q3,Q4,Q5,Q6}
Q2 and Q3 are 2 preceding, Q5 and Q6 are 2 following

FIRST_VALUE = Q2 as that is the first element in the window
LAST_VALUE = Q6 as that is the last element in the window
NTH_VALUE(w.row, 2) - Q3 as it as the 2nd index

Django Query 1 : FIRST_VALUE, LAST_VALUE, NTH_VALUE example

Copy Code
            
  from django.db.models import Avg, F, RowRange, Window, Count, Sum
  from django.db.models.functions import Lag, Lead, LastValue, NthValue, FirstValue

  window = {
  'frame': RowRange(start=-2, end=2),
  'order_by': F('a').asc()
  }

  qs = Sample.objects.annotate(
            first = Window(
               expression=FirstValue('row'), **window
            ), 
		    last = Window(
              expression=LastValue('row'), **window
            ), 
		    second_val = Window(
            expression=NthValue('row', 2), **window
           ),		   
  ).order_by('a')

  for i in qs:
    print(i.row, i.a, i.b, i.first, i.last, i.second_val)
            
        

OUTPUT -

Q1 1 False Q1 Q3 Q2
Q2 2 True Q1 Q4 Q2
Q3 3 True Q1 Q5 Q2
Q4 3 False Q2 Q6 Q3
Q5 3 True Q3 Q7 Q4
Q6 4 True Q4 Q8 Q5
Q7 6 False Q5 Q9 Q6
Q8 6 False Q6 Q9 Q7
Q9 7 True Q7 Q9 Q8

2. ROW_NUMBER(), DENSE_RANK(), RANK()

Countless problem scenarios involve the number (position) or rank of the current row in an ordered sequence of rows.
Family of window functions to number/rank rows:
ROW_NUMBER()
DENSE_RANK()
RANK()

Scope is the partition (if present)—FRAME is irrelevant.

advance sql window p3

ROW_NUMBER - gives incremental row number to all the rows
DENSE_RANK - Peer rows (rows with same values) will share a common rank. No rank numbers are lost.
RANK - Peer rows (rows with same values) will share a common rank. In this case rank numbers are lost.

Query 2 : ROW_NUMBER, DENSE_RANK, RANK example

Copy Code
            
  SELECT w."row"               AS "current row",
         w.a,
         ROW_NUMBER() OVER win AS "ROW_NUMBER",
         DENSE_RANK() OVER win AS "DENSE_RANK",
         RANK()       OVER win AS "RANK"
  FROM   sample AS w
  WINDOW win AS (ORDER BY w.a)
  ORDER BY w.a;
            
        

OUTPUT -

  current row | a | ROW_NUMBER | DENSE_RANK | RANK
  ------------+---+------------+------------+------
    Q1        | 1 |          1 |          1 |    1
    Q2        | 2 |          2 |          2 |    2
    Q3        | 3 |          3 |          3 |    3
    Q4        | 3 |          4 |          3 |    3
    Q5        | 3 |          5 |          3 |    3
    Q6        | 4 |          6 |          4 |    6
    Q7        | 6 |          7 |          5 |    7
    Q8        | 6 |          8 |          5 |    7
    Q9        | 7 |          9 |          6 |    9
            
        

Django Query 2 : ROW_NUMBER, DENSE_RANK, RANK example

Copy Code
            
  from django.db.models import Avg, F, RowRange, Window, Count, Sum
  from django.db.models.functions import Lag, Lead, DenseRank, Rank, RowNumber
            
  window = {
  'order_by': F('a').asc()
  }
            
  qs = Sample.objects.annotate(
            row_number = Window(
            expression=RowNumber(), **window
            ), 
            rank = Window(
            expression=Rank(), **window
            ), 
            dense_rank = Window(
            expression=DenseRank(), **window
            ),		   
  ).order_by('a')
            
  for i in qs:
    print(i.row, i.a, i.b, i.row_number, i.dense_rank, i.rank)
                    
        

OUTPUT -

Q1 1 False 1 1 1
Q2 2 True 2 2 2
Q3 3 True 3 3 3
Q4 3 False 4 3 3
Q5 3 True 5 3 3
Q6 4 True 6 4 6
Q7 6 False 7 5 7
Q8 6 False 8 5 7
Q9 7 True 9 6 9

3. NTILE

NTILE creates equal chunks or partitions of data.

Example if NTILE(3) is specified, then chunks of 3 records are created.

Query 3 : NTILE example

Copy Code
            
  SELECT w."row"                 AS "current row",
         w.a,
         NTILE(3)       OVER win AS "NTILE(3)"
  FROM   sample AS w  
  WINDOW win AS (ORDER BY w.a)
  ORDER BY w.a;
            
        

OUTPUT -

current row | a | NTILE(3)
------------+---+----------
Q1 | 1 | 1
Q2 | 2 | 1
Q3 | 3 | 1
Q4 | 3 | 2
Q5 | 3 | 2
Q6 | 4 | 2
Q7 | 6 | 3
Q8 | 6 | 3
Q9 | 7 | 3

Django Query 3 : NTILE example

Copy Code
            
  from django.db.models.functions import Ntile

  window = {
  'order_by': F('a').asc()
  }
        
  qs = Sample.objects.annotate(
            ntile = Window(
               expression=Ntile(3), **window
            ), 		   
  ).order_by('a')
                
  for i in qs:
      print(i.row, i.a, i.b, i.ntile)
            
        

OUTPUT -

Q1 1 False 1
Q2 2 True 1
Q3 3 True 1
Q4 3 False 2
Q5 3 True 2
Q6 4 True 2
Q7 6 False 3
Q8 6 False 3
Q9 7 True 3

This is not the end of window functions. We will use all the knowledge gained till now and solve problems in the next part of this window function blog series.

… to be continued

Back To Blogs


contact us