/* WINDOW FUNCTIONS - operates on a window (set) of rows. Includes ranking, offest, aggregate, and distribution functions. The OVER clause defines a window or user-specified set of rows within a query result set, which is used before the window function then computes a value for each row in the window.
Some notable window functions:
RANKING FUNCTIONS:
RANK - returns the rank of each row within the partition of a result set. May include ties and gaps.
DENSE_RANK - returns the rank of each row within the partition of a result set. May include ties but will not include gaps.
ROW_NUMBER - returns a unique sequential row number within partition based on current order.
NTILE - distributes the rows in an ordered partition into a specified number of groups. Returns the number of the group to which the current row belongs.
OFFSET FUNCTIONS:
LAG - returns an expression from a previous row that is a defined offset from the current row. Returns NULL if no row at specified position.
LEAD - returns an expression from a later row that is a defined offset from the curret row. Returns NULL if no row at specified position.
FIRST_VALUE - returns the first value in the current window frame. Requires window ordering to be meaningful.
LAST_VALUE - returns the last value in the current window frame. Requires window ordering to be meaningful.
*/
-- RANK example, which looks at values and gives them a numeric ranking based on the value.
SELECT TOP 3 ProductID, Name, ListPrice
RANK() OVER(ORDER BY ListPrice DESC) AS RankByPrice
FROM products1
ORDER BY RankByPrice;
-- Using ROW_NUMBER to count the rows, and RANK to rank them along with a PARTION (the ranking restarts each time a new SalesPersonID comes up in the result set: TITION (allows you to reset the numbering each time a new partition comes up.
SELECT SalesPersonID, TotalDue,
ROW_NUMBER() OVER(ORDER BY SalesPersonID, TotalDue) AS 'Row Count',
RANK() OVER(PARTITION BY SalesPersonID ORDER BY SalesPersonID, TotalDue) AS 'Rank by SalesPerson'
FROM Sales.SalesOrderHeader
ORDER BY SalesPersonID;
/* Excerpt from results. Note how ROW_NUMBER is not partitioned, so it counts the rows throughout entire result set. RANK is partioned by SalesPersonID, so it restarts the ranking when a new SalesPersonID comes up:
SalesPersonID TotalDue Row Count Rank by SalesPerson
...
274 95426.0186 46 46
274 97248.3525 47 47
274 126852.1615 48 48
275 6.3484 49 1
275 12.0152 50 2
275 18.3792 51 3
278 22.3061 52 1
278 25.3868 53 2
278 27.4401 54 3
278 27.4401 54 4
...
*/
-- using an aggregate function with OVER will give you an aggregate for that window. Below, SUM on Amount ORDERed BY Territory will show total sales for that Territory
SELECT *,
SUM(Amount) OVER(ORDER BY Territory) AS 'Total Sales for Territory'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/* Results - note how 'Total Sales for Territory' shows total sales per Territory:
SalesPerson Territory SalesYear Amount Total Sales for Territory
12 1 2011 89.20 9288.48
12 1 2012 239.30 9288.48
12 1 2012 37.98 9288.48
12 1 2013 8922.00 9288.48
819 3 2011 353.20 3140.80
819 3 2014 2523.20 3140.80
819 3 2017 22.20 3140.80
819 3 2018 242.20 3140.80
224 5 2010 10.00 14931.48
224 5 2011 11.20 14931.48
2522 5 2012 347.20 14931.48
2522 5 2013 455.20 14931.48
2522 5 2013 555.20 14931.48
2522 5 2017 777.20 14931.48
2522 5 2019 346.20 14931.48
*/
-- creating running total using partion on Territory, so the cumulative count will restart at each new Territory.
SELECT *,
SUM(Amount) OVER(
PARTITION BY Territory
ORDER BY Territory, SalesYear, Amount
) AS 'Running Total Sales'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/* Results - note how 'Running Total Sales' shows cumulative total sales, and the count restarts each time a new Territory comes about:
SalesPerson Territory SalesYear Amount Running Total Sales
12 1 2011 89.20 89.20
12 1 2012 37.98 127.18
12 1 2012 239.30 366.48
12 1 2013 8922.00 9288.48
819 3 2011 353.20 353.20
819 3 2014 2523.20 2876.40
819 3 2017 22.20 2898.60
819 3 2018 242.20 3140.80
224 5 2010 10.00 10.00
224 5 2011 11.20 21.20
2522 5 2012 347.20 368.40
2522 5 2013 455.20 823.60
2522 5 2013 555.20 1378.80
2522 5 2017 777.20 2156.00
2522 5 2019 346.20 2502.20
*/
-- since you're ORDERing BY Territory, SalesYear, and Amount, you don't have to use ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. That clause is implied, and the running total will increase each time a new combination of Territory, SalesYear, and Amount comes up, which in this case is each line in the results set.
SELECT *,
SUM(Amount) OVER(
PARTITION BY Territory
ORDER BY Territory, SalesYear, Amount
) AS 'Running Total Sales'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
-- example of running total that uses less ordering which will make the running total increase less frequently. In this case, the total will partition by Territory (it'll restart with each new Territory), and it'll only increase when a new SalesYear comes up per the ORDER BY clause.
SELECT *,
SUM(Amount) OVER(
PARTITION BY Territory
ORDER BY SalesYear
) AS 'Running Total Sales'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/* Results - note how 'Running Total Sales' shows cumulative total sales, but the count only increases when a new Territory and SalesYear combination comes up. Note how there is the same 'Running Total' number for both Territory 1 SalesYear 2012 lines:
SalesPerson Territory SalesYear Amount Running Total Sales
12 1 2011 89.20 89.20
12 1 2012 239.30 366.48
12 1 2012 37.98 366.48
12 1 2013 8922.00 9288.48
819 3 2011 353.20 353.20
819 3 2014 2523.20 2876.40
819 3 2017 22.20 2898.60
819 3 2018 242.20 3140.80
224 5 2010 10.00 10.00
224 5 2011 11.20 21.20
2522 5 2012 347.20 368.40
2522 5 2013 455.20 1378.80
2522 5 2013 555.20 1378.80
2522 5 2017 777.20 2156.00
2522 5 2019 346.20 2502.20
*/
-- how to use aggregate with OVER to calculate a percentage of the window, in this case, what percentage each amount constitutes for the Total Sales for the Territory.
SELECT *,
SUM(Amount) OVER(ORDER BY Territory) AS 'Total Sales for Territory',
CAST(Amount / SUM(Amount) OVER(ORDER BY Territory) *100 AS DECIMAL(5,2)) AS 'Percentage of Territory'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/*Results:
SalesPerson Territory SalesYear Amount Total Sales for Territory Percentage of Territory
----------- ----------- ----------- ----------- ------------------------- -----------------------
12 1 2011 89.20 9288.48 0.96
12 1 2012 239.30 9288.48 2.57
12 1 2012 37.98 9288.48 0.40
12 1 2013 8922.00 9288.48 96.05
819 3 2011 353.20 12429.28 2.84
819 3 2014 2523.20 12429.28 20.30
819 3 2017 22.20 12429.28 0.17
819 3 2018 242.20 12429.28 1.94
224 5 2010 10.00 14931.48 0.06
224 5 2011 11.20 14931.48 0.07
2522 5 2012 347.20 14931.48 2.32
2522 5 2013 455.20 14931.48 3.04
2522 5 2013 555.20 14931.48 3.71
2522 5 2017 777.20 14931.48 5.20
2522 5 2019 346.20 14931.48 2.31
*/
-- how to use AVG with OVER to calculate a moving average. In this case, the average between the calues is calculated with each new SalesYear and Amount combination comes up, and the average resets with each new Territory that comes up.
SELECT *,
AVG(Amount) OVER(Partition BY Territory ORDER BY SalesYear, Amount) AS 'Moving Average for Territory'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/*Results:
SalesPerson Territory SalesYear Amount Moving Average for Territory
----------- ----------- ----------- --------------------- ----------------------------
12 1 2011 89.20 89.20
12 1 2012 37.98 63.59
12 1 2012 239.30 122.16
12 1 2013 8922.00 2322.12
819 3 2011 353.20 353.20
819 3 2014 2523.20 1438.20
819 3 2017 22.20 966.20
819 3 2018 242.20 785.20
224 5 2010 10.00 10.00
224 5 2011 11.20 10.60
2522 5 2012 347.20 122.80
2522 5 2013 455.20 205.90
2522 5 2013 555.20 275.76
2522 5 2017 777.20 359.3333
2522 5 2019 346.20 357.4571
*/
-- normally the OVER clause acts over the rows specified in the ORDER BY clause (it will recalculate each time a new combination in the ORDER BY comes up), and it will partition by the PARTITION BY clause. You can also manually specify the rows the OVER claue will act upon by including a ROWS clause. In the example below, it SUMs the current row and the 2 rows after. Note how it only acts upon rows within the partition. Two rows from the end of the partition, it only counts that row and one after, and in the last row only counts that row (since the following rows are outside the partition).
SELECT *,
SUM(Amount) OVER(
PARTITION BY Territory
ORDER BY Territory, SalesYear, Amount
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS 'Running Total Sales'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/*Results:
SalesPerson Territory SalesYear Amount Running Total Sales
----------- ----------- ----------- --------------------- ---------------------
12 1 2011 89.20 366.48
12 1 2012 37.98 9199.28
12 1 2012 239.30 9161.30
12 1 2013 8922.00 8922.00
819 3 2011 353.20 2898.60
819 3 2014 2523.20 2787.60
819 3 2017 22.20 264.40
819 3 2018 242.20 242.20
224 5 2010 10.00 368.40
224 5 2011 11.20 813.60
2522 5 2012 347.20 1357.60
2522 5 2013 455.20 1787.60
2522 5 2013 555.20 1678.60
2522 5 2017 777.20 1123.40
2522 5 2019 346.20 346.20
*/
-- creating a running total through entire result set by adding ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Note how 'Running Total Sales' shows cumulative total.
SELECT *,
SUM(Amount) OVER(
ORDER BY Territory
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS 'Running Total Sales'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/* Results - note how 'Running Total Sales' shows total sales per Territory:
SalesPerson Territory SalesYear Amount Running Total Sales
12 1 2011 89.20 89.20
12 1 2012 239.30 328.50
12 1 2012 37.98 366.48
12 1 2013 8922.00 9288.48
819 3 2011 353.20 9641.68
819 3 2014 2523.20 12164.88
819 3 2017 22.20 12187.08
819 3 2018 242.20 12429.28
224 5 2010 10.00 14920.28
224 5 2011 11.20 14931.48
2522 5 2012 347.20 12776.48
2522 5 2013 455.20 13231.68
2522 5 2013 555.20 13786.88
2522 5 2017 777.20 14564.08
2522 5 2019 346.20 14910.28
*/
-- LAG, allows you to compare other rows farther back in the result set. Format is LAG(expression to look at, what the offset is, default value if lagged row can't be found). In the example below, the LAG looks at the Amount column, and the 2 offset tells it to look two rows back (ORDERed BY Territory and the SalesYear). In the first two rows (before it's able to find a row 2 rows back), it returns the default of 0.
SELECT *,
LAG(Amount, 2, 0) OVER(
ORDER BY Territory, SalesYear
) AS 'Amount from 2 Rows Back'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/*Results:
SalesPerson Territory SalesYear Amount Amount from 2 Rows Back
----------- ----------- ----------- --------------------- -----------------------
12 1 2011 89.20 0.00
12 1 2012 239.30 0.00
12 1 2012 37.98 89.20
12 1 2013 8922.00 239.30
819 3 2011 353.20 37.98
819 3 2014 2523.20 8922.00
819 3 2017 22.20 353.20
819 3 2018 242.20 2523.20
224 5 2010 10.00 22.20
224 5 2011 11.20 242.20
2522 5 2012 347.20 10.00
2522 5 2013 455.20 11.20
2522 5 2013 555.20 347.20
2522 5 2017 777.20 455.20
2522 5 2019 346.20 555.20
*/
-- LEAD does the opposite of LAG, allows you to compare other rows ahead in the result set. Format is LEAD(expression to look at, what the offset is, default value if lagged row can't be found). In the example below, the LEAD looks at the Amount column, and the 3 offset tells it to look two rows ahead (ORDERed BY Territory and the SalesYear). In the last three rows (before it's able to find a row 3 rows ahead), it returns the default of 0.
SELECT *,
LEAD(Amount, 3, 0) OVER(
ORDER BY Territory, SalesYear
) AS 'Amount from 3 Rows Ahead'
FROM windowTester
ORDER BY Territory, SalesPerson, SalesYear;
/*Results:
SalesPerson Territory SalesYear Amount Amount from 3 Rows Ahead
----------- ----------- ----------- --------------------- ------------------------
12 1 2011 89.20 8922.00
12 1 2012 239.30 353.20
12 1 2012 37.98 2523.20
12 1 2013 8922.00 22.20
819 3 2011 353.20 242.20
819 3 2014 2523.20 10.00
819 3 2017 22.20 11.20
819 3 2018 242.20 347.20
224 5 2010 10.00 455.20
224 5 2011 11.20 555.20
2522 5 2012 347.20 777.20
2522 5 2013 455.20 346.20
2522 5 2013 555.20 0.00
2522 5 2017 777.20 0.00
2522 5 2019 346.20 0.00
*/
/*
LAG and LEAD can be used to evalutate one row based on the value of the previous row. You just have to make sure that the ORDER BY in the ORDER clause of the LAG/LEAD window funciton
matched the ORDER BY clause of the entire query (so that the window function is determining order in the same way that you)
*/
SELECT customer_no
, key_value
-- show the previous row's value (just for illustrative purposes, not needed for the row_next_value window function that follows)
, previous_row_value = lag(key_value, 1, 'not matched') OVER(PARTITION BY customer_no ORDER BY customer_no, last_update_dt)
-- flag the current row based on the value if the previous row, ORDER BY must match the ORDER BY of the entire query
, row_value_repeat =
CASE WHEN key_value = lag(key_value, 1, 'not matched') OVER(PARTITION BY customer_no ORDER BY customer_no, last_update_dt)
THEN 'SAME'
ELSE 'NOT SAME'
END
FROM TX_CUST_KEYWORD where customer_no IN(
ORDER BY customer_no, last_update_dt
/* Results in the row_value_repeat column:
customer_no key_value previous_row_value row_value_repeat
----------- ------------------------------ ------------------------------ ----------------
236097 1946 GRADUATE/PROFESSIONAL DEGREE NOT SAME
236108 2214764 not matched NOT SAME
236144 360 not matched NOT SAME
239830 4250 4250 SAME
239830 4250 4250 SAME
239830 4250 4250 SAME
239830 4250 4250 SAME
239830 4250 4250 SAME
239830 4250 4250 SAME
236144 2292975 360 NOT SAME
236144 360 2292975 NOT SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
236144 360 360 SAME
*/
-- how to segment by quartile by using NTILE. Format is NTILE(number of segments), so NTILE(4) will break the results into quartiles, NTILE(10) will break them into deciles, etc.
SELECT *,
NTILE(10) OVER(ORDER BY Amount) AS 'Quartile'
FROM windowTester
ORDER BY Quartile, Amount;
/* Results:
SalesPerson Territory SalesYear Amount Quartile
----------- ----------- ----------- --------------------- --------------------
224 5 2010 10.00 1
224 5 2011 11.20 1
819 3 2017 22.20 1
12 1 2012 37.98 1
12 1 2011 89.20 2
12 1 2012 239.30 2
819 3 2018 242.20 2
2522 5 2019 346.20 2
2522 5 2012 347.20 3
819 3 2011 353.20 3
2522 5 2013 455.20 3
2522 5 2013 555.20 3
2522 5 2017 777.20 4
819 3 2014 2523.20 4
12 1 2013 8922.00 4
(15 row(s) affected)
*/
-- example using NTILE to assign by decile
SELECT *,
NTILE(10) OVER(ORDER BY Amount) AS 'Decile'
FROM windowTester
ORDER BY Decile, Amount;
/* Results:
SalesPerson Territory SalesYear Amount Decile
----------- ----------- ----------- --------------------- --------------------
224 5 2010 10.00 1
224 5 2011 11.20 1
819 3 2017 22.20 2
12 1 2012 37.98 2
12 1 2011 89.20 3
12 1 2012 239.30 3
819 3 2018 242.20 4
2522 5 2019 346.20 4
2522 5 2012 347.20 5
819 3 2011 353.20 5
2522 5 2013 455.20 6
2522 5 2013 555.20 7
2522 5 2017 777.20 8
819 3 2014 2523.20 9
12 1 2013 8922.00 10
(15 row(s) affected)
*/
-- how to calculate the percentile of an array of values using PERCENTILE_DISC and PERCENTILE_CONT. PERCENTILE_DISC will find the percentile by using a value in the array, and PERCENTILE_CONT will interpolate the value (meaning if the median is between two values in the array, it will take the average of the two values). Examples of both PERCENTILE_DISC and PERCENTILE_CONT are below using the median percentile value 0.5. While the two PERCENTILE clauses leave a blank in-between the () in the OVER clause, note that you can insert a PARTION BY clause in-between the () to partition the clause - however, you can't use an ORDER BY clause within the () of the OVER clause when using one of the PERCENTILE functions.
SELECT Product, VendorID, Price,
PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Price)
OVER () AS MedianCont,
PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY Price)
OVER () AS MedianDisc
FROM products1
ORDER BY Price, VendorID;
/* Results - note that PERCENTILE_CONT shows the actual median which is the average of 27.30 and 200.00, while PERCENTILE_DISC returns 27.30 which is an actual value in the array
Product VendorID Price MedianCont MedianDisc
---------------- ----------- ------- ----------- ------------
Computer 1 1.20 113.65 27.30
Soda 1 5.78 113.65 27.30
Seed 5 12.50 113.65 27.30
Keyboard 2 17.75 113.65 27.30
Keyboard Stand 2 27.30 113.65 27.30
Baseball Bat 3 200.00 113.65 27.30
Flag 2 287.50 113.65 27.30
Guitar 5 300.75 113.65 27.30
Computer Desk 4 500.00 113.65 27.30
Guitar 1 2320.20 113.65 27.30
(10 row(s) affected)
*/