laika222
10/27/2016 - 2:30 PM

GROUP BY, GROUPING SETS, ROLLUP, CUBE, GROUPING_ID

/*
Grouping allows you take rows and combine them based on shared values. There are different types of grouping:

GROUP BY - collapses rows that have a shared value into a single row - additional columns that have multiple rows that are collapsed into one must use an aggregate value. For example, if you GROUP BY CustomerID and that CustomerID originally had two orders, for the OrderPrice column you have to use some aggregate value such as SUM, COUNT, MAX, or AVG.

GROUPING SETS - an extention of GROUP BY that allows multiple grouping, good for things like generating grand total. When a NULL comes up in one of the cells, that NULL could be there because it's grouping multiple values (though it might all just be a NULL that's present in your data). GROUPING SETS is how you can identify how you want to group everything, where there are some shortcut groupings that do common grouping tasks like ROLLUP (shows grouping for a heirarchy) and CUBE (which groups by all possible grouping combinations).

ROLLUP - provides a shortcut for defining groups sets with combinations that assume input columns from a hierarchy. Will give grand totals and subtotals for combinations.

CUBE - like ROLLUP, but shows every possible combination. Provides a shortcut for defining goruping sets in which all possible combinations of grouping sets created. This can be good if you're not sure which grouping you want to use, since it'll show you everything (scortched earth grouping).

Finally, in summary rows, GROUPING SETS, ROLLUP, and CUBE will return a NULL value in some columns to show that this is a summary row (for example, if it's showing totals for the Category column, the Category will be listed in that row but all other columns will show NULL, showing you that this is the summary for that Category). This NULL can be indistinguishable from a NULL in the data set. Therefore, the GROUPING_ID function tells you if the NULL is due to a summary row or if there's actually a NULL in the data.
*/

--------------
-- GROUP BY --
--------------

-- GROUP BY - collapses rows that have a shared value into a single row - additional columns that have multiple rows that are collapsed into one must use an aggregate value. For example, if you GROUP BY CustomerID and that CustomerID originally had two orders, for the OrderPrice column you have to use some aggregate value such as SUM, COUNT, MAX, or AVG.
SELECT CustomerID, SUM(Price) 
FROM Orders2
GROUP BY CustomerID
ORDER BY SUM(Price) DESC;

-- more complex example of GROUP BY
SELECT T.[Group] AS N'Region', 
	T.CountryRegionCode AS N'Country', 
	S.Name AS N'Store', 
	H.SalesPersonID,
	SUM(TotalDue) AS N'Total Sales'
FROM Sales.Customer C
    INNER JOIN Sales.Store S
        ON C.StoreID  = S.BusinessEntityID
    INNER JOIN Sales.SalesTerritory T
        ON C.TerritoryID  = T.TerritoryID 
    INNER JOIN Sales.SalesOrderHeader H
        ON C.CustomerID = H.CustomerID
WHERE T.[Group] = N'Europe'
    AND T.CountryRegionCode IN(N'DE', N'FR')
    AND H.SalesPersonID IN(287, 290, 288)
    AND SUBSTRING(S.Name,1,4) IN(N'Vers', N'Spa ')
-- GROUP BY the columns in this order
GROUP BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID
ORDER BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID;

/* Results from GROUP BY above:

Region   Country Store                             SalesPersonID Total Sales
-------- ------- --------------------------------- ------------- -------------
Europe   DE      Versatile Sporting Goods Company  287           729.6344
Europe   DE      Versatile Sporting Goods Company  288           17073.0655
Europe   FR      Spa and Exercise Outfitters       287           27731.551
Europe   FR      Spa and Exercise Outfitters       290           208479.3505

*/


-------------------
-- GROUPING SETS --
-------------------

/*
GROUPING SETS - an extention of GROUP BY that allows multiple grouping, good for things like generating grand total. When a NULL comes up in one of the cells, that NULL could be there because it's grouping multiple values (though it might all just be a NULL that's present in your data). GROUPING SETS is how you can identify how you want to group everything, where there are some shortcut groupings that do common grouping tasks like ROLLUP (shows grouping for a heirarchy) and CUBE (which groups by all possible grouping combinations).

SELECT [column list with aggregate(s)
FROM [source]
GROUP BY
GROUPING SETS
(
	[columnname], -- one or more columns
	[columnname], -- one or more columns
	() -- empty parentheses if aggregating all rows
);

*/

--example of GROUP BY GROUPING SETS
SELECT T.[Group] AS N'Region', 
	T.CountryRegionCode AS N'Country', 
	S.Name AS N'Store', 
	H.SalesPersonID,
	SUM(TotalDue) AS N'Total Sales'
FROM Sales.Customer C
    INNER JOIN Sales.Store S
        ON C.StoreID  = S.BusinessEntityID
    INNER JOIN Sales.SalesTerritory T
        ON C.TerritoryID  = T.TerritoryID 
    INNER JOIN Sales.SalesOrderHeader H
        ON C.CustomerID = H.CustomerID
WHERE T.[Group] = N'Europe'
    AND T.CountryRegionCode IN(N'DE', N'FR')
    AND H.SalesPersonID IN(287, 290, 288)
    AND SUBSTRING(S.Name,1,4) IN(N'Vers', N'Spa ')
--GROUP BY GROUPING SETS ending with () which will give a grand total
GROUP BY GROUPING SETS (T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID, ())
ORDER BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID;

/* Results from GROUP BY GROUPING SETS above. The first line with all NULLs is a grand total. The second line with all NULLs except SalesPersonID is the total for salesperson287. The bottom line with all NULLs except for Region is the total for Europe. etc.

Region  Country Store                             SalesPersonID Total Sales
------- ------- --------------------------------- ------------- ---------------
NULL    NULL    NULL                              NULL          254013.6014
NULL    NULL    NULL                              287           28461.1854
NULL    NULL    NULL                              288           17073.0655
NULL    NULL    NULL                              290           208479.3505
NULL    NULL    Spa and Exercise Outfitters       NULL          236210.9015
NULL    NULL    Versatile Sporting Goods Company  NULL          17802.6999
NULL    DE      NULL                              NULL          17802.6999
NULL    FR      NULL                              NULL          236210.9015
Europe  NULL    NULL                              NULL          254013.6014

*/
---------------------
-- GROUP BY ROLLUP --
---------------------

-- GROUP BY ROLLUP - provides a shortcut for defining groups sets with combinations that assume input columns from a hierarchy. Will give grand totals and subtotals for combinations.

--example of GROUP BY ROLLUP
SELECT T.[Group] AS N'Region', 
	T.CountryRegionCode AS N'Country', 
	S.Name AS N'Store', 
	H.SalesPersonID,
	SUM(TotalDue) AS N'Total Sales'
FROM Sales.Customer C
    INNER JOIN Sales.Store S
        ON C.StoreID  = S.BusinessEntityID
    INNER JOIN Sales.SalesTerritory T
        ON C.TerritoryID  = T.TerritoryID 
    INNER JOIN Sales.SalesOrderHeader H
        ON C.CustomerID = H.CustomerID
WHERE T.[Group] = N'Europe'
    AND T.CountryRegionCode IN(N'DE', N'FR')
    AND H.SalesPersonID IN(287, 290, 288)
    AND SUBSTRING(S.Name,1,4) IN(N'Vers', N'Spa ')
--GROUP BY ROLLUP, showing subtotals by hierarchy
GROUP BY ROLLUP (T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID)
ORDER BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID;

/* Results from GROUP BY ROLLUP above. Line one with all NULLs is a grand total. Line 2 with all NULLs but Europe is a grand total for Europe, etc.

Region  Country Store                             SalesPersonID Total Sales
------- ------- --------------------------------- ------------- ---------------------
NULL    NULL    NULL                              NULL          254013.6014
Europe  NULL    NULL                              NULL          254013.6014
Europe  DE      NULL                              NULL          17802.6999
Europe  DE      Versatile Sporting Goods Company  NULL          17802.6999
Europe  DE      Versatile Sporting Goods Company  287           729.6344
Europe  DE      Versatile Sporting Goods Company  288           17073.0655
Europe  FR      NULL                              NULL          236210.9015
Europe  FR      Spa and Exercise Outfitters       NULL          236210.9015
Europe  FR      Spa and Exercise Outfitters       287           27731.551
Europe  FR      Spa and Exercise Outfitters       290           208479.3505


*/


-------------------
-- GROUP BY CUBE --
-------------------

-- GROUP BY CUBE - like ROLLUP, but shows every possible combination. Provides a shortcut for defining goruping sets in which all possible combinations of grouping sets created. This can be good if you're not sure which grouping you want to use, since it'll show you everything (scortched earth grouping).

--example of GROUP BY CUBE
SELECT T.[Group] AS N'Region', 
	T.CountryRegionCode AS N'Country', 
	S.Name AS N'Store', 
	H.SalesPersonID,
	SUM(TotalDue) AS N'Total Sales'
FROM Sales.Customer C
    INNER JOIN Sales.Store S
        ON C.StoreID  = S.BusinessEntityID
    INNER JOIN Sales.SalesTerritory T
        ON C.TerritoryID  = T.TerritoryID 
    INNER JOIN Sales.SalesOrderHeader H
        ON C.CustomerID = H.CustomerID
WHERE T.[Group] = N'Europe'
    AND T.CountryRegionCode IN(N'DE', N'FR')
    AND H.SalesPersonID IN(287, 290, 288)
    AND SUBSTRING(S.Name,1,4) IN(N'Vers', N'Spa ')
--GROUP BY CUBE, showing all possible grouping combinations
GROUP BY CUBE (T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID)
ORDER BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID;

/* Results from GROUP BY CUBE:

Region  Country Store                             SalesPersonID Total Sales
------- ------- --------------------------------- ------------- ---------------------
NULL    NULL    NULL                              NULL          254013.6014
NULL    NULL    NULL                              287           28461.1854
NULL    NULL    NULL                              288           17073.0655
NULL    NULL    NULL                              290           208479.3505
NULL    NULL    Spa and Exercise Outfitters       NULL          236210.9015
NULL    NULL    Spa and Exercise Outfitters       287           27731.551
NULL    NULL    Spa and Exercise Outfitters       290           208479.3505
NULL    NULL    Versatile Sporting Goods Company  NULL          17802.6999
NULL    NULL    Versatile Sporting Goods Company  287           729.6344
NULL    NULL    Versatile Sporting Goods Company  288           17073.0655
NULL    DE      NULL                              NULL          17802.6999
NULL    DE      NULL                              287           729.6344
NULL    DE      NULL                              288           17073.0655
NULL    DE      Versatile Sporting Goods Company  NULL          17802.6999
NULL    DE      Versatile Sporting Goods Company  287           729.6344
NULL    DE      Versatile Sporting Goods Company  288           17073.0655
NULL    FR      NULL                              NULL          236210.9015
NULL    FR      NULL                              287           27731.551
NULL    FR      NULL                              290           208479.3505
NULL    FR      Spa and Exercise Outfitters       NULL          236210.9015
NULL    FR      Spa and Exercise Outfitters       287           27731.551
NULL    FR      Spa and Exercise Outfitters       290           208479.3505
Europe  NULL    NULL                              NULL          254013.6014
Europe  NULL    NULL                              287           28461.1854
Europe  NULL    NULL                              288           17073.0655
Europe  NULL    NULL                              290           208479.3505
Europe  NULL    Spa and Exercise Outfitters       NULL          236210.9015
Europe  NULL    Spa and Exercise Outfitters       287           27731.551
Europe  NULL    Spa and Exercise Outfitters       290           208479.3505
Europe  NULL    Versatile Sporting Goods Company  NULL          17802.6999
Europe  NULL    Versatile Sporting Goods Company  287           729.6344
Europe  NULL    Versatile Sporting Goods Company  288           17073.0655
Europe  DE      NULL                              NULL          17802.6999
Europe  DE      NULL                              287           729.6344
Europe  DE      NULL                              288           17073.0655
Europe  DE      Versatile Sporting Goods Company  NULL          17802.6999
Europe  DE      Versatile Sporting Goods Company  287           729.6344
Europe  DE      Versatile Sporting Goods Company  288           17073.0655
Europe  FR      NULL                              NULL          236210.9015
Europe  FR      NULL                              287           27731.551
Europe  FR      NULL                              290           208479.3505
Europe  FR      Spa and Exercise Outfitters       NULL          236210.9015
Europe  FR      Spa and Exercise Outfitters       287           27731.551
Europe  FR      Spa and Exercise Outfitters       290           208479.3505

*/

-----------------
-- GROUPING_ID --
-----------------

-- GROUPING SETS, ROLLUP, and CUBE will return a NULL value in some columns to show that this is a summary row (for example, if it's showing totals for the Category column, the Category will be listed in that row but all other columns will show NULL, showing you that this is the summary for that Category). This NULL can be indistinguishable from a NULL in the data set. Therefore, the GROUPING_ID function tells you if the NULL is due to a summary row or if there's actually a NULL in the data.

--example of GROUPING_ID
SELECT T.[Group] AS N'Region', 
	T.CountryRegionCode AS N'Country', 
	S.Name AS N'Store', 
	--GROUPING_ID function, will return a 1 if a NULL in SalesPerson column below is result of grouping SUMMARY
	GROUPING_ID(H.SalesPersonID) AS N'SalesPerson G_ID',
	H.SalesPersonID AS N'SalesPerson',
	SUM(TotalDue) AS N'Total Sales'
FROM Sales.Customer C
    INNER JOIN Sales.Store S
        ON C.StoreID  = S.BusinessEntityID
    INNER JOIN Sales.SalesTerritory T
        ON C.TerritoryID  = T.TerritoryID 
    INNER JOIN Sales.SalesOrderHeader H
        ON C.CustomerID = H.CustomerID
WHERE T.[Group] = N'Europe'
    AND T.CountryRegionCode IN(N'DE', N'FR')
    AND H.SalesPersonID IN(287, 290, 288)
    AND SUBSTRING(S.Name,1,4) IN(N'Vers', N'Spa ')
GROUP BY ROLLUP (T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID)
ORDER BY T.[Group], T.CountryRegionCode, S.Name, H.SalesPersonID;

/* Results from GROUPING_ID above. 'SalesPerson G_ID' column shows a 1 if SalesPerson column is grouped. In top line, SalesPerson is NULL, but SalesPerson G_ID is 1, showing that NULL is the result of grouping. In bottom line, SalesPerson is NULL, but SalesPerson G_ID is 0, showing that the NULL is not the result of grouping, and therefore that NULL is in the data itself.

Region  Country Store                             SalesPerson G_ID SalesPerson Total Sales
------- ------- --------------------------------- ---------------- ----------- ---------------------
NULL    NULL    NULL                              1                NULL        254013.6014
Europe  NULL    NULL                              1                NULL        254013.6014
Europe  DE      NULL                              1                NULL        17802.6999
Europe  DE      Versatile Sporting Goods Company  1                NULL        17802.6999
Europe  DE      Versatile Sporting Goods Company  0                287         729.6344
Europe  DE      Versatile Sporting Goods Company  0                288         17073.0655
Europe  FR      NULL                              1                NULL        236210.9015
Europe  FR      Spa and Exercise Outfitters       1                NULL        236210.9015
Europe  FR      Spa and Exercise Outfitters       0                287         27731.551
Europe  FR      Spa and Exercise Outfitters       0                NULL        208479.3505


*/