laika222
10/28/2016 - 5:30 PM

JOINs - JOIN, INNER JOIN, LEFT JOIN, RIGHT JOIN, CROSS JOIN

Joins take columns from multiple tables and displays them next to each other. Line 1 is an INNER JOIN (also known as JOIN or a simple JOIN), which selects all rows from both tables as long as there is a match between the columns. If there are rows in the "Customers" table that do not have matches in "Orders", these customers will NOT be listed. In line 2, it tells it to pull those columns, FROM the (left) Customers table and INNER JOIN the columns from the (right) Orders table. Line 5 tells it to pull rows ON the condition that the CustomerID is found in the left table (Customers) AND the right table (Orders). RESULT: it joins those columns for all rows where the CustomerID is found in the Customers table AND the Orders table (i.e. show all CustomerIDs that have an order, include multiple rows for CustomerIDs with multiple orders, but do not show a line for CustomerIDs that don't have an order).

/***********************
OVERVIEW
***********************/

/*
JOINs take data from multiple tables and joins them together into a single results set. JOINs use an ON clause to create links between rows in both tables, so that it knows how to combine the data into the rows of the results set. There are a few types of JOIN:

JOIN (or INNER JOIN): combines data from two tables and shows only rows where a certain piece of data exists in both tables (for instance, a shared ID number). For example, if you have an Employees table and a Sales table, JOINing those two tables will bring back those employees who have a sale (you won't see employees who don't have a sale, or sales that don't have an employee). In SQL server, when you use a JOIN is starts by creating a Cartesian Product (a row for each possible combination of the values) and then it filters that result set based on the ON clause that you add to the statement. ON functions in a similar way as a WHERE clause.

LEFT JOIN (or LEFT OUTER JOIN): combines data from two tables and shows all rows from the LEFT table, even if there's no corresponding data in the RIGHT table. For example, if you have an Employees table and a Sales table, LEFT JOINing those two tables will bring back a row for all employees, even if some employees don't have a sale (employees without a sale will show up with a NULL in columns from the Sales tables. Note that this NULL could derive from not having data in the Sales table, or it could derive from a NULL actually being the value in the Sales table.).

RIGHT JOIN (or RIGHT OUTER JOIN): combines data from two tables and shows all rows from the RIGHT table, even if there's no corresponding data in the LEFT table. For example, if you have an Employees table and a Sales table, LEFT JOINing those two tables will bring back a row for all sales, even if some sales don't have a related employee (employees without a sale will show up with a NULL in columns from the Employees tables. Note that this NULL could derive from not having data in the Employees table, or it could derive from a NULL actually being the value in the Employees table.).

FULL JOIN: a combination of LEFT JOIN and RIGHT JOIN which shows all rows from both LEFT table and RIGHT table, filling in NULLs for those fields where there is no data (or where a NULL is the value).

CROSS JOIN: creates a Cartesian Product, which is a listing off all possible combinations.

SELF JOIN: like a normal JOIN, except that both JOINs reference the same table.
*/


/***********************
INNER JOIN
***********************/

-- JOIN, or INNER JOIN, or simple JOIN. Takes two tables and combines them into a single result set. This matches the columns in the left table with the columns in the right table using a common piece of data found in both tables, such as a customer ID number.
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerID, Customers.LastName, Customers.FirstName
FROM Customers
JOIN Orders
ON Orders.CustomerID=Customers.CustomerID; 

-- How to JOIN using aliases for the tables names rather than spelling the tables names out in each case. In this example, the Orders table takes the alias a, and the Customers table takes the alias b.
SELECT a.OrderID, a.OrderDate, b.CustomerID, b.LastName, b.FirstName
FROM Customers a
JOIN Orders b
ON a.CustomerID = b.CustomerID; 

-- How to use a JOIN along with a WHERE clause to further filter the results.
SELECT c.CustomerID, c.LastName, c.FirstNAME, o.OrderID
FROM customers1 c
JOIN orders1 o
ON c.CustomerID = o.CustomerID
WHERE OrderID BETWEEN 1 AND 10;

-- How to complete multiple JOINs in the same statment.
SELECT c.CustomerID, c.LastName, c.FirstNAME, o.OrderID, p.Product
FROM customers1 c
JOIN orders1 o
ON c.CustomerID = o.CustomerID
JOIN products1 p -- second JOIN being added
ON o.ProductID = p.ProductID;

-- How to JOIN from a SELECT statement, rather than joining two entire tables (a SELECT statement result set is basically just a virtual table anyway). JOINing from a SELECT can help make a query more efficient (for instance, using a WHERE clause in a SELECT to JOIN a subset of a large table rather than having to JOIN the entire table), and it allows you to pull in things like aggregate values (you can't directly JOIN an aggregate [such as pulling a.COUNT(HR)], but you can pull an aggregate value into a SELECT result set, and then JOIN that SELECT result set with another table or SELECT result set [see the example below]).
SELECT a.playerID, a.nameLast, a.nameFirst, b.allstar_games_played 
FROM (SELECT playerID, nameLast, nameFirst FROM master WHERE teamID = 'CHA') AS a
JOIN (SELECT playerID, COUNT(GP) AS 'allstar_games_played' FROM allstarfull
GROUP BY playerID) AS b
ON a.playerID = b.playerID


-- DETAIL!!! 

-- SELECTS playerID, nameLast, and nameFirst from first SELECT (alias is a), and allstar_games_played from second SELECT (alias is b). allstar_games_played is an aggregate column (COUNT), so in order to SELECT it at the top of the JOIN, you must give that aggregate column an alias (using AS) further down and then SELECT using the alias you give it.
SELECT a.playerID, a.nameLast, a.nameFirst, b.allstar_games_played 

-- SELECTs FROM the first SELECT statement, which is given alias a. A WHERE clause is included in this SELECT, which reduces the overall number of rows needed to be JOINed (can greatly increase effciency).
FROM (SELECT playerID, nameLast, nameFirst FROM master WHERE teamID = 'CHA') AS a

-- JOINs the second SELECT statement, which is given alias b. This includes an aggregate column (COUNT of the GP column), which is given the alias 'allstar_games_played'. Note that above in the first line when the columns are selected, you must refer to this aggregate column using the alias defined with the AS in this line (you must SELECT b.allstar_games_played, you can't SELECT b.COUNT(HR))
JOIN (SELECT playerID, COUNT(GP) AS 'allstar_games_played' FROM allstarfull
GROUP BY playerID) AS b

-- Matches the rows up from both SELECT statments by checking that a.playerID = b.playerID
ON a.playerID = b.playerID


/***********************
LEFT JOIN
***********************/

-- LEFT JOIN. The first table (Customers) is the LEFT table, and the next table (Orders) is the right table. This will bring back a row for each Customer, and if there is no Order for the Customer, the columns from the Orders table will show up as a NULL. This example answers the question - How many Customers have an Order, and what is the Count of those Orders. 
Select Customers.CustomerID, Customers.LastName, Customers.FirstName, Orders.OrderID, COUNT(Orders.OrderID)
FROM Customers
LEFT JOIN Orders
ON Customers.CustomerID=Orders.CustomerID
GROUP BY Customers.CustomerID;

/***********************
RIGHT JOIN
***********************/

-- RIGHT JOIN. The first table (Customers) is the LEFT table, and the next table (Orders) is the right table. This will bring back a row for each Order, and if there is no Customer for the Order, the columns from the Customer table will show up as a NULL. 
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerID, Customers.LastName, Customers.FirstName
FROM Customers
RIGHT JOIN Orders
ON Orders.CustomerID=Customers.CustomerID; 

-- Note that if you use a LEFT JOIN or a RIGHT JOIN and you add another JOIN after that, you should usually use the same type of JOIN (rather than just using an INNER JOIN like you would usually do). This is so you don't miss out on rows with NULLs (figure out exactly what this means, but bottom line, if you use a LEFT or RIGHT JOIN and you then subsequently add another JOIN, test it to make sure you're getting all the rows you want).
SELECT p.Name AS ProductName, od.LineItem, oh.SalesOrderNumber
FROM SalesLT.Product AS p
LEFT JOIN SalesLT.SalesOrderDetail AS od
ON p.ProductID = od.ProductID
LEFT JOIN SalesLT.SalesOrderHeader AS oh -- second JOIN being added, normally could be done with an INNER JOIN, though it is after (moving to the RIGHT) a LEFT JOIN, so if you do this you'll miss all the NULLs from the previous JOIN. Therefore, after a LEFT JOIN, continue using LEFT JOINs as you move RIGHT (figure this out) in order to get all of the NULLs.
ON od.SalesOrderID = oh.SalesOrderID
ORDER BY p.ProductID;

/***********************
FULL JOIN
***********************/

-- FULL JOIN. The first table (Customers) is the LEFT table, and the next table (Orders) is the right table. This will bring back all rows from all tables, including any rows that might have NULLs in the columns from both tables.
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerID, Customers.LastName, Customers.FirstName
FROM Customers
FULL JOIN Orders
ON Orders.CustomerID=Customers.CustomerID; 

/***********************
CROSS JOIN
***********************/

-- CROSS JOIN. Creates a Cartesian Product showing all possible combinations of the data. This is apparently only used when populating test data into a system. Note that there is no ON clause since you're not filtering the data.
SELECT Orders.OrderID, Orders.OrderDate, Customers.CustomerID, Customers.LastName, Customers.FirstName
FROM Customers
JOIN Orders;

/***********************
SELF JOIN
***********************/

-- SELF JOIN. you join a table to the same table using a differnet alias. This is the same as any other type of JOIN, but both tables beinging JOINed are the same table (that's why you have to give the same table two different aliases, so the aliases can be JOINed).

SELECT emp.EmployeeName AS Employee,
	man.ManagerName AS Manager
FROM humanresources.Employee AS emp -- note this is giving the humanresources table an alias
LEFT JOIN humanresources.Employer AS man -- note this is also giving the humanresources table a second alias
ON emp.ManagerID = man.EmployeeID


/***********************
ON CLAUSE VS WHERE CLAUSE
***********************/

/*
The ON clause works to tell SQL Server how to match up the rows from the left table to the rows
from the right table, based on a shared piece of information (like an ID number). 

ON a.customer_no = b.customer_no

In an INNER JOIN, it also serves a secondary function in that it can be used to filter down the rows retrieved, 
such as the example below where it only includes rows from the right table where primary_ind = 'Y'.

ON a.customer_no = b.customer_no AND b.primary_ind = 'Y'. 

In this example, it matches up based on customer_no but only includes rows if b.primary_ind = 'Y'.

The WHERE clause can also be used to filter the rows. In an INNER JOIN (a regular JOIN), row filtering
can therefore occur in either the ON clause or the WHERE clause. The result is exactly the same. 
This statement:

SELECT a.customer_no, a.customer_name
FROM T_CUSTOMER a
JOIN T_SALUTATIONS b
ON a.customer_no = b.customer_no AND b.primary_ind = 'Y'

is the same as this statement:

SELECT a.customer_no, a.customer_name
FROM T_CUSTOMER a
JOIN T_SALUTATIONS b
ON a.customer_no = b.customer_no 
WHERE b.primary_ind = 'Y'

Normally, WHERE is evaluated right after FROM, but if you filter from an ON clause, SQL Server is
smart enough to know that it can save a lot of cost by filtering from the ON clause first, and therefore
it'll internally rearrange the order in which it evaluates the predicates based on its own 
cost estimates. Filtering as early as possible in order to save cost is called *predicate pushdown*.

HOWEVER, ON and WHERE are not the same when using an OUTER JOIN (LEFT or RIGHT). In a LEFT JOIN for
example, all rows from the LEFT table are retained, regardless of whether the ON finds a match or not.
Therefore, the ON clause only acts in the role of telling SQL Server how to match the rows up. 
The ON does not do any filtering. Therefore, with an OUTER JOIN, you need to place any filtering 
into the WHERE clause.

SELECT a.customer_no, a.customer_name
FROM T_CUSTOMER a
LEFT JOIN T_SALUTATIONS b
ON a.customer_no = b.customer_no 
WHERE b.primary_ind = 'Y' -- filtering goes here

*/

/***********************
JOINING WITH NULL VALUES
***********************/

/*
Care must be taken when NULLs are present in the columns that you are JOINing in the ON clause. 
If a NULL is present for a row in either side of an INNER JOIN, that row will not be matched up
and will therefore not be included in the final results. For example, if the left table has a 
NULL and the right table has a NULL for a row, even though they're both NULL, they're unknown and 
therefore not 'equal', so they  won't be included in the results. This scenario can often happen 
when doing a composite JOIN (a JOIN with multiple columns being compared in the ON clause).

If you do want to perform an INNER JOIN and include the rows where both sides are NULL, there are 
a couple ways to do this.

The best (and most efficient) way is to use an AND clause and the IS NULL keyword. Using IS NULL
gets around the issue of NULLs not being equal, and instead says 'if left side IS NULL and right
side IS NULL, then include that row (even if the NULLs are considered equal)'.
*/
SELECT * from T_CUSTOMER (nolock) a
JOIN TX_CUSTOMER (nolock) b
ON a.customer_no = b.customer_no 
  -- this AND pulls where a.name_status = b.name_status, OR both sides are NULL
  AND (a.name_status = b.name_status OR (a.name_status IS NULL AND b.name_status IS NULL))

/*
A second way to do this is to use the ISNULL() fuction, switching out NULL values for some text value
that can then be evaluated as a match and therefore included in the results of the INNER JOIN.

HOWEVER, this is not usually the best method. You are having SQL Server modify one of the columns,
and once this occurs, it cannot guarantee that the pre-defined ordering laid out by any affected
index is correct. That means that it can't utilize any affected index fully and it has to rescan data. 
This can make this method very inefficient when dealing with large tables. While the end results 
set may be the same as using the OR IS NULL method above, you may take a significant performance 
hit using the ISNULL() replacement method shown below.
*/
SELECT * from T_CUSTOMER (nolock) a
JOIN TX_CUSTOMER (nolock) b
ON a.customer_NO = b.customer_no
  --ISNULL() replaces NULLs with a text string that can then be matched up in the ON clause
  AND (ISNULL(a.name_status, 'Not Found') = ISNULL(b.name_status, 'Not Found'))
  
  
/***********************
MIXING DIFFERENT JOIN TYPES
***********************/

/*
When using multiple JOINs in a statement and mixing different JOIN types (such as OUTER and INNER),
you need to be careful about the order or you could get the inadvertent removal of rows from the
final result set. Joins are evaluated in order of the ON statements and you usually place the ON
after the join, and so if you start with an outer join and then follow it with an inner join you can
inadvertently nullify the outer rows pulled in the first outer join.

For example, if you start with a LEFT JOIN which produces some rows with NULLs as placeholders, and
you follow it with an INNER JOIN it by nature discards the rows with the NULLs, which in effect
nullifies the outer rows pulled by the LEFT JOIN (in fact, the SQL Server optimizer might see this
conflict and skip the first join altogether.)

As an example, say you have three tables like so:

People table: 

PersonID  PersonName
1         Fred Flintstone
2         Barney Rubble
3         George Jetson

PetType table:

PetTypeID PetType
1         Dinosaur
2         Hopparoo

Pets table:

PetID PetTypeID PetName OwnerID
1     1         Dino    1
2     2         Hoppy   2
3     NULL      Baby    1

You start with a LEFT JOIN on People and Pets that gives you one row that has a NULL (George Jetson).
*/
select People.PersonName, Pets.PetName
from People
left outer join Pets on Pets.OwnerID = People.PersonID

/*
Results:

PersonName           PetName   
-------------------- ----------
Fred Flintstone      Dino
Barney Rubble        Hoppy
George Jetson        NULL

Then you try to add the Pet Type from the PetTypes table using an INNER JOIN.

*/
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID -- the INNER JOIN following the OUTER JOIN

/*
Results:

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Barney Rubble        Hoppy      Hopparoo

Notice how George Jetson is now gone. He didn't have a pet, and therefore PetTypeID was NULL,
and since PetTypeID is in the INNER JOIN ON clause, the NULL doesn't evaluate as a match and 
therefore isn't included in the results. You've lost the outer rows from the first LEFT JOIN.

There are a couple ways around this - one is if you start with a LEFT JOIN and then continue with 
LEFT JOINs all the way throughout the end of the statement.
*/
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join Pets on Pets.OwnerID = People.PersonID
left outer join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID -- this is now an OUTER JOIN

/*
Results:

PersonName           PetName    PetType   
-------------------- ---------- ----------
Fred Flintstone      Dino       Dinosaur
Fred Flintstone      Baby       NULL
Barney Rubble        Hoppy      Hopparoo
George Jetson        NULL       NULL

Notice that George Jetson is now retained as planned. We're also getting Baby from George Jetson,
which is a pet who does not have a PetType. This could be OK, but you might not want to include 
a pet with no PetType.

Therefore, you might try an alternate method which is to use a derived table that completes 
the INNER JOIN first (removing pets with no PetType) and then uses that result set to 
be LEFT JOINed to the People table.
*/
select People.PersonName, Pets.PetName, Pets.PetType
from People
left outer join
( -- begin the INNER JOIN, which is separated logically
  select Pets.ownerID, Pets.PetName, PetTypes.PetType
  from Pets
  inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
)
  AS Pets on Pets.OwnerID = People.PersonID -- Pets is the alias for the derived table, which is required

/*
Results: 

PersonName      PetName   PetType
Fred Flintstone Dino      Dinosaur
Barney Rubble   Hoppy     Hopparoo
George Jetson   NULL      NULL

Note that you now retain all of the people from the People table, but you no longer list any 
pet that doesn't have a PetTypeID.

You can alternately phrase the derived table like so:
*/
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join
  (Pets inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID) -- the derived table
on Pets.OwnerID = People.PersonID

/*
You get the same results as the first derived table: 

PersonName      PetName   PetType
Fred Flintstone Dino      Dinosaur
Barney Rubble   Hoppy     Hopparoo
George Jetson   NULL      NULL

Note that you don't specify any columns to select in the INNER JOIN, you instead just 
specify (Pets inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID) with the ON clause
telling it how to complete the INNER JOIN. 

Secondly, you can actually run this alternate method without the parenthesis like so, which
will give you the same results:
*/
select People.PersonName, Pets.PetName, PetTypes.PetType
from People
left outer join
  Pets inner join PetTypes on Pets.PetTypeID = PetTypes.PetTypeID
on Pets.OwnerID = People.PersonID

/*
The reason SQL Server allows this is it actually completes the joins based on the order of
the ON clauses. Normally you put the ON right after the JOIN clause, but in this case the
INNER JOIN has the ON clause listed first, and it's then followed by the ON clause of the 
LEFT JOIN, and therefore the INNER JOIN is completed first. However, for clarity purposes
it's best to use the parenthesis.
*/


/***********************
JOIN PERFORMANCE
***********************/

/*
When you define a primary key or unique constraint, SQL Server creates a unique index on the 
constraint columns to enfore the constraint's uniqueness property. However, when you define a 
foreign key, SQL Server doesn't create any indexes on the foreign key columsn. Such indexes can improve
the performance of JOINs based on those relationships. Therefore, it's your responsibility to
identify the cases where creating indexes on those foreign key columns can be useful.
*/