MySQL Join Tutorial
Some believe the most important part of SQL is the ability to query data. Queries typically retrieve data by joining many tables together into useful result sets. This tutorial takes the position that visibility into the data helps those new to SQL understand how joins work. To that end, the queries use Common Table Expressions (CTEs) instead of tables.
The default behavior of a JOIN without a qualifying descriptor is not simple because it may return:
- A CROSS JOIN (or Cartesian product) when there is no ON or USING subclause, or
- An INNER JOIN when you use an ON or USING subclause.
The following query uses JOIN without a qualifier and without an ON or USING subclause. It also uses two copies of the single CTE, which is more or less a derived table: the result of a subquery held in memory. This demonstrates the key reason for table aliases, which is that they let you reference two copies of the same table under different identifiers or labels.
 1  WITH alpha AS
 2  (SELECT 'A' AS letter, 130 AS amount
 3   UNION
 4   SELECT 'B' AS letter, 150 AS amount
 5   UNION
 6   SELECT 'C' AS letter, 321 AS amount)
 7  SELECT * FROM alpha a JOIN alpha b;
It returns a Cartesian product:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    130 |
| B      |    150 | A      |    130 |
| C      |    321 | A      |    130 |
| A      |    130 | B      |    150 |
| B      |    150 | B      |    150 |
| C      |    321 | B      |    150 |
| A      |    130 | C      |    321 |
| B      |    150 | C      |    321 |
| C      |    321 | C      |    321 |
+--------+--------+--------+--------+
9 rows in set (0.00 sec)
By adding an ON clause as line 8, the default JOIN keyword returns an INNER JOIN result.
 1  WITH alpha AS
 2  (SELECT 'A' AS letter, 130 AS amount
 3   UNION
 4   SELECT 'B' AS letter, 150 AS amount
 5   UNION
 6   SELECT 'C' AS letter, 321 AS amount)
 7  SELECT * FROM alpha a JOIN alpha b
 8  ON a.letter = b.letter;
It displays results, like:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    130 |
| B      |    150 | B      |    150 |
| C      |    321 | C      |    321 |
+--------+--------+--------+--------+
3 rows in set (0.00 sec)
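The USING subclause mentioned earlier can stand in for ON when the join column has the same name on both sides. The following is a minimal sketch that reuses the same alpha CTE and only swaps the join condition; it is an illustration rather than part of the original example set.

```sql
-- Same CTE as the ON example; USING (letter) replaces ON a.letter = b.letter.
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount)
SELECT * FROM alpha a JOIN alpha b
USING (letter);
```

It should match the same three rows as the ON version. The visible difference is that SELECT * shows the shared letter column once, followed by the two amount columns.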
The next example uses two CTEs. One uses letters 'A', 'B', 'C', and 'D' and the other uses letters 'A', 'B', 'C', and 'E'. The letter 'D' only exists in the alpha derived table and the letter 'E' only exists in the beta derived table. The amount column values differ for their respective letters in the two CTEs.
The basic query below the comma-delimited CTEs joins the alpha and beta derived tables with an INNER JOIN using an ON clause based on the letter column values found in both the alpha and beta CTEs.
 1  WITH alpha AS
 2  (SELECT 'A' AS letter, 130 AS amount
 3   UNION
 4   SELECT 'B' AS letter, 150 AS amount
 5   UNION
 6   SELECT 'C' AS letter, 321 AS amount
 7   UNION
 8   SELECT 'D' AS letter, 783 AS amount)
 9  , beta AS
10  (SELECT 'A' AS letter, 387 AS amount
11   UNION
12   SELECT 'B' AS letter, 268 AS amount
13   UNION
14   SELECT 'C' AS letter, 532 AS amount
15   UNION
16   SELECT 'E' AS letter, 391 AS amount)
17  SELECT * FROM alpha a INNER JOIN beta b
18  ON a.letter = b.letter;
The INNER JOIN returns only those rows in the alpha and beta CTEs where the letter column values match:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
+--------+--------+--------+--------+
3 rows in set (0.01 sec)
If you change line 17 from an INNER JOIN to a LEFT JOIN, you return all the rows from the alpha CTE and only those rows from the beta CTE that have a matching letter column value. The new line 17 for a LEFT JOIN is:
17  SELECT * FROM alpha a LEFT JOIN beta b
It returns the three matching rows plus the one non-matching row from the alpha CTE, which is on the left side of the LEFT JOIN operator. Note that a left outer join puts null values into the beta CTE columns where there is no matching row for the letter 'D' found in the alpha CTE.
The results are shown below:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| D      |    783 | NULL   |   NULL |
+--------+--------+--------+--------+
4 rows in set (0.01 sec)
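Because the unmatched row carries null values in the beta columns, you can filter on those nulls to isolate the rows that exist only in the alpha CTE. The following is a minimal sketch of that idea using the same two CTEs; the WHERE clause is an addition for illustration and is not part of the original query.

```sql
-- Keep only alpha rows that found no partner in beta.
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount
 UNION
 SELECT 'D' AS letter, 783 AS amount)
, beta AS
(SELECT 'A' AS letter, 387 AS amount
 UNION
 SELECT 'B' AS letter, 268 AS amount
 UNION
 SELECT 'C' AS letter, 532 AS amount
 UNION
 SELECT 'E' AS letter, 391 AS amount)
SELECT a.letter, a.amount
FROM alpha a LEFT JOIN beta b
ON a.letter = b.letter
WHERE b.letter IS NULL;  -- null here means no match in beta
```

With this data, only the 'D' row with an amount of 783 should remain.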
If you change line 17 from a LEFT JOIN to a RIGHT JOIN, you return all the rows from the beta CTE and only those rows from the alpha CTE that have a matching letter column value. The new line 17 for a RIGHT JOIN is:
17  SELECT * FROM alpha a RIGHT JOIN beta b
It returns the following result set:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| NULL   |   NULL | E      |    391 |
+--------+--------+--------+--------+
4 rows in set (0.00 sec)
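A RIGHT JOIN can also be read as a LEFT JOIN with the two tables swapped. The sketch below illustrates that equivalence with the same CTEs; the explicit a.*, b.* select list is an assumption added here only to keep the columns in the same order as the RIGHT JOIN output.

```sql
-- beta is now on the left, so every beta row is preserved.
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount
 UNION
 SELECT 'D' AS letter, 783 AS amount)
, beta AS
(SELECT 'A' AS letter, 387 AS amount
 UNION
 SELECT 'B' AS letter, 268 AS amount
 UNION
 SELECT 'C' AS letter, 532 AS amount
 UNION
 SELECT 'E' AS letter, 391 AS amount)
SELECT a.*, b.*
FROM beta b LEFT JOIN alpha a
ON a.letter = b.letter;
```

It should return the same four rows, including the 'E' row with null values in the alpha columns.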
MySQL does not support a FULL JOIN operation, but you can mimic a full join by combining a LEFT JOIN and a RIGHT JOIN with the UNION operator. The UNION operator removes duplicate rows, which reduces the two copies of the matching rows returned by the left and right join operations to a unique set.
This is the way to write the equivalent of a full join:
 1  WITH alpha AS
 2  (SELECT 'A' AS letter, 130 AS amount
 3   UNION
 4   SELECT 'B' AS letter, 150 AS amount
 5   UNION
 6   SELECT 'C' AS letter, 321 AS amount
 7   UNION
 8   SELECT 'D' AS letter, 783 AS amount)
 9  , beta AS
10  (SELECT 'A' AS letter, 387 AS amount
11   UNION
12   SELECT 'B' AS letter, 268 AS amount
13   UNION
14   SELECT 'C' AS letter, 532 AS amount
15   UNION
16   SELECT 'E' AS letter, 391 AS amount)
17  SELECT * FROM alpha LEFT JOIN beta
18  ON alpha.letter = beta.letter
19  UNION
20  SELECT * FROM alpha RIGHT JOIN beta
21  ON alpha.letter = beta.letter;
It returns one copy of the matching rows, and the non-matching rows from both the alpha and beta CTEs:
+--------+--------+--------+--------+
| letter | amount | letter | amount |
+--------+--------+--------+--------+
| A      |    130 | A      |    387 |
| B      |    150 | B      |    268 |
| C      |    321 | C      |    532 |
| D      |    783 | NULL   |   NULL |
| NULL   |   NULL | E      |    391 |
+--------+--------+--------+--------+
5 rows in set (0.00 sec)
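One caveat with the UNION approach is that it also collapses rows that are genuine duplicates in the source data. A common alternative, sketched here as an assumption rather than as the approach this tutorial recommends, is to use UNION ALL and have the second query return only the rows with no match on the alpha side.

```sql
-- First branch: all alpha rows, matched or not.
-- Second branch: only beta rows without a partner in alpha.
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount
 UNION
 SELECT 'D' AS letter, 783 AS amount)
, beta AS
(SELECT 'A' AS letter, 387 AS amount
 UNION
 SELECT 'B' AS letter, 268 AS amount
 UNION
 SELECT 'C' AS letter, 532 AS amount
 UNION
 SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha LEFT JOIN beta
ON alpha.letter = beta.letter
UNION ALL
SELECT * FROM alpha RIGHT JOIN beta
ON alpha.letter = beta.letter
WHERE alpha.letter IS NULL;
```

With this sample data it should produce the same five rows as the UNION version, because the CTEs contain no duplicate rows.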
A NATURAL JOIN would return no rows because it implicitly discovers the columns with matching names in both CTEs and joins on all of them. While the letter column matches rows between the CTEs, the amount column doesn’t hold any matches. The combination of letter and amount column values must match for a NATURAL JOIN operation to return any rows.
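The following sketch shows the NATURAL JOIN described above against the same two CTEs. Since both CTEs share the letter and amount column names, it behaves like a join written with USING (letter, amount).

```sql
-- Joins on every column name the two CTEs share: letter and amount.
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount
 UNION
 SELECT 'D' AS letter, 783 AS amount)
, beta AS
(SELECT 'A' AS letter, 387 AS amount
 UNION
 SELECT 'B' AS letter, 268 AS amount
 UNION
 SELECT 'C' AS letter, 532 AS amount
 UNION
 SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha NATURAL JOIN beta;
```

No letter and amount pair appears in both CTEs, so the query returns an empty set.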
You also have the ability to override the cost optimizer and force a left-to-right join by using the STRAIGHT_JOIN operator.
As always, I hope this helps those looking for a solution with an explanation.
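For reference, here is a minimal sketch of the STRAIGHT_JOIN operator mentioned above, using the same two CTEs. It is an illustration only; with tables this small, any effect on the execution plan is not something this example can demonstrate.

```sql
-- STRAIGHT_JOIN asks MySQL to read alpha (left) before beta (right).
WITH alpha AS
(SELECT 'A' AS letter, 130 AS amount
 UNION
 SELECT 'B' AS letter, 150 AS amount
 UNION
 SELECT 'C' AS letter, 321 AS amount
 UNION
 SELECT 'D' AS letter, 783 AS amount)
, beta AS
(SELECT 'A' AS letter, 387 AS amount
 UNION
 SELECT 'B' AS letter, 268 AS amount
 UNION
 SELECT 'C' AS letter, 532 AS amount
 UNION
 SELECT 'E' AS letter, 391 AS amount)
SELECT * FROM alpha a STRAIGHT_JOIN beta b
ON a.letter = b.letter;
```

It should match the same three rows as the INNER JOIN; only the order in which the tables are read is constrained.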