Basic SQL join semantics
It’s almost funny reflecting on when I first learned how to write SQL in 1985. It was using SQL/DS 1.1 (Structured Query Language/Data System) on a VMS/CMS operating system at the IBM center in the Los Angeles, California. That was in the Crocker Bank Building when I worked for First Interstate Bank. The building is still there and so are databases but both banks are long gone.
After 24 years of SQL, a quick primer on joins seems like a good idea. I have a simple example that should help illustrate various joins and SQL semantics.
Definitions
Before jumping into the SQL join statements, setting up some basic information seems like a good idea. Here are some basic definitions:
SQL JOIN Concepts:
- A table is an array of arrays, which means a row in any table contains an array. The definition of a table defines the data structure of a row or array.
- Selection is the process of identifying, accessing, and joining data into return data sets. The basic elements of selection in a SQL statement are the
FROM
andWHERE
clauses.- The ANSI 89 syntax uses a cross join or comma-delimited list of tables, and places the join in the
WHERE
clause. - The ANSI 92 syntax uses a set of key words like
INNER JOIN
,LEFT JOIN
, et cetera, and puts the join in anON
orUSING
clause, which are often referred to as subclauses.
- The ANSI 89 syntax uses a cross join or comma-delimited list of tables, and places the join in the
- Projection is the process of paring columns from filtered data sets. Projection is determined by the list of columns found in a
SELECT
clause.
SQL JOIN Logic: (Check wikipedia for more detail …)
- A SQL join without a relationship creates what’s known as a Cartesian Product, which is a result set that consists of all rows from one table matched with all row in another table. This type of join is also known as a
CROSS JOIN
. - A SQL join with a relationship may be one of two types:
- A SQL equijoin (also known as a natural or semijoin) creates a relationship between two tables based on a comparison of values found in one or a set of columns in one table and one or an equal set of columns in another table.
- A SQL non-equijoin (also known as a θ-join) effects a relation between two tables based on a filtered
CROSS JOIN
between two tables. This type of join can be a range comparison using theBETWEEN
operator or a comparison of column values that uses an inequality operator. - A SQL antijoin may be one of two things, at least typically, and they are:
- A relative complement, which is everything in another table not found in a table.
- A symmetric difference, which is the relative complement of both tables in a join.
SQL JOIN Implementations:
- A
CROSS JOIN
implements a Cartesian Product, which returns rows combined from both tables. It matches every row in one table against every row in another table. For example, if you have 3 rows in one table and 2 rows in another you’ll have 2 times 3, or 6, rows in the result set. - An
INNER JOIN
implements an equijoin, which returns rows combined from both tables when the values in the join condition match. Join conditions can include one or more columns. - A
NATURAL JOIN
implements an equijoin, which returns rows combined from both tables when the values in the join condition match. While the join conditions can include one or more columns, you have no choice in selecting the join columns. ANATURAL JOIN
checks the definition of both tables (in the data catalog or metadata) and looks for like named columns. It then joins the table based on the values of all like named columns from both tables. One might say it’s an unnatural behavior because a developer has no way to override it. I’d concur with calling it unnatural and for that matter wrote much more about it in this older post. - A
LEFT JOIN
implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the left table that aren’t found in the right table. This makes aLEFT JOIN
a combination of anINNER JOIN
and a relative complement of the table on the right of the join. This nuance becomes important when we examine the(+)
semantic implemented by Oracle. - A
RIGHT JOIN
implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the right table that aren’t found in the left table. This makes aRIGHT JOIN
the opposite of aLEFT JOIN
. Naturally, it is also a combination of anINNER JOIN
and a relative complement of the left table. Like theLEFT JOIN
, this nuance becomes important when we examine the(+)
semantic implemented by Oracle. - A
FULL JOIN
implements an equijoin and both relative complements, which returns rows combined from both tables when the values in the join condition match and the rows in both the left and right tables that aren’t found in their respective other tables. This makes aFULL JOIN
a combination of anINNER JOIN
and relative complements of the both tables. This type of join can’t be done with the(+)
alone in the older syntax. It requires left and right outer queries glued together by aUNION
set operator. The UNION set operator eliminates duplicate from the intersection of the two tables, which are created by gluing together two outer joins. - An Antijoin implements an outer join minus an equijoin, which returns a relative complement when the first query is a
LEFT JOIN
orRIGHT JOIN
and a symmetric difference when the first query is aFULL JOIN
.
SQL Join Inheritance Tree:
Unified Modeling Language (UML) is great for modeling concrete software but is also good for illustrating abstract models. The following uses UML to depict the abstract relationship between the various join types.
It’s borrowed from the SQL Primer (Appendix B) in my Oracle Database 11g PL/SQL Programming book. Duly footnoted.
SQL Self Joins:
-
A self join occurs when you create a temporary copy of a table by assigning it an alias. Good SQL coding practices (from my perspective) would involve giving both instances of the table aliases. There is no reason why you can’t have three or four copies of a table in a self join. You can find an example of a self join in the Inner Join section later in the blog page.
SQL Join Examples
The SQL join statements are based on a little model that you’ll find at the end of the blog page. You should be able to copy it directly into SQL*Plus or MySQL. Naturally, the Oracle-only examples work exclusively in Oracle.
Cross Joins:
-
Rule(s):
- A cross join doesn’t have a join statement.
Examples:
This example creates a Cartesian Product of 42 rows because the first table has 6 rows and the second has 7. The syntax that uses CROSS JOIN
is known as ANSI standard.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english CROSS JOIN ordinal_french; |
The older syntax, sometimes called Oracle Proprietary, doesn’t use the CROSS JOIN
semantic. You simply write the tables as a comma separated list, like the next example. The results are exactly the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french; |
Inner Joins:
-
Rule(s):
- An inner join must have a join statement.
- An inner join that uses the
INNER JOIN
clause must include a join statement in theFROM
clause. - An inner join that uses comma delimited tables must include a join statement in the
WHERE
clause.
Examples:
This example creates an inner join where the values of a column in each table are equal. The syntax that uses INNER JOIN
is known as ANSI standard. The INNER
key word is optional, which make a JOIN
equivalent to an inner join.
It has two variants. One uses an ON
clause when the column names are equal or unequal. The other uses a USING
clause. The USING
clause only works when the column names in both tables are the same. You can also pass a list of comma separated column names as arguments to the USING
clause because it acts like (and is more or less) an overloaded function.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english INNER JOIN ordinal_french ON ordinal_english_id = ordinal_french_id; |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
-------------------- --------------------
One Un
Three Trois
Four Quatre
Five Cinq
Six Six |
The ORDINAL_ENGLISH
table has a primary key column name that differs from the ORDINAL_ENGLISH
table’s primary key column name. The ORDINAL_OLD_ENGLISH
table has the same column name for the primary key, which lets you use the USING
clause.
SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english INNER JOIN ordinal_old_english USING(ordinal_english_id); |
You can put more than one column name in a USING
clause. There may be an implementation limit but I’ve not found it. It appears that the parser treats the list recursively. The syntax for that is the following:
USING(column_name1, column_name2, ...) |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
-------------------- ------------------------
One Fyrmest
Three Pridda
Four Feorda
Five Fifta
Six Siexta |
The older syntax, sometimes called Oracle Proprietary but really ANSI 89, doesn’t use the INNER JOIN
semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM
clause to the WHERE
clause. The results are exactly the same as the similar query above.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id = ordinal_french_id; |
SQL Self Referencing Queries
Interchangeable primary and foreign keys in a single column
A self referencing query uses multiple copies of a table. This works by aliasing the name of the table, which effectively creates multiple record sets in memory. Two examples of a self referencing query are provided using the ordinal_french
table.
Both use table aliases and column aliases. You should note that column aliases may be preceded by an AS
but it isn’t required. The same is not true for table aliases because using an AS
triggers a run time error. Another tidbit – Oracle SQL*Plus lets you use double quotes to create column names with white space and strings that begin with numbers.
This query uses the INNER JOIN
syntax:
SELECT of1.ordinal_french_text AS "French Text #1" , of2.ordinal_french_text AS "French Text #2" FROM ordinal_french of1 INNER JOIN ordinal_french of2 ON of1.ordinal_french_id = of2.ordinal_french_id; |
This query uses the comma separated list of tables syntax:
SELECT of1.ordinal_french_text AS "French Text #1" , of2.ordinal_french_text AS "French Text #2" FROM ordinal_french of1, ordinal_french of2 WHERE of1.ordinal_french_id = of2.ordinal_french_id; |
Both produce the following result set:
French Text #1 French Text #2 -------------- -------------- Un Un Trois Trois Quatre Quatre Cinq Cinq Six Six Sept Sept Huit Huit |
Primary and foreign keys in different columns
The preceding self referencing query is straightforward but can be misleading. You may wonder why it isn’t that simple. The reason why is that the preceding queries use the same column as both the primary and foreign key. Generally, that’s not the case when you use self referencing joins. Self referencing joins typically use the primary to foreign key relationship between two columns.
Assume you had a system_user
table and it contained a system_user_id
as the primary key, system_user_name
as a non-key column, and who audit columns, like a created_by
and last_updated_by
column that held foreign key values that pointed back to the system_user_id
column, how would you get the user name for a row and the user name responsible for both creating and updating the row. In this type of example, the primary key can be in the same row or a different row.
This query represents the solution to the problem:
SELECT su1.system_user_name AS "System User" , su2.system_user_name AS "Creating User" , su3.system_user_name AS "Updating User" FROM system_user su1 , system_user su2 , system_user su3 WHERE su1.created_by = su2.system_user_id -- Primary(FK) to Secondary (PK) AND su1.last_updated_by = su3.system_user_id -- Primary(FK) to Tertiary(PK) ORDER BY su1.system_user_id; |
You can see that you need to match the created_by
and last_updated_by
column values to the the row that contains the same system_user_id
column value. This requires that you create three copies of the table, and mimics the behavior of two nested loops inside a master loop. The master loop has the primary user name, and the nested loops hold the data for the user that created or updated the row. This is illustrated in the following drawing.
The query produces a Cartesian Product of 53 or 125 rows. The rows consist of three copies of the columns from the table glued together into long rows. Only 5 of the rows meet the two join conditions, which means the rest are discarded. A Cartesian Product is the best way to visualize this process because it lets us see in a 2-dimensional model n-dimensions of space. Every table, or copy of a table, represents a spatial dimension in a join. You can find the mechanics of how joins really work in the SQL join logic section later in the blog page. The script yields the following output:
System User Creating User Updating User ------------ -------------- -------------- SYSADMIN SYSADMIN SYSADMIN DBA 1 SYSADMIN DBA 1 DBA 2 SYSADMIN SYSADMIN DBA 3 SYSADMIN DBA 2 APPS DBA DBA 3 DBA 3 |
Naturally, there is a way to do this in the more current ANSI SQL with the INNER JOIN
phrase. Here’s the equivalent using the newer syntax:
SELECT su1.system_user_name AS "System User" , su2.system_user_name AS "Creating User" , su3.system_user_name AS "Updating User" FROM system_user su1 JOIN system_user su2 -- Inner is an optional identifier ON su1.created_by = su2.system_user_id -- Primary(FK) to Secondary (PK) JOIN system_user su3 -- Inner is an optional identifier ON su1.last_updated_by = su3.system_user_id -- Primary(FK) to Tertiary(PK) ORDER BY su1.system_user_id; |
Left Joins:
-
Rule(s):
- A left join must have a join statement.
- A left join that uses the
LEFT JOIN
clause must include a join statement in theFROM
clause. - A left join that uses comma delimited tables must include a join statement in the
WHERE
clause.
Examples:
This example creates an inner join where the values of a column in each table are equal. The syntax that uses LEFT JOIN
is known as ANSI standard. It has two variants. One uses a ON
clause when the column names are equal or unequal. The other uses a USING
clause. The USING
clause only works when the column names in both tables are the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english LEFT JOIN ordinal_french ON ordinal_english_id = ordinal_french_id; |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
-------------------- --------------------
One Un
Three Trois
Four Quatre
Five Cinq
Six Six
Two |
The ORDINAL_ENGLISH
table has a primary key column name that differs from the ORDINAL_ENGLISH
table’s primary key column name. The ORDINAL_OLD_ENGLISH
table has the same column name for the primary key, which lets you use the USING
clause.
SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english LEFT JOIN ordinal_old_english USING(ordinal_english_id); |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
-------------------- ------------------------
One Forma
Three Pridda
Four Feorda
Five Fifta
Six Siexta
Two |
The older syntax, sometimes called Oracle Proprietary, doesn’t use the LEFT JOIN
semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM
clause to the WHERE
clause. The (+)
following the ordinal_french_id
column in the join indicates that you want the relative complement of that table, which are all non-matching values in the ordinal_english
table. The results are exactly the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id = ordinal_french_id(+); |
Right Joins:
-
Rule(s):
- A right join must have a join statement.
- A right join that uses the
RIGHT JOIN
clause must include a join statement in theFROM
clause. - A right join that uses comma delimited tables must include a join statement in the
WHERE
clause.
Examples:
This example creates an right join where the values of a column in each table are equal. The syntax that uses RIGHT JOIN
is known as ANSI standard. It has two variants. One uses a ON
clause when the column names are equal or unequal. The other uses a USING
clause. The USING
clause only works when the column names in both tables are the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english RIGHT JOIN ordinal_french ON ordinal_english_id = ordinal_french_id; |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
-------------------- --------------------
One Un
Three Trois
Four Quatre
Five Cinq
Six Six
Eahtoda
Scofoda |
The ORDINAL_ENGLISH
table has a primary key column name that differs from the ORDINAL_ENGLISH
table’s primary key column name. The ORDINAL_OLD_ENGLISH
table has the same column name for the primary key, which lets you use the USING
clause.
SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english RIGHT JOIN ordinal_old_english USING(ordinal_english_id); |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
-------------------- ------------------------
One Forma
Three Pridda
Four Feorda
Five Fifta
Six Siexta
Eahtoda
Scofoda |
The older syntax, sometimes called Oracle Proprietary, doesn’t use the RIGHT JOIN
semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM
clause to the WHERE
clause. The (+)
following the ordinal_english_id
column in the join indicates that you want the relative complement of that table, which are all non-matching values in the ordinal_french
table. The results are exactly the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id(+) = ordinal_french_id; |
Full Joins:
-
Rule(s):
- A full join must have a join statement.
- A full join that uses the
FULL JOIN
clause must include a join statement in theFROM
clause. - A full join that uses a list of comma separated tables must include a join statement in the
WHERE
clause.
Examples:
This example creates an full join where the values of a column in each table are equal. The syntax that uses FULL JOIN
is known as ANSI standard. It has two variants. One uses a ON
clause when the column names are equal or unequal. The other uses a USING
clause. The USING
clause only works when the column names in both tables are the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english FULL JOIN ordinal_french ON ordinal_english_id = ordinal_french_id; |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
-------------------- -------------------
One Un
Three Trois
Four Quatre
Five Cinq
Six Six
Sept
Huit
Two |
The ORDINAL_ENGLISH
table has a primary key column name that differs from the ORDINAL_ENGLISH
table’s primary key column name. The ORDINAL_OLD_ENGLISH
table has the same column name for the primary key, which lets you use the USING
clause.
SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english FULL JOIN ordinal_old_english USING(ordinal_english_id); |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
-------------------- ------------------------
One Forma
Three Pridda
Four Feorda
Five Fifta
Six Siexta
Eahtoda
Scofoda
Two |
The older syntax, sometimes called Oracle Proprietary, doesn’t use the FULL JOIN
semantic. You can’t simply write the tables as a comma separated list, like the other example. While the join condition shifts from the FROM
clause to the WHERE
clause, you must use two queries and a UNION
set operator. The (+)
following the ordinal_english_id
column in one query, and the ordinal_french_id
in the second query. You should note that the UNION
eliminates the duplicated intersection values from the two outer join statements. This is demonstrated in the following example.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id(+) = ordinal_french_id UNION SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id = ordinal_french_id(+); |
Non-equijoins:
-
Rule(s):
- A non-equijoin doesn’t have an equijoin statement.
- A non-equijoin that uses the
CROSS JOIN
doesn’t have a join in theFROM
clause. - A non-equijoin that uses a comma separated list of tables doesn’t have an equijoin in the
WHERE
clause.
Examples:
A range non-equijoin
This example creates an non-equijoin where the comparison of values looks for the occurrence of values in an inclusive range of values. The syntax that uses CROSS JOIN
is known as ANSI standard. The syntax using a comma separated list of tables is sometimes called Oracle Proprietary but is really only an older version of ANSI SQL when you exclude some Oracle extensions.
This example uses a traditional model with calendar
and transaction
tables. The calendar
table sets the range by qualifying the starting and ending dates of a month. The transaction
table provides the transaction date. A range non-equijoin filters a Cartesian Product or cross join result set and returns those rows where the search value is inside the range.
SELECT c.month_short_name , t.transaction_amount FROM calendar c CROSS JOIN TRANSACTION t WHERE t.transaction_date BETWEEN c.start_date AND c.end_date ORDER BY EXTRACT(MONTH FROM t.transaction_date); |
This returns the following set from the setup data provided in the scripts at the end of the blog.
Month Name Amount ---------- ------ JAN 32.87 JAN 38.99 MAR 9.99 APR 43.19 |
The older syntax uses a comma separated list of tables, which is a cross join when there isn’t a equality join command word between the two.
SELECT c.month_short_name AS "Month Name" , t.transaction_amount AS "Amount" FROM calendar c , TRANSACTION t WHERE t.transaction_date BETWEEN c.start_date AND c.end_date ORDER BY EXTRACT(MONTH FROM t.transaction_date); |
An inequality non-equijoin
This example creates an non-equijoin where the inequality looks for the occurrence of values less than a value. The syntax that uses CROSS JOIN
is known as ANSI standard. The syntax using a comma separated list of tables is sometimes called Oracle Proprietary but is really only an older version of ANSI SQL when you exclude Oracle’s extensions.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english CROSS JOIN ordinal_french WHERE ordinal_english_id < ordinal_french_id AND ordinal_french_id = 3; |
The older syntax uses a comma separated list of tables, which is a cross join when there isn’t a equality join between the two.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id < ordinal_french_id AND ordinal_french_id = 3; |
Either of these produce the following output.
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT -------------------- ------------------- One Trois Two Trois |
Anitjoins:
-
Rule(s):
- An antijoin must have two queries that use join statements, linked by a
MINUS
operator. - An antijoin that uses
FULL JOIN
andINNER JOIN
clauses, and must include a join statement in theFROM
clause of each query. - An antijoin that uses comma delimited tables must include a join statement in the
WHERE
clause.
Examples:
This example creates an antijoin where the values of a column in each table are equal. The syntax that uses both FULL JOIN
and INNER JOIN
is known as ANSI standard. It has two variants. One uses a ON
clause when the column names are equal or unequal. The other uses a USING
clause. The USING
clause only works when the column names in both tables are the same.
SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english FULL JOIN ordinal_french ON ordinal_english_id = ordinal_french_id MINUS SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english INNER JOIN ordinal_french ON ordinal_english_id = ordinal_french_id; |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
-------------------- -------------------
Sept
Huit
Two |
The ORDINAL_ENGLISH
table has a primary key column name that differs from the ORDINAL_ENGLISH
table’s primary key column name. You should note that while these are the primary keys, there are no database level constraints on any of the example tabes. You can use the USING
clause because the ORDINAL_OLD_ENGLISH
table has the same column name as the primary key.
SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english FULL JOIN ordinal_old_english USING(ordinal_english_id) MINUS SELECT ordinal_english_text , ordinal_old_english_text FROM ordinal_english INNER JOIN ordinal_old_english USING(ordinal_english_id); |
This returns the following rows from the tables:
ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
-------------------- ------------------------
Eahtoda
Scofoda
Two |
As noted, the older syntax, sometimes called Oracle Proprietary, doesn’t use the FULL JOIN
semantic. While the join condition shifts from the FROM
clause to the WHERE
clause, you must use three queries and both a UNION
and MINUS
set operators. You enclose the first query and second query in parentheses to create a full outer join. This sets the order of precedence for the union before the minus set operation. The MINUS
operator subtracts the inner join, as demonstrated in the following example. The results are naturally the same as the previous query.
(SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id(+) = ordinal_french_id UNION SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id = ordinal_french_id(+)) MINUS SELECT ordinal_english_text , ordinal_french_text FROM ordinal_english, ordinal_french WHERE ordinal_english_id = ordinal_french_id; |
There are three types of typical joins. They are nested loop, or a variation known as block nested loop, merged join, and hash join. Nested loops are exhaustive searches and don’t require presorting of keys. Merged joins rely on presorting of keys and that’s why cost-based optimizers collect data to improve join speed. Hash joins are more complex and limited to only equijoin resolutions.
An example that highlights an inner join is shown below. The rows that meet the condition are highlighted in yellow. The rows left without highlighting are excluded from the join. The excluded rows in one or the other table would be the relative complement in an outer join and together they are the symmetric difference of the join.
I also have a JavaScript widget that renders it this way for a full outer join:
Unlike a Venn Diagram, this illustration exhibits the behavior of database tables in an equijoin. A column or series of columns constitutes the JOIN
or intersection between the two sets and a WHERE
clause can filter the rows returned based on a value or range comparison. Both of these processes serve to filter the data and collectively are known as the selection process. Once you have selected the data, you can have access to the rest of the columns in a the nested set or row of a table. You project through the data when you filter the columns to display only some of them, which is known as the projection process.
Inner Joins:
-
Rule(s):
- An inner join is most like a nested loop where the larger set is navigated by the outer loop and the smaller set navigated by the inner set.
- An inner join is more frequently implemented as a block nested loop (BNL), which means that inner join and both relative complements are collected in one pass of the inner loop. While more expensive in terms of computational cycles and time, this is more efficient than running multiple nested loops.
Examples:
Nested Loop
This example mimics an inner join where the values of a column in each table are equal. I chose the PHP programming language because it easily adapts to illustrating the problem with associative arrays. Associative arrays use either numbers or text strings as index values to elements of arrays.
You should notice that the program uses the column names as index values in the arrays, respectively english_id
and english_text
and french_id
and french_text
for english
and french
tables. The sample program implements a simple nested join to match and return desired data from the two arrays.
<?php // Declare two arrays, two with data and one for the combined data. $english = array(array("ENGLISH_ID"=>"1","ENGLISH_TEXT"=>"One") ,array("ENGLISH_ID"=>"2","ENGLISH_TEXT"=>"Two") ,array("ENGLISH_ID"=>"3","ENGLISH_TEXT"=>"Three") ,array("ENGLISH_ID"=>"4","ENGLISH_TEXT"=>"Four") ,array("ENGLISH_ID"=>"5","ENGLISH_TEXT"=>"Five") ,array("ENGLISH_ID"=>"6","ENGLISH_TEXT"=>"Six")); $french = array(array("FRENCH_ID"=>"1","FRENCH_TEXT"=>"Un") ,array("FRENCH_ID"=>"3","FRENCH_TEXT"=>"Trois") ,array("FRENCH_ID"=>"4","FRENCH_TEXT"=>"Quatre") ,array("FRENCH_ID"=>"5","FRENCH_TEXT"=>"Cinque") ,array("FRENCH_ID"=>"6","FRENCH_TEXT"=>"Six") ,array("FRENCH_ID"=>"7","FRENCH_TEXT"=>"Sept") ,array("FRENCH_ID"=>"8","FRENCH_TEXT"=>"Huit")); $result_set = array(); // The larger array is always the outer loop and inner loop the smaller array. for ($i = 0;$i < count($french);$i++) for ($j = 0;$j < count($english);$j++) // Perform a match for an intersection or inner join of two array sets. if ($english[$j]["ENGLISH_ID"] == $french[$i]["FRENCH_ID"]) // This selects the elements from the joined set, like the SELECT clause in SQL. $result_set[] = "[".$english[$j]["ENGLISH_TEXT"]."] [".$french[$i]["FRENCH_TEXT"]."]"; // Print the result set of the join. print_r($result_set); ?> |
The program is designed to be run from the command line. Most people think of PHP as an exclusively web programming tool, but as an interpreted language and you can run it from the command line. You start by putting this code in to a file named InnerJoin.php
. Then, issue the following command.
php InnerJoin.php |
It uses the print_r()
function that prints the contents of an array. You see the following result set:
Array ( [0] => [One] [Un] [1] => [Three] [Trois] [2] => [Four] [Quatre] [3] => [Five] [Cinque] [4] => [Six] [Six] ) |
The numbers 0 through 4 indicates the element numbers in the new array. A database would put the column names index position 0 (calling it metadata), and the data in rows indexed 1 through 5. You see this data again for the left, right, and full join examples but the system_user
data for the self referencing join seems like a better set to demonstrate a BNL structure.
You could restructure the problem a bit in PHP and eliminate the for loops by using the array_intersect
but that wouldn’t mimic the database tables.
Block Nested Loop
This example borrows the setup data from the self referencing join used earlier on the page. You make three copies of the base array, which is like using table aliasing in SQL statements. It uses a block nested loop (BNL) to navigate through the parent array of the query only once. The block is created by the curly braces around the nested loops.
The first loop gathers the information for the system_user_name
of the base row, and system_user_name
of the row that maps to the user who created the row. The second loop gathers the same information for the system_user_name
that last updated the row.
<?php // Declare two arrays, two with data and one for the combined data. $su = array(array("SYSTEM_USER_ID"=>1,"SYSTEM_USER_NAME"=>"SYSADMIN","CREATED_BY"=>1,"LAST_UPDATED_BY"=>1) ,array("SYSTEM_USER_ID"=>2,"SYSTEM_USER_NAME"=>"DBA 1","CREATED_BY"=>1,"LAST_UPDATED_BY"=>2) ,array("SYSTEM_USER_ID"=>3,"SYSTEM_USER_NAME"=>"DBA 2","CREATED_BY"=>1,"LAST_UPDATED_BY"=>1) ,array("SYSTEM_USER_ID"=>4,"SYSTEM_USER_NAME"=>"DBA 3","CREATED_BY"=>1,"LAST_UPDATED_BY"=>3) ,array("SYSTEM_USER_ID"=>5,"SYSTEM_USER_NAME"=>"APPS DBA","CREATED_BY"=>4,"LAST_UPDATED_BY"=>4)); // Assign aliases to the single array set. $su1 = $su; $su2 = $su; $su3 = $su; // In self join they're all equal. for ($i = 0;$i < count($su);$i++) { for ($j = 0;$j < count($su);$j++) // Perform a match for an intersection or inner join of the two sets on a primary to foreign key match. if ($su2[$j]["SYSTEM_USER_ID"] == $su1[$i]["CREATED_BY"]) // This selects the elements from the joined set, like the SELECT clause in SQL. $result_set[$i] = "[".$su1[$i]["SYSTEM_USER_ID"]."] [".$su1[$i]["SYSTEM_USER_NAME"]."] [".$su2[$j]["SYSTEM_USER_NAME"]."]"; for ($j = 0;$j < count($su);$j++) // Perform a match for an intersection or inner join of the two sets on a primary to foreign key match. if ($su3[$j]["SYSTEM_USER_ID"] == $su1[$i]["LAST_UPDATED_BY"]) // This selects the elements from the joined set, like the SELECT clause in SQL. $result_set[$i] .= " [".$su3[$j]["SYSTEM_USER_NAME"]."]"; } // Print the result set of the join. print_r($result_set); ?> |
You can get the same answer by using two nested loop structures but it’s less efficient than a BNL solution. The script is run the same as the other PHP program, and it returns the following data set:
Array ( [0] => [1] [SYSADMIN] [SYSADMIN] [SYSADMIN] [1] => [2] [DBA 1] [SYSADMIN] [DBA 1] [2] => [3] [DBA 2] [SYSADMIN] [SYSADMIN] [3] => [4] [DBA 3] [SYSADMIN] [DBA 2] [4] => [5] [APPS DBA] [DBA 3] [DBA 3] ) |
Setup Script
Oracle SQL Join Script
-- ------------------------------------------------------------------ -- This part seeds joins, except the range non-equijoin section. -- ------------------------------------------------------------------ -- Conditionally drop tables when they're found. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('ORDINAL_ENGLISH' ,'ORDINAL_FRENCH' ,'ORDINAL_OLD_ENGLISH')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; END; / -- Create tables. CREATE TABLE ordinal_english ( ordinal_english_id NUMBER , ordinal_english_text VARCHAR2(26)); CREATE TABLE ordinal_french ( ordinal_french_id NUMBER , ordinal_french_text VARCHAR2(26)); CREATE TABLE ordinal_old_english ( ordinal_english_id NUMBER , ordinal_old_english_text VARCHAR2(26)); -- Seed tables. DECLARE -- Define local collection types. TYPE number_list IS TABLE OF NUMBER; TYPE string_list IS TABLE OF VARCHAR2(26); -- Declare variables. ordinal_left NUMBER_LIST := number_list(1,2,3,4,5,6); ordinal_right NUMBER_LIST := number_list(1,3,4,5,6,7,8); english_text1 STRING_LIST := string_list('One','Two','Three','Four','Five','Six'); english_text2 STRING_LIST := string_list('Fyrmest','Pridda','Feorda','Fifta','Siexta','Scofoda','Eahtoda'); french_text STRING_LIST := string_list('Un','Trois','Quatre','Cinq','Six','Sept','Huit'); BEGIN FOR i IN 1..ordinal_left.COUNT LOOP INSERT INTO ordinal_english VALUES (ordinal_left(i),english_text1(i)); END LOOP; FOR i IN 1..ordinal_right.COUNT LOOP INSERT INTO ordinal_french VALUES (ordinal_right(i),french_text(i)); INSERT INTO ordinal_old_english VALUES (ordinal_right(i),english_text2(i)); END LOOP; -- Commit the work. COMMIT; END; / -- ------------------------------------------------------------------ -- This part seeds for the self referencing query section. -- ------------------------------------------------------------------ -- Conditionally drop objects. BEGIN FOR i IN (SELECT NULL FROM user_tables WHERE TABLE_NAME = 'SYSTEM_USER') LOOP EXECUTE IMMEDIATE 'DROP TABLE system_user CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT NULL FROM user_sequences WHERE sequence_name = 'SYSTEM_USER_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE system_user_s1'; END LOOP; END; / -- Create SYSTEM_USER table and sequence and seed data. CREATE TABLE system_user ( system_user_id NUMBER CONSTRAINT pk_system_user1 PRIMARY KEY , system_user_name VARCHAR2(20) CONSTRAINT nn_system_user1 NOT NULL , system_user_group_id NUMBER CONSTRAINT nn_system_user2 NOT NULL , system_user_type NUMBER CONSTRAINT nn_system_user3 NOT NULL , last_name VARCHAR2(20) , first_name VARCHAR2(20) , middle_initial VARCHAR2(1) , created_by NUMBER CONSTRAINT nn_system_user4 NOT NULL , creation_date DATE CONSTRAINT nn_system_user5 NOT NULL , last_updated_by NUMBER CONSTRAINT nn_system_user6 NOT NULL , last_update_date DATE CONSTRAINT nn_system_user7 NOT NULL , CONSTRAINT fk_system_user1 FOREIGN KEY(created_by) REFERENCES system_user(system_user_id) , CONSTRAINT fk_system_user2 FOREIGN KEY(last_updated_by) REFERENCES system_user(system_user_id)); INSERT INTO system_user VALUES ( 1,'SYSADMIN', 1, 1, '', '', '', 1, SYSDATE, 1, SYSDATE); INSERT INTO system_user VALUES ( 2,'DBA', 2, 1,'Adams','Samuel', '', 1, SYSDATE, 2, SYSDATE); INSERT INTO system_user VALUES ( 3,'DBA', 2, 1,'Henry','Patrick', '', 1, SYSDATE, 1, SYSDATE); INSERT INTO system_user VALUES ( 4,'DBA', 2, 1,'Puri','Manmohan', '', 1, SYSDATE, 3, SYSDATE); INSERT INTO system_user VALUES ( 5,'APPS DBA', 2, 1,'Bremen','Gunther', '', 4, SYSDATE, 4, SYSDATE); -- Create the constraint to leave room for seeding new application default values. CREATE SEQUENCE system_user_s1 START WITH 1001; -- ------------------------------------------------------------------ -- This part seeds the range non-equijoin section. -- ------------------------------------------------------------------ -- Create tables and sequences. CREATE TABLE calendar ( calendar_id NUMBER CONSTRAINT pk_calendar1 PRIMARY KEY , month_short_name VARCHAR2(3) CONSTRAINT nn_calendar1 NOT NULL , month_long_name VARCHAR2(10) CONSTRAINT nn_calendar2 NOT NULL , start_date DATE CONSTRAINT nn_calendar3 NOT NULL , end_date DATE CONSTRAINT nn_calendar4 NOT NULL); CREATE SEQUENCE calendar_s1; CREATE TABLE TRANSACTION ( transaction_id NUMBER CONSTRAINT pk_transaction1 PRIMARY KEY , transaction_amount NUMBER CONSTRAINT nn_transaction1 NOT NULL , transaction_date DATE CONSTRAINT nn_transaction2 NOT NULL); CREATE SEQUENCE transaction_s1; -- Seed the CALENDAR table. INSERT INTO calendar VALUES (calendar_s1.nextval,'JAN','January','01-JAN-09','31-JAN-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'FEB','February','01-FEB-09','28-FEB-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'MAR','March','01-MAR-09','31-MAR-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'APR','April','01-APR-09','30-APR-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'MAY','May','01-MAY-09','31-MAY-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'JUN','June','01-JUN-09','30-JUN-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'JUL','July','01-JUL-09','31-JUL-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'AUG','August','01-AUG-09','31-AUG-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'SEP','September','01-SEP-09','30-SEP-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'OCT','October','01-OCT-09','31-OCT-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'NOV','November','01-NOV-09','30-NOV-09'); INSERT INTO calendar VALUES (calendar_s1.nextval,'DEC','Decenber','01-DEC-09','31-DEC-09'); -- Seed the TRANSACTION table. INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,38.99,'16-JAN-09'); INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,32.87,'21-JAN-09'); INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,9.99,'25-MAR-09'); INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,43.19,'13-APR-09'); |
MySQL SQL Join Script
-- Coming shortly. |
I hope this help a few folks trying to figure out how to join tables.
UNION differs from FULL OUTER JOIN as it will remove duplicates.
SELECT ordinal_english_text
, ordinal_french_text
FROM ordinal_english, ordinal_french
WHERE ordinal_english_id(+) = ordinal_french_id
UNION
SELECT ordinal_english_text
, ordinal_french_text
FROM ordinal_english, ordinal_french
WHERE ordinal_english_id = ordinal_french_id(+);
this will remove the duplicates for ordinal_english_text , ordinal_french_text
use UNION ALL instead
SQL> create table lsc_t1(x number, y number)
Table created.
SQL> insert into lsc_t1 values(1,1)
1 row created.
SQL> insert into lsc_t1 values(1,1)
1 row created.
SQL> insert into lsc_t1 values(2,2)
1 row created.
SQL> create table lsc_t2(y number, z number)
Table created.
SQL> insert into lsc_t2 values(2,2)
1 row created.
SQL> insert into lsc_t2 values(3,3)
1 row created.
SQL> insert into lsc_t2 values(3,3)
1 row created.
SQL> commit
Commit complete.
SQL> select x,y,z from lsc_t1 full join lsc_t2 using (y) order by y
X Y Z
———- ———- ———-
1 1
1 1
2 2 2
3 3
3 3
5 rows selected.
SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+) order by y
X Y Z
———- ———- ———-
1 1
1 1
2 2 2
3 rows selected.
SQL> select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y order by y
X Y Z
———- ———- ———-
2 2 2
3 3
3 3
3 rows selected.
SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+)
union
select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y order by y
X Y Z
———- ———- ———-
1 1
2 2 2
3 3
3 rows selected.
SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+)
union all
select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y and lsc_t1.y is null order by y
X Y Z
———- ———- ———-
1 1
1 1
2 2 2
3 3
3 3
5 rows selected.
Laurent Schneider
4 Feb 09 at 4:53 am
Natural join can be also outer join. Actually I’d say it is just another (potentially dangerous!!!) syntactical notation. And I’ve tried to explain joins here http://www.gplivna.eu/papers/sql_join_types.htm
Gints Plivna
15 Feb 09 at 2:12 pm
Thanks for the observation. I concur wholeheartedly with your observations that there’s nothing natural about natural joins. 🙂 BTW, that’s a nice page.
maclochlainn
15 Feb 09 at 3:06 pm
Nice writeup. I like the graphic on left, right and inner join. I did an extension of it here:
http://db-optimizer.blogspot.com/2009/06/sql-joins.html
Kyle Hailey
26 Jun 09 at 10:59 am
Really nice post. I haven’t seen the execution logic presented anywhere other than here. I was under the impression that it had to be nested loops for JOINS but no one has ever said this.
Thank you 🙂
Preston
Preston
16 Mar 11 at 1:46 pm
My semi-complete join tutorials are here. It may help you a bit more, this was a base posting that led to a more complete discussion and comparison of the various join semantics. These smaller pages also load more quickly. 😉
maclochlainn
16 Mar 11 at 9:50 pm
Neat post, very cleat and thorough enough to help me understand SQL.
I’m new to the subject and come from a very different platform.
You have helped me alot in begining the road.
Patricia Echeverry
4 Aug 11 at 11:29 am
You might like this other website, it’s still a work-in-progress.
maclochlainn
4 Aug 11 at 2:28 pm
[…] Outer Joins Graphically December 30th, 2009 Goto comments Leave a comment Outer Joins If English and French both have a unique key on the “ordinal_id” then it’s basically one-to-one relationship We add an arrow in the middle of the line to denote “outer join”. The arrow points from the table that drives the join, ie all the rows in the table pointed from are returned even if a match isn’t found in the table pointed to. idea for the above graphic came from Join Mapping which was originally posted on blog page […]
DB Optimizer » Outer Joins Graphically
20 Oct 11 at 12:23 pm
[…] Basic SQL Join Semantics https://blog.mclaughlinsoftware.comoracle-sql-programming/basic-sql-join-semantics […]
Useful Links « Maleshg
13 Feb 12 at 11:36 pm
[…] looking at SQL joins […]
ANSI SQL Joins
25 Apr 13 at 3:47 pm
false RESULT :
HOW CAN GET THIS RESULT
TRUE RESULT :
I WANT CODE TO RUN TRUE RESULT
BELAL
29 Apr 13 at 7:05 am
While I’d like to help, there’s not enough information to help you. The output appears different than what would be anticipated by the query? 🙁
maclochlainn
30 Apr 13 at 12:28 am
I personally speculate how come you called this blog, “Basic SQL
Join Semantics | MacLochlainns Weblog”. In any event .
I admired the article!I appreciate it-Seymour
Heather
11 Nov 13 at 4:37 pm
[…] above graphic originally on […]
Outer Joins : which side does the (+) go on
5 Mar 14 at 2:43 pm
Awesome blog !!!!
udhayakumar
16 Jul 14 at 5:26 am
An SQL JOIN section utilized to unite rows from two or added tables; situate on an ordinary field among them.
swati sharma
17 Jul 15 at 5:03 am