MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Basic SQL join semantics

with 17 comments

It’s almost funny reflecting on when I first learned how to write SQL in 1985. It was using SQL/DS 1.1 (Structured Query Language/Data System) on a VMS/CMS operating system at the IBM center in the Los Angeles, California. That was in the Crocker Bank Building when I worked for First Interstate Bank. The building is still there and so are databases but both banks are long gone.

After 24 years of SQL, a quick primer on joins seems like a good idea. I have a simple example that should help illustrate various joins and SQL semantics.

Definitions

Before jumping into the SQL join statements, setting up some basic information seems like a good idea. Here are some basic definitions:

SQL JOIN Concepts:

  • A table is an array of arrays, which means a row in any table contains an array. The definition of a table defines the data structure of a row or array.
  • Selection is the process of identifying, accessing, and joining data into return data sets. The basic elements of selection in a SQL statement are the FROM and WHERE clauses.
    • The ANSI 89 syntax uses a cross join or comma-delimited list of tables, and places the join in the WHERE clause.
    • The ANSI 92 syntax uses a set of key words like INNER JOIN, LEFT JOIN, et cetera, and puts the join in an ON or USING clause, which are often referred to as subclauses.
  • Projection is the process of paring columns from filtered data sets. Projection is determined by the list of columns found in a SELECT clause.

SQL JOIN Logic: (Check wikipedia for more detail …)

  • A SQL join without a relationship creates what’s known as a Cartesian Product, which is a result set that consists of all rows from one table matched with all row in another table. This type of join is also known as a CROSS JOIN.
  • A SQL join with a relationship may be one of two types:
    • A SQL equijoin (also known as a natural or semijoin) creates a relationship between two tables based on a comparison of values found in one or a set of columns in one table and one or an equal set of columns in another table.
    • A SQL non-equijoin (also known as a θ-join) effects a relation between two tables based on a filtered CROSS JOIN between two tables. This type of join can be a range comparison using the BETWEEN operator or a comparison of column values that uses an inequality operator.
  • A SQL antijoin may be one of two things, at least typically, and they are:
    • A relative complement, which is everything in another table not found in a table.
    • A symmetric difference, which is the relative complement of both tables in a join.

SQL JOIN Implementations:

  • A CROSS JOIN implements a Cartesian Product, which returns rows combined from both tables. It matches every row in one table against every row in another table. For example, if you have 3 rows in one table and 2 rows in another you’ll have 2 times 3, or 6, rows in the result set.
  • An INNER JOIN implements an equijoin, which returns rows combined from both tables when the values in the join condition match. Join conditions can include one or more columns.
  • A NATURAL JOIN implements an equijoin, which returns rows combined from both tables when the values in the join condition match. While the join conditions can include one or more columns, you have no choice in selecting the join columns. A NATURAL JOIN checks the definition of both tables (in the data catalog or metadata) and looks for like named columns. It then joins the table based on the values of all like named columns from both tables. One might say it’s an unnatural behavior because a developer has no way to override it. I’d concur with calling it unnatural and for that matter wrote much more about it in this older post.
  • A LEFT JOIN implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the left table that aren’t found in the right table. This makes a LEFT JOIN a combination of an INNER JOIN and a relative complement of the table on the right of the join. This nuance becomes important when we examine the (+) semantic implemented by Oracle.
  • A RIGHT JOIN implements an equijoin and relative complement, which returns rows combined from both tables when the values in the join condition match and the rows in the right table that aren’t found in the left table. This makes a RIGHT JOIN the opposite of a LEFT JOIN. Naturally, it is also a combination of an INNER JOIN and a relative complement of the left table. Like the LEFT JOIN, this nuance becomes important when we examine the (+) semantic implemented by Oracle.
  • A FULL JOIN implements an equijoin and both relative complements, which returns rows combined from both tables when the values in the join condition match and the rows in both the left and right tables that aren’t found in their respective other tables. This makes a FULL JOIN a combination of an INNER JOIN and relative complements of the both tables. This type of join can’t be done with the (+) alone in the older syntax. It requires left and right outer queries glued together by a UNION set operator. The UNION set operator eliminates duplicate from the intersection of the two tables, which are created by gluing together two outer joins.
  • An Antijoin implements an outer join minus an equijoin, which returns a relative complement when the first query is a LEFT JOIN or RIGHT JOIN and a symmetric difference when the first query is a FULL JOIN.

SQL Join Inheritance Tree:

Unified Modeling Language (UML) is great for modeling concrete software but is also good for illustrating abstract models. The following uses UML to depict the abstract relationship between the various join types.

joininheritancetree

It’s borrowed from the SQL Primer (Appendix B) in my Oracle Database 11g PL/SQL Programming book. Duly footnoted.

SQL Self Joins:

    A self join occurs when you create a temporary copy of a table by assigning it an alias. Good SQL coding practices (from my perspective) would involve giving both instances of the table aliases. There is no reason why you can’t have three or four copies of a table in a self join. You can find an example of a self join in the Inner Join section later in the blog page.

SQL Join Examples

The SQL join statements are based on a little model that you’ll find at the end of the blog page. You should be able to copy it directly into SQL*Plus or MySQL. Naturally, the Oracle-only examples work exclusively in Oracle.

Cross Joins:

    Rule(s):

    1. A cross join doesn’t have a join statement.

    Examples:

    This example creates a Cartesian Product of 42 rows because the first table has 6 rows and the second has 7. The syntax that uses CROSS JOIN is known as ANSI standard.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english CROSS JOIN ordinal_french;

    The older syntax, sometimes called Oracle Proprietary, doesn’t use the CROSS JOIN semantic. You simply write the tables as a comma separated list, like the next example. The results are exactly the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french;

Inner Joins:

    Rule(s):

    1. An inner join must have a join statement.
    2. An inner join that uses the INNER JOIN clause must include a join statement in the FROM clause.
    3. An inner join that uses comma delimited tables must include a join statement in the WHERE clause.

    Examples:

    This example creates an inner join where the values of a column in each table are equal. The syntax that uses INNER JOIN is known as ANSI standard. The INNER key word is optional, which make a JOIN equivalent to an inner join.

    It has two variants. One uses an ON clause when the column names are equal or unequal. The other uses a USING clause. The USING clause only works when the column names in both tables are the same. You can also pass a list of comma separated column names as arguments to the USING clause because it acts like (and is more or less) an overloaded function.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english INNER JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id;

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- --------------------
    One                  Un
    Three                Trois
    Four                 Quatre
    Five                 Cinq
    Six                  Six

    The ORDINAL_ENGLISH table has a primary key column name that differs from the ORDINAL_ENGLISH table’s primary key column name. The ORDINAL_OLD_ENGLISH table has the same column name for the primary key, which lets you use the USING clause.

    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english INNER JOIN ordinal_old_english USING(ordinal_english_id);

    You can put more than one column name in a USING clause. There may be an implementation limit but I’ve not found it. It appears that the parser treats the list recursively. The syntax for that is the following:

    USING(column_name1, column_name2, ...)

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
    -------------------- ------------------------
    One                  Fyrmest
    Three                Pridda
    Four                 Feorda
    Five                 Fifta
    Six                  Siexta

    The older syntax, sometimes called Oracle Proprietary but really ANSI 89, doesn’t use the INNER JOIN semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM clause to the WHERE clause. The results are exactly the same as the similar query above.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id = ordinal_french_id;

    SQL Self Referencing Queries

    Interchangeable primary and foreign keys in a single column

    A self referencing query uses multiple copies of a table. This works by aliasing the name of the table, which effectively creates multiple record sets in memory. Two examples of a self referencing query are provided using the ordinal_french table.

    Both use table aliases and column aliases. You should note that column aliases may be preceded by an AS but it isn’t required. The same is not true for table aliases because using an AS triggers a run time error. Another tidbit – Oracle SQL*Plus lets you use double quotes to create column names with white space and strings that begin with numbers.

    This query uses the INNER JOIN syntax:

    SELECT   of1.ordinal_french_text AS "French Text #1"
    ,        of2.ordinal_french_text AS "French Text #2"
    FROM     ordinal_french of1 INNER JOIN ordinal_french of2
    ON       of1.ordinal_french_id = of2.ordinal_french_id;

    This query uses the comma separated list of tables syntax:

    SELECT   of1.ordinal_french_text AS "French Text #1"
    ,        of2.ordinal_french_text AS "French Text #2"
    FROM     ordinal_french of1, ordinal_french of2
    WHERE    of1.ordinal_french_id = of2.ordinal_french_id;

    Both produce the following result set:

    French Text #1  French Text #2
    -------------- --------------
    Un             Un
    Trois          Trois
    Quatre         Quatre
    Cinq           Cinq
    Six            Six
    Sept           Sept
    Huit           Huit

    Primary and foreign keys in different columns

    The preceding self referencing query is straightforward but can be misleading. You may wonder why it isn’t that simple. The reason why is that the preceding queries use the same column as both the primary and foreign key. Generally, that’s not the case when you use self referencing joins. Self referencing joins typically use the primary to foreign key relationship between two columns.

    Assume you had a system_user table and it contained a system_user_id as the primary key, system_user_name as a non-key column, and who audit columns, like a created_by and last_updated_by column that held foreign key values that pointed back to the system_user_id column, how would you get the user name for a row and the user name responsible for both creating and updating the row. In this type of example, the primary key can be in the same row or a different row.

    This query represents the solution to the problem:

    SELECT   su1.system_user_name AS "System User"
    ,        su2.system_user_name AS "Creating User"
    ,        su3.system_user_name AS "Updating User"
    FROM     system_user su1
    ,        system_user su2
    ,        system_user su3
    WHERE    su1.created_by = su2.system_user_id      -- Primary(FK) to Secondary (PK)
    AND      su1.last_updated_by = su3.system_user_id -- Primary(FK) to Tertiary(PK)
    ORDER BY su1.system_user_id;

    You can see that you need to match the created_by and last_updated_by column values to the the row that contains the same system_user_id column value. This requires that you create three copies of the table, and mimics the behavior of two nested loops inside a master loop. The master loop has the primary user name, and the nested loops hold the data for the user that created or updated the row. This is illustrated in the following drawing.

    selfreference3table

    The query produces a Cartesian Product of 53 or 125 rows. The rows consist of three copies of the columns from the table glued together into long rows. Only 5 of the rows meet the two join conditions, which means the rest are discarded. A Cartesian Product is the best way to visualize this process because it lets us see in a 2-dimensional model n-dimensions of space. Every table, or copy of a table, represents a spatial dimension in a join. You can find the mechanics of how joins really work in the SQL join logic section later in the blog page. The script yields the following output:

    System User  Creating User  Updating User
    ------------ -------------- --------------
    SYSADMIN     SYSADMIN       SYSADMIN
    DBA 1        SYSADMIN       DBA 1
    DBA 2        SYSADMIN       SYSADMIN
    DBA 3        SYSADMIN       DBA 2
    APPS DBA     DBA 3          DBA 3

    Naturally, there is a way to do this in the more current ANSI SQL with the INNER JOIN phrase. Here’s the equivalent using the newer syntax:

    SELECT   su1.system_user_name AS "System User"
    ,        su2.system_user_name AS "Creating User"
    ,        su3.system_user_name AS "Updating User"
    FROM     system_user su1
    JOIN     system_user su2                          -- Inner is an optional identifier
    ON       su1.created_by = su2.system_user_id      -- Primary(FK) to Secondary (PK)
    JOIN     system_user su3                          -- Inner is an optional identifier
    ON       su1.last_updated_by = su3.system_user_id -- Primary(FK) to Tertiary(PK)
    ORDER BY su1.system_user_id;

Left Joins:

    Rule(s):

    1. A left join must have a join statement.
    2. A left join that uses the LEFT JOIN clause must include a join statement in the FROM clause.
    3. A left join that uses comma delimited tables must include a join statement in the WHERE clause.

    Examples:

    This example creates an inner join where the values of a column in each table are equal. The syntax that uses LEFT JOIN is known as ANSI standard. It has two variants. One uses a ON clause when the column names are equal or unequal. The other uses a USING clause. The USING clause only works when the column names in both tables are the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english LEFT JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id;

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- --------------------
    One                  Un
    Three                Trois
    Four                 Quatre
    Five                 Cinq
    Six                  Six
    Two

    The ORDINAL_ENGLISH table has a primary key column name that differs from the ORDINAL_ENGLISH table’s primary key column name. The ORDINAL_OLD_ENGLISH table has the same column name for the primary key, which lets you use the USING clause.

    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english LEFT JOIN ordinal_old_english USING(ordinal_english_id);

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
    -------------------- ------------------------
    One                  Forma
    Three                Pridda
    Four                 Feorda
    Five                 Fifta
    Six                  Siexta
    Two

    The older syntax, sometimes called Oracle Proprietary, doesn’t use the LEFT JOIN semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM clause to the WHERE clause. The (+) following the ordinal_french_id column in the join indicates that you want the relative complement of that table, which are all non-matching values in the ordinal_english table. The results are exactly the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id = ordinal_french_id(+);

Right Joins:

    Rule(s):

    1. A right join must have a join statement.
    2. A right join that uses the RIGHT JOIN clause must include a join statement in the FROM clause.
    3. A right join that uses comma delimited tables must include a join statement in the WHERE clause.

    Examples:

    This example creates an right join where the values of a column in each table are equal. The syntax that uses RIGHT JOIN is known as ANSI standard. It has two variants. One uses a ON clause when the column names are equal or unequal. The other uses a USING clause. The USING clause only works when the column names in both tables are the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english RIGHT JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id;

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- --------------------
    One                  Un
    Three                Trois
    Four                 Quatre
    Five                 Cinq
    Six                  Six
                         Eahtoda
                         Scofoda

    The ORDINAL_ENGLISH table has a primary key column name that differs from the ORDINAL_ENGLISH table’s primary key column name. The ORDINAL_OLD_ENGLISH table has the same column name for the primary key, which lets you use the USING clause.

    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english RIGHT JOIN ordinal_old_english USING(ordinal_english_id);

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
    -------------------- ------------------------
    One                  Forma
    Three                Pridda
    Four                 Feorda
    Five                 Fifta
    Six                  Siexta
                         Eahtoda
                         Scofoda

    The older syntax, sometimes called Oracle Proprietary, doesn’t use the RIGHT JOIN semantic. You simply write the tables as a comma separated list, like the next example. The join condition shifts from the FROM clause to the WHERE clause. The (+) following the ordinal_english_id column in the join indicates that you want the relative complement of that table, which are all non-matching values in the ordinal_french table. The results are exactly the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id(+) = ordinal_french_id;

Full Joins:

    Rule(s):

    1. A full join must have a join statement.
    2. A full join that uses the FULL JOIN clause must include a join statement in the FROM clause.
    3. A full join that uses a list of comma separated tables must include a join statement in the WHERE clause.

    Examples:

    This example creates an full join where the values of a column in each table are equal. The syntax that uses FULL JOIN is known as ANSI standard. It has two variants. One uses a ON clause when the column names are equal or unequal. The other uses a USING clause. The USING clause only works when the column names in both tables are the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english FULL JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id;

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- -------------------
    One                  Un
    Three                Trois
    Four                 Quatre
    Five                 Cinq
    Six                  Six
                         Sept
                         Huit
    Two

    The ORDINAL_ENGLISH table has a primary key column name that differs from the ORDINAL_ENGLISH table’s primary key column name. The ORDINAL_OLD_ENGLISH table has the same column name for the primary key, which lets you use the USING clause.

    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english FULL JOIN ordinal_old_english USING(ordinal_english_id);

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
    -------------------- ------------------------
    One                  Forma
    Three                Pridda
    Four                 Feorda
    Five                 Fifta
    Six                  Siexta
                         Eahtoda
                         Scofoda
    Two

    The older syntax, sometimes called Oracle Proprietary, doesn’t use the FULL JOIN semantic. You can’t simply write the tables as a comma separated list, like the other example. While the join condition shifts from the FROM clause to the WHERE clause, you must use two queries and a UNION set operator. The (+) following the ordinal_english_id column in one query, and the ordinal_french_id in the second query. You should note that the UNION eliminates the duplicated intersection values from the two outer join statements. This is demonstrated in the following example.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id(+) = ordinal_french_id
    UNION
    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id = ordinal_french_id(+);

Non-equijoins:

    Rule(s):

    1. A non-equijoin doesn’t have an equijoin statement.
    2. A non-equijoin that uses the CROSS JOIN doesn’t have a join in the FROM clause.
    3. A non-equijoin that uses a comma separated list of tables doesn’t have an equijoin in the WHERE clause.

    Examples:

    A range non-equijoin

    This example creates an non-equijoin where the comparison of values looks for the occurrence of values in an inclusive range of values. The syntax that uses CROSS JOIN is known as ANSI standard. The syntax using a comma separated list of tables is sometimes called Oracle Proprietary but is really only an older version of ANSI SQL when you exclude some Oracle extensions.

    This example uses a traditional model with calendar and transaction tables. The calendar table sets the range by qualifying the starting and ending dates of a month. The transaction table provides the transaction date. A range non-equijoin filters a Cartesian Product or cross join result set and returns those rows where the search value is inside the range.

    SELECT   c.month_short_name
    ,        t.transaction_amount
    FROM     calendar c CROSS JOIN TRANSACTION t
    WHERE    t.transaction_date BETWEEN c.start_date AND c.end_date
    ORDER BY EXTRACT(MONTH FROM t.transaction_date);

    This returns the following set from the setup data provided in the scripts at the end of the blog.

    Month Name Amount
    ---------- ------
    JAN         32.87
    JAN         38.99
    MAR          9.99
    APR         43.19

    The older syntax uses a comma separated list of tables, which is a cross join when there isn’t a equality join command word between the two.

    SELECT   c.month_short_name AS "Month Name"
    ,        t.transaction_amount AS "Amount"
    FROM     calendar c , TRANSACTION t
    WHERE    t.transaction_date BETWEEN c.start_date AND c.end_date
    ORDER BY EXTRACT(MONTH FROM t.transaction_date);

    An inequality non-equijoin

    This example creates an non-equijoin where the inequality looks for the occurrence of values less than a value. The syntax that uses CROSS JOIN is known as ANSI standard. The syntax using a comma separated list of tables is sometimes called Oracle Proprietary but is really only an older version of ANSI SQL when you exclude Oracle’s extensions.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english CROSS JOIN ordinal_french
    WHERE    ordinal_english_id < ordinal_french_id
    AND      ordinal_french_id = 3;

    The older syntax uses a comma separated list of tables, which is a cross join when there isn’t a equality join between the two.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english, ordinal_french
    WHERE    ordinal_english_id < ordinal_french_id
    AND      ordinal_french_id = 3;

    Either of these produce the following output.

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- -------------------
    One                  Trois
    Two                  Trois

Anitjoins:

    Rule(s):

    1. An antijoin must have two queries that use join statements, linked by a MINUS operator.
    2. An antijoin that uses FULL JOIN and INNER JOIN clauses, and must include a join statement in the FROM clause of each query.
    3. An antijoin that uses comma delimited tables must include a join statement in the WHERE clause.

    Examples:

    This example creates an antijoin where the values of a column in each table are equal. The syntax that uses both FULL JOIN and INNER JOIN is known as ANSI standard. It has two variants. One uses a ON clause when the column names are equal or unequal. The other uses a USING clause. The USING clause only works when the column names in both tables are the same.

    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english FULL JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id
    MINUS
    SELECT   ordinal_english_text
    ,        ordinal_french_text
    FROM     ordinal_english INNER JOIN ordinal_french
    ON       ordinal_english_id = ordinal_french_id;

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_FRENCH_TEXT
    -------------------- -------------------
                         Sept
                         Huit
    Two

    The ORDINAL_ENGLISH table has a primary key column name that differs from the ORDINAL_ENGLISH table’s primary key column name. You should note that while these are the primary keys, there are no database level constraints on any of the example tabes. You can use the USING clause because the ORDINAL_OLD_ENGLISH table has the same column name as the primary key.

    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english FULL JOIN ordinal_old_english USING(ordinal_english_id)
    MINUS
    SELECT   ordinal_english_text
    ,        ordinal_old_english_text
    FROM     ordinal_english INNER JOIN ordinal_old_english USING(ordinal_english_id);

    This returns the following rows from the tables:

    ORDINAL_ENGLISH_TEXT ORDINAL_OLD_ENGLISH_TEXT
    -------------------- ------------------------
                         Eahtoda
                         Scofoda
    Two

    As noted, the older syntax, sometimes called Oracle Proprietary, doesn’t use the FULL JOIN semantic. While the join condition shifts from the FROM clause to the WHERE clause, you must use three queries and both a UNION and MINUS set operators. You enclose the first query and second query in parentheses to create a full outer join. This sets the order of precedence for the union before the minus set operation. The MINUS operator subtracts the inner join, as demonstrated in the following example. The results are naturally the same as the previous query.

    (SELECT   ordinal_english_text
     ,        ordinal_french_text
     FROM     ordinal_english, ordinal_french
     WHERE    ordinal_english_id(+) = ordinal_french_id
     UNION
     SELECT   ordinal_english_text
     ,        ordinal_french_text
     FROM     ordinal_english, ordinal_french
     WHERE    ordinal_english_id = ordinal_french_id(+))
     MINUS
     SELECT   ordinal_english_text
     ,        ordinal_french_text
     FROM     ordinal_english, ordinal_french
     WHERE    ordinal_english_id = ordinal_french_id;

SQL Join Logic:

There are three types of typical joins. They are nested loop, or a variation known as block nested loop, merged join, and hash join. Nested loops are exhaustive searches and don’t require presorting of keys. Merged joins rely on presorting of keys and that’s why cost-based optimizers collect data to improve join speed. Hash joins are more complex and limited to only equijoin resolutions.

An example that highlights an inner join is shown below. The rows that meet the condition are highlighted in yellow. The rows left without highlighting are excluded from the join. The excluded rows in one or the other table would be the relative complement in an outer join and together they are the symmetric difference of the join.

joinmapping

I also have a JavaScript widget that renders it this way for a full outer join:

FullOuterJoin

Unlike a Venn Diagram, this illustration exhibits the behavior of database tables in an equijoin. A column or series of columns constitutes the JOIN or intersection between the two sets and a WHERE clause can filter the rows returned based on a value or range comparison. Both of these processes serve to filter the data and collectively are known as the selection process. Once you have selected the data, you can have access to the rest of the columns in a the nested set or row of a table. You project through the data when you filter the columns to display only some of them, which is known as the projection process.

Inner Joins:

    Rule(s):

    1. An inner join is most like a nested loop where the larger set is navigated by the outer loop and the smaller set navigated by the inner set.
    2. An inner join is more frequently implemented as a block nested loop (BNL), which means that inner join and both relative complements are collected in one pass of the inner loop. While more expensive in terms of computational cycles and time, this is more efficient than running multiple nested loops.

    Examples:

    Nested Loop

    This example mimics an inner join where the values of a column in each table are equal. I chose the PHP programming language because it easily adapts to illustrating the problem with associative arrays. Associative arrays use either numbers or text strings as index values to elements of arrays.

    You should notice that the program uses the column names as index values in the arrays, respectively english_id and english_text and french_id and french_text for english and french tables. The sample program implements a simple nested join to match and return desired data from the two arrays.

    <?php
      // Declare two arrays, two with data and one for the combined data.
      $english = array(array("ENGLISH_ID"=>"1","ENGLISH_TEXT"=>"One")
                      ,array("ENGLISH_ID"=>"2","ENGLISH_TEXT"=>"Two")
                      ,array("ENGLISH_ID"=>"3","ENGLISH_TEXT"=>"Three")
                      ,array("ENGLISH_ID"=>"4","ENGLISH_TEXT"=>"Four")
                      ,array("ENGLISH_ID"=>"5","ENGLISH_TEXT"=>"Five")
                      ,array("ENGLISH_ID"=>"6","ENGLISH_TEXT"=>"Six"));
      $french = array(array("FRENCH_ID"=>"1","FRENCH_TEXT"=>"Un")
                     ,array("FRENCH_ID"=>"3","FRENCH_TEXT"=>"Trois")
                     ,array("FRENCH_ID"=>"4","FRENCH_TEXT"=>"Quatre")
                     ,array("FRENCH_ID"=>"5","FRENCH_TEXT"=>"Cinque")
                     ,array("FRENCH_ID"=>"6","FRENCH_TEXT"=>"Six")
                     ,array("FRENCH_ID"=>"7","FRENCH_TEXT"=>"Sept")
                     ,array("FRENCH_ID"=>"8","FRENCH_TEXT"=>"Huit"));
      $result_set = array();
     
      // The larger array is always the outer loop and inner loop the smaller array.
      for ($i = 0;$i < count($french);$i++)
        for ($j = 0;$j < count($english);$j++)
          // Perform a match for an intersection or inner join of two array sets.
          if ($english[$j]["ENGLISH_ID"] == $french[$i]["FRENCH_ID"])
            // This selects the elements from the joined set, like the SELECT clause in SQL.
            $result_set[] = "[".$english[$j]["ENGLISH_TEXT"]."] [".$french[$i]["FRENCH_TEXT"]."]";
     
      // Print the result set of the join.
      print_r($result_set);
    ?>

    The program is designed to be run from the command line. Most people think of PHP as an exclusively web programming tool, but as an interpreted language and you can run it from the command line. You start by putting this code in to a file named InnerJoin.php. Then, issue the following command.

    php InnerJoin.php

    It uses the print_r() function that prints the contents of an array. You see the following result set:

    Array
    (
        [0] => [One] [Un]
        [1] => [Three] [Trois]
        [2] => [Four] [Quatre]
        [3] => [Five] [Cinque]
        [4] => [Six] [Six]
    )

    The numbers 0 through 4 indicates the element numbers in the new array. A database would put the column names index position 0 (calling it metadata), and the data in rows indexed 1 through 5. You see this data again for the left, right, and full join examples but the system_user data for the self referencing join seems like a better set to demonstrate a BNL structure.

    You could restructure the problem a bit in PHP and eliminate the for loops by using the array_intersect but that wouldn’t mimic the database tables.

    Block Nested Loop

    This example borrows the setup data from the self referencing join used earlier on the page. You make three copies of the base array, which is like using table aliasing in SQL statements. It uses a block nested loop (BNL) to navigate through the parent array of the query only once. The block is created by the curly braces around the nested loops.

    The first loop gathers the information for the system_user_name of the base row, and system_user_name of the row that maps to the user who created the row. The second loop gathers the same information for the system_user_name that last updated the row.

    <?php
      // Declare two arrays, two with data and one for the combined data.
      $su = array(array("SYSTEM_USER_ID"=>1,"SYSTEM_USER_NAME"=>"SYSADMIN","CREATED_BY"=>1,"LAST_UPDATED_BY"=>1)
                 ,array("SYSTEM_USER_ID"=>2,"SYSTEM_USER_NAME"=>"DBA 1","CREATED_BY"=>1,"LAST_UPDATED_BY"=>2)
                 ,array("SYSTEM_USER_ID"=>3,"SYSTEM_USER_NAME"=>"DBA 2","CREATED_BY"=>1,"LAST_UPDATED_BY"=>1)
                 ,array("SYSTEM_USER_ID"=>4,"SYSTEM_USER_NAME"=>"DBA 3","CREATED_BY"=>1,"LAST_UPDATED_BY"=>3)
                 ,array("SYSTEM_USER_ID"=>5,"SYSTEM_USER_NAME"=>"APPS DBA","CREATED_BY"=>4,"LAST_UPDATED_BY"=>4));
     
      // Assign aliases to the single array set.
      $su1 = $su;
      $su2 = $su;
      $su3 = $su;
     
      // In self join they're all equal.
      for ($i = 0;$i < count($su);$i++) {
        for ($j = 0;$j < count($su);$j++)
          // Perform a match for an intersection or inner join of the two sets on a primary to foreign key match.
          if ($su2[$j]["SYSTEM_USER_ID"] == $su1[$i]["CREATED_BY"])
            // This selects the elements from the joined set, like the SELECT clause in SQL.
            $result_set[$i] = "[".$su1[$i]["SYSTEM_USER_ID"]."] [".$su1[$i]["SYSTEM_USER_NAME"]."] [".$su2[$j]["SYSTEM_USER_NAME"]."]";
        for ($j = 0;$j < count($su);$j++)
          // Perform a match for an intersection or inner join of the two sets on a primary to foreign key match.
          if ($su3[$j]["SYSTEM_USER_ID"] == $su1[$i]["LAST_UPDATED_BY"])
            // This selects the elements from the joined set, like the SELECT clause in SQL.
            $result_set[$i] .= " [".$su3[$j]["SYSTEM_USER_NAME"]."]"; }
     
      // Print the result set of the join.
      print_r($result_set);
      ?>

    You can get the same answer by using two nested loop structures but it’s less efficient than a BNL solution. The script is run the same as the other PHP program, and it returns the following data set:

    Array
    (
        [0] => [1] [SYSADMIN] [SYSADMIN] [SYSADMIN]
        [1] => [2] [DBA 1] [SYSADMIN] [DBA 1]
        [2] => [3] [DBA 2] [SYSADMIN] [SYSADMIN]
        [3] => [4] [DBA 3] [SYSADMIN] [DBA 2]
        [4] => [5] [APPS DBA] [DBA 3] [DBA 3]
    )

Setup Script

Oracle SQL Join Script

-- ------------------------------------------------------------------
-- This part seeds joins, except the range non-equijoin section.
-- ------------------------------------------------------------------
-- Conditionally drop tables when they're found.
BEGIN
  FOR i IN (SELECT   TABLE_NAME
            FROM     user_tables
            WHERE    TABLE_NAME IN ('ORDINAL_ENGLISH'
                                   ,'ORDINAL_FRENCH'	
                                   ,'ORDINAL_OLD_ENGLISH')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
 
 
  END LOOP;
 
END;
/
 
-- Create tables.
CREATE TABLE ordinal_english
( ordinal_english_id   NUMBER
, ordinal_english_text VARCHAR2(26));
 
CREATE TABLE ordinal_french
( ordinal_french_id   NUMBER
, ordinal_french_text VARCHAR2(26));
 
CREATE TABLE ordinal_old_english
( ordinal_english_id       NUMBER
, ordinal_old_english_text VARCHAR2(26));
 
-- Seed tables.
DECLARE
  -- Define local collection types.
  TYPE number_list IS TABLE OF NUMBER;
  TYPE string_list IS TABLE OF VARCHAR2(26);
 
  -- Declare variables.
  ordinal_left  NUMBER_LIST := number_list(1,2,3,4,5,6);
  ordinal_right NUMBER_LIST := number_list(1,3,4,5,6,7,8);
  english_text1 STRING_LIST := string_list('One','Two','Three','Four','Five','Six');
  english_text2 STRING_LIST := string_list('Fyrmest','Pridda','Feorda','Fifta','Siexta','Scofoda','Eahtoda');										  
  french_text   STRING_LIST := string_list('Un','Trois','Quatre','Cinq','Six','Sept','Huit');
BEGIN
  FOR i IN 1..ordinal_left.COUNT LOOP
    INSERT INTO ordinal_english VALUES (ordinal_left(i),english_text1(i));
  END LOOP;
  FOR i IN 1..ordinal_right.COUNT LOOP
    INSERT INTO ordinal_french VALUES (ordinal_right(i),french_text(i));
    INSERT INTO ordinal_old_english VALUES (ordinal_right(i),english_text2(i));
  END LOOP;
  -- Commit the work.
  COMMIT;
END;
/
 
-- ------------------------------------------------------------------
-- This part seeds for the self referencing query section.
-- ------------------------------------------------------------------
-- Conditionally drop objects.
BEGIN
  FOR i IN (SELECT NULL
            FROM   user_tables
            WHERE  TABLE_NAME = 'SYSTEM_USER') LOOP
    EXECUTE IMMEDIATE 'DROP TABLE system_user CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT NULL
            FROM   user_sequences
            WHERE  sequence_name = 'SYSTEM_USER_S1') LOOP
    EXECUTE IMMEDIATE 'DROP SEQUENCE system_user_s1';
  END LOOP;
END;
/
 
-- Create SYSTEM_USER table and sequence and seed data.
CREATE TABLE system_user
( system_user_id       NUMBER       CONSTRAINT pk_system_user1 PRIMARY KEY
, system_user_name     VARCHAR2(20) CONSTRAINT nn_system_user1 NOT NULL 
, system_user_group_id NUMBER       CONSTRAINT nn_system_user2 NOT NULL
, system_user_type     NUMBER       CONSTRAINT nn_system_user3 NOT NULL
, last_name            VARCHAR2(20)
, first_name           VARCHAR2(20)
, middle_initial       VARCHAR2(1)
, created_by           NUMBER       CONSTRAINT nn_system_user4 NOT NULL
, creation_date        DATE         CONSTRAINT nn_system_user5 NOT NULL
, last_updated_by      NUMBER       CONSTRAINT nn_system_user6 NOT NULL
, last_update_date     DATE         CONSTRAINT nn_system_user7 NOT NULL
, CONSTRAINT fk_system_user1
  FOREIGN KEY(created_by) REFERENCES system_user(system_user_id)
, CONSTRAINT fk_system_user2
  FOREIGN KEY(last_updated_by) REFERENCES system_user(system_user_id));
 
INSERT INTO system_user VALUES
( 1,'SYSADMIN', 1, 1, '', '', '', 1, SYSDATE, 1, SYSDATE);
 
INSERT INTO system_user VALUES
( 2,'DBA', 2, 1,'Adams','Samuel', '', 1, SYSDATE, 2, SYSDATE);
 
INSERT INTO system_user VALUES
( 3,'DBA', 2, 1,'Henry','Patrick', '', 1, SYSDATE, 1, SYSDATE);
 
INSERT INTO system_user VALUES
( 4,'DBA', 2, 1,'Puri','Manmohan', '', 1, SYSDATE, 3, SYSDATE);
 
INSERT INTO system_user VALUES
( 5,'APPS DBA', 2, 1,'Bremen','Gunther', '', 4, SYSDATE, 4, SYSDATE);			  								  
 
-- Create the constraint to leave room for seeding new application default values.
CREATE SEQUENCE system_user_s1 START WITH 1001;
 
-- ------------------------------------------------------------------
-- This part seeds the range non-equijoin section.
-- ------------------------------------------------------------------
-- Create tables and sequences.
CREATE TABLE calendar
( calendar_id        NUMBER       CONSTRAINT pk_calendar1 PRIMARY KEY
, month_short_name   VARCHAR2(3)  CONSTRAINT nn_calendar1 NOT NULL
, month_long_name    VARCHAR2(10) CONSTRAINT nn_calendar2 NOT NULL
, start_date         DATE         CONSTRAINT nn_calendar3 NOT NULL
, end_date           DATE         CONSTRAINT nn_calendar4 NOT NULL);
 
CREATE SEQUENCE calendar_s1;
 
CREATE TABLE TRANSACTION
( transaction_id     NUMBER CONSTRAINT pk_transaction1 PRIMARY KEY
, transaction_amount NUMBER CONSTRAINT nn_transaction1 NOT NULL
, transaction_date   DATE   CONSTRAINT nn_transaction2 NOT NULL);
 
CREATE SEQUENCE transaction_s1;
 
-- Seed the CALENDAR table.
INSERT INTO calendar VALUES (calendar_s1.nextval,'JAN','January','01-JAN-09','31-JAN-09');
INSERT INTO calendar VALUES (calendar_s1.nextval,'FEB','February','01-FEB-09','28-FEB-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'MAR','March','01-MAR-09','31-MAR-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'APR','April','01-APR-09','30-APR-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'MAY','May','01-MAY-09','31-MAY-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'JUN','June','01-JUN-09','30-JUN-09');  
 
INSERT INTO calendar VALUES (calendar_s1.nextval,'JUL','July','01-JUL-09','31-JUL-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'AUG','August','01-AUG-09','31-AUG-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'SEP','September','01-SEP-09','30-SEP-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'OCT','October','01-OCT-09','31-OCT-09');  
INSERT INTO calendar VALUES (calendar_s1.nextval,'NOV','November','01-NOV-09','30-NOV-09');
INSERT INTO calendar VALUES (calendar_s1.nextval,'DEC','Decenber','01-DEC-09','31-DEC-09');
 
-- Seed the TRANSACTION table.
INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,38.99,'16-JAN-09');
INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,32.87,'21-JAN-09');
INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,9.99,'25-MAR-09');
INSERT INTO TRANSACTION VALUES (transaction_s1.nextval,43.19,'13-APR-09');

MySQL SQL Join Script

-- Coming shortly.

I hope this help a few folks trying to figure out how to join tables.

Written by maclochlainn

February 4th, 2009 at 2:39 am

17 Responses to 'Basic SQL join semantics'

Subscribe to comments with RSS or TrackBack to 'Basic SQL join semantics'.

  1. UNION differs from FULL OUTER JOIN as it will remove duplicates.

    SELECT ordinal_english_text
    , ordinal_french_text
    FROM ordinal_english, ordinal_french
    WHERE ordinal_english_id(+) = ordinal_french_id
    UNION
    SELECT ordinal_english_text
    , ordinal_french_text
    FROM ordinal_english, ordinal_french
    WHERE ordinal_english_id = ordinal_french_id(+);

    this will remove the duplicates for ordinal_english_text , ordinal_french_text

    use UNION ALL instead

    SQL> create table lsc_t1(x number, y number)
    Table created.
    SQL> insert into lsc_t1 values(1,1)
    1 row created.
    SQL> insert into lsc_t1 values(1,1)
    1 row created.
    SQL> insert into lsc_t1 values(2,2)
    1 row created.
    SQL> create table lsc_t2(y number, z number)
    Table created.
    SQL> insert into lsc_t2 values(2,2)
    1 row created.
    SQL> insert into lsc_t2 values(3,3)
    1 row created.
    SQL> insert into lsc_t2 values(3,3)
    1 row created.
    SQL> commit
    Commit complete.
    SQL> select x,y,z from lsc_t1 full join lsc_t2 using (y) order by y

    X Y Z
    ———- ———- ———-
    1 1
    1 1
    2 2 2
    3 3
    3 3

    5 rows selected.
    SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+) order by y

    X Y Z
    ———- ———- ———-
    1 1
    1 1
    2 2 2

    3 rows selected.
    SQL> select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y order by y

    X Y Z
    ———- ———- ———-
    2 2 2
    3 3
    3 3

    3 rows selected.
    SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+)
    union
    select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y order by y

    X Y Z
    ———- ———- ———-
    1 1
    2 2 2
    3 3

    3 rows selected.
    SQL> select x,lsc_t1.y,z from lsc_t1 , lsc_t2 where lsc_t1.y=lsc_t2.y(+)
    union all
    select x,lsc_t2.y,z from lsc_t1 , lsc_t2 where lsc_t1.y(+)=lsc_t2.y and lsc_t1.y is null order by y

    X Y Z
    ———- ———- ———-
    1 1
    1 1
    2 2 2
    3 3
    3 3

    5 rows selected.

  2. Natural join can be also outer join. Actually I’d say it is just another (potentially dangerous!!!) syntactical notation. And I’ve tried to explain joins here http://www.gplivna.eu/papers/sql_join_types.htm

    Gints Plivna

    15 Feb 09 at 2:12 pm

  3. Thanks for the observation. I concur wholeheartedly with your observations that there’s nothing natural about natural joins. 🙂 BTW, that’s a nice page.

    maclochlainn

    15 Feb 09 at 3:06 pm

  4. Nice writeup. I like the graphic on left, right and inner join. I did an extension of it here:

    http://db-optimizer.blogspot.com/2009/06/sql-joins.html

    Kyle Hailey

    26 Jun 09 at 10:59 am

  5. Really nice post. I haven’t seen the execution logic presented anywhere other than here. I was under the impression that it had to be nested loops for JOINS but no one has ever said this.
    Thank you 🙂

    Preston

    Preston

    16 Mar 11 at 1:46 pm

  6. My semi-complete join tutorials are here. It may help you a bit more, this was a base posting that led to a more complete discussion and comparison of the various join semantics. These smaller pages also load more quickly. 😉

    maclochlainn

    16 Mar 11 at 9:50 pm

  7. Neat post, very cleat and thorough enough to help me understand SQL.

    I’m new to the subject and come from a very different platform.

    You have helped me alot in begining the road.

    Patricia Echeverry

    4 Aug 11 at 11:29 am

  8. You might like this other website, it’s still a work-in-progress.

    maclochlainn

    4 Aug 11 at 2:28 pm

  9. […] Outer Joins Graphically December 30th, 2009 Goto comments Leave a comment Outer Joins If English and French both have a unique key on the “ordinal_id” then it’s basically one-to-one relationship We add an arrow in the middle of the line to denote “outer join”. The arrow points from the table that drives the join, ie all the rows in the table pointed from are returned even if a match isn’t found in the table pointed to. idea for the above graphic came from Join Mapping which was originally posted on blog page […]

  10. […] looking at SQL joins […]

    ANSI SQL Joins

    25 Apr 13 at 3:47 pm

  11. SELECT   "invoice"."ITEMNO"
    ,        "invoice"."Outgoing"
    ,        "storeage"."incoming"
    FROM     "MVXJDTAP"."MVXJDTA"."invoice" INNER JOIN "MVXJDTAP"."MVXJDTA"."storeage "
    ON       "invoice"."ITEMNO" = "storeage"."itemno"
    ORDER BY "storeage"."itemno"

    false RESULT :

         invoice                           storeage                          FALSE RESULT
    itemno       Outgoing             itemno      incoming         itemno      outgoing      incoming
    ZOMV3-94        4                 ZOMV3-94    3                ZOMV3-94       4             3
    ZOMV3-94        7                 ZOMV3-94    1                ZOMV3-94       4             1
                                      ZOMV3-94    60               ZOMV3-94       4             60
                                      ZOMV3-94    5                ZOMV3-94       4             5
                                                                   ZOMV3-94       7             3
                                                                   ZOMV3-94       7             1 
                                                                   ZOMV3-94       7             60
                                                                   ZOMV3-94       7             5

    HOW CAN GET THIS RESULT
    TRUE RESULT :

                                                                   ZOMV3-94       4           NULL OR 0
                                                                   ZOMV3-94       7           NULL OR 0
                                                                   ZOMV3-94     NULL OR 0       3
                                                                   ZOMV3-94     NULL OR 0       1
                                                                   ZOMV3-94     NULL OR 0       60
                                                                   ZOMV3-94     NULL OR 0       5

    I WANT CODE TO RUN TRUE RESULT

    BELAL

    29 Apr 13 at 7:05 am

  12. While I’d like to help, there’s not enough information to help you. The output appears different than what would be anticipated by the query? 🙁

    maclochlainn

    30 Apr 13 at 12:28 am

  13. I personally speculate how come you called this blog, “Basic SQL
    Join Semantics | MacLochlainns Weblog”. In any event .
    I admired the article!I appreciate it-Seymour

    Heather

    11 Nov 13 at 4:37 pm

  14. […] above graphic originally on […]

  15. Awesome blog !!!!

    udhayakumar

    16 Jul 14 at 5:26 am

  16. An SQL JOIN section utilized to unite rows from two or added tables; situate on an ordinary field among them.

    swati sharma

    17 Jul 15 at 5:03 am

Leave a Reply