MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Unnatural outcome of natural joins

with one comment

You may like the standard syntax of listing tables as comma-delimited lists, rather than using an inner, left, right, full, natural, and cross join syntax. I do! The former method eliminates matching result sets and generally is less tedious.

The natural join is my least favorite of these because it can create problems long after the query is written. The problem is an outcome of its design. A natural join matches rows from one table to another by using all matching column names. This query makes a simplifying assumption: Nobody includes a column of the same name in two tables where they may be joined that isn’t a foreign key.

The following is quick example that creates two tables. The VEHICLE table uses the VEHICLE_ID column as a primary key. The WORK_ORDER table uses the VEHICLE_ID column as a foreign key.

CREATE TABLE vehicle
( vehicle_id NUMBER
, vin VARCHAR2(20));
 
CREATE TABLE work_order
( work_order_id NUMBER
, vehicle_id NUMBER);
 
INSERT INTO vehicle VALUES (1,'VIN_NUMBER1');
INSERT INTO vehicle VALUES (2,'VIN_NUMBER2');
INSERT INTO work_order VALUES (1, 1);
INSERT INTO work_order VALUES (2, 2);
INSERT INTO work_order VALUES (3, 1);

The following queries resolve the relationship by using an explicit join:

SELECT * FROM vehicle v, work_order wo WHERE v.vehicle_id = wo.vehicle_id;
SELECT * FROM vehicle v JOIN work_order wo ON v.vehicle_id = wo.vehicle_id;
SELECT * FROM vehicle v JOIN work_order wo USING(vehicle_id);

A natural query also resolves the relationship by implicitly finding the columns that share column names, as follows:

SELECT * FROM vehicle v NATURAL JOIN work_order wo;

All of these return a three row result set. The unnatural part of the natural query arrives during maintenance programming when somebody adds a VIN column name to the WORK_ORDER table (let’s say to simplify the ETL to the data warehouse). The code using a natural join isn’t invalidated but its runtime behavior is altered because it now joins both the VEHICLE_ID and VIN columns. Continuing the example, you’d change the table by:

ALTER TABLE work_order ADD (vin VARCHAR2(20));

The explicit queries still work after the change. The implicit (natural) query now returns no rows because the VIN column contains all null values, and it is automatically added to the implicit join statement. After updating the VIN column, the query resolves when the data matches but not when the VIN column contains different data. What if the VIN column in WORK_ORDER only contains the make and model portion of the vehicle identification number while the VIN column in VEHICLE contains the complete unique vehicle identifier. You would receive no rows selected from the natural query.

It appears the best thing to do, is avoid natural joins.

I subsequently wrote a post about SQL join semantics. You can find it here.

Written by maclochlainn

May 24th, 2008 at 3:08 am