Archive for the ‘natural joins’ tag
Unnatural outcome of natural joins
You may like the standard syntax of listing tables as comma-delimited lists, rather than using an inner, left, right, full, natural, and cross join syntax. I do! The former method eliminates matching result sets and generally is less tedious.
The natural join is my least favorite of these because it can create problems long after the query is written. The problem is an outcome of its design. A natural join matches rows from one table to another by using all matching column names. This query makes a simplifying assumption: Nobody includes a column of the same name in two tables where they may be joined that isn’t a foreign key.
The following is quick example that creates two tables. The VEHICLE
table uses the VEHICLE_ID
column as a primary key. The WORK_ORDER
table uses the VEHICLE_ID
column as a foreign key.
CREATE TABLE vehicle ( vehicle_id NUMBER , vin VARCHAR2(20)); CREATE TABLE work_order ( work_order_id NUMBER , vehicle_id NUMBER); INSERT INTO vehicle VALUES (1,'VIN_NUMBER1'); INSERT INTO vehicle VALUES (2,'VIN_NUMBER2'); INSERT INTO work_order VALUES (1, 1); INSERT INTO work_order VALUES (2, 2); INSERT INTO work_order VALUES (3, 1); |
The following queries resolve the relationship by using an explicit join:
SELECT * FROM vehicle v, work_order wo WHERE v.vehicle_id = wo.vehicle_id; SELECT * FROM vehicle v JOIN work_order wo ON v.vehicle_id = wo.vehicle_id; SELECT * FROM vehicle v JOIN work_order wo USING(vehicle_id); |
A natural query also resolves the relationship by implicitly finding the columns that share column names, as follows:
SELECT * FROM vehicle v NATURAL JOIN work_order wo; |
All of these return a three row result set. The unnatural part of the natural query arrives during maintenance programming when somebody adds a VIN
column name to the WORK_ORDER
table (let’s say to simplify the ETL to the data warehouse). The code using a natural join isn’t invalidated but its runtime behavior is altered because it now joins both the VEHICLE_ID
and VIN
columns. Continuing the example, you’d change the table by:
ALTER TABLE work_order ADD (vin VARCHAR2(20)); |
The explicit queries still work after the change. The implicit (natural) query now returns no rows because the VIN
column contains all null values, and it is automatically added to the implicit join statement. After updating the VIN
column, the query resolves when the data matches but not when the VIN
column contains different data. What if the VIN
column in WORK_ORDER
only contains the make and model portion of the vehicle identification number while the VIN
column in VEHICLE
contains the complete unique vehicle identifier. You would receive no rows selected from the natural query.
It appears the best thing to do, is avoid natural joins.
I subsequently wrote a post about SQL join semantics. You can find it here.