Merge Statement for ETL
While working through examples for my students on uploading data, I thought it would be interesting to demonstrate how to create a re-runnable upload. Especially when chatting with a friend who was unaware that you could use joins inside the source element of a MERGE
statement. Naturally, the MERGE
statement seemed like the best approach in an Oracle database because with my criteria:
- The source file would not include any surrogate key values.
- The source file would have denormalized record sets with data that should belong to parent and child tables, technically unnormalized form (UNF).
- Primary and foreign key values would be determined on load to the tables.
- There could be a one-to-many relationship between the parent and child tables in the original source.
- Subsequent data sets may replicate data already seeded or not in the tables.
- Avoid any complex PL/SQL structures.
Step #1 : Create a Virtual Directory
You can create a virtual directory without a physical directory but it won’t work when you try to access it. Therefore, you should create the physical directory first. Assuming you’ve created a C:\Data\Download
file directory on the Windows platform, you can then create a virtual directory and grant permissions to the student
user as the SYS
privileged user. The syntax for these steps is:
CREATE DIRECTORY download AS 'C:\Data\Download'; GRANT READ, WRITE ON DIRECTORY download TO student; |
If you want more detail on these steps, check this older post on the blog.
Step #2 : Create the Target and External Tables
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | -- Conditionally drop tables and sequences. BEGIN FOR i IN (SELECT TABLE_NAME FROM user_tables WHERE TABLE_NAME IN ('KINGDOM','KNIGHT','KINGDOM_KNIGHT_IMPORT')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name IN ('KINGDOM_S1','KNIGHT_S1')) LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create normalized kingdom table. CREATE TABLE kingdom ( kingdom_id NUMBER , kingdom_name VARCHAR2(20) , population NUMBER); -- Create a sequence for the kingdom table. CREATE SEQUENCE kingdom_s1; -- Create normalized knight table. CREATE TABLE knight ( knight_id NUMBER , knight_name VARCHAR2(24) , kingdom_allegiance_id NUMBER , allegiance_start_date DATE , allegiance_end_date DATE); -- Create a sequence for the knight table. CREATE SEQUENCE knight_s1; -- Create external import table. CREATE TABLE kingdom_knight_import ( kingdom_name VARCHAR2(20) , population NUMBER , knight_name VARCHAR2(24) , allegiance_start_date DATE , allegiance_end_date DATE) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY download ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII BADFILE 'DOWNLOAD':'kingdom_import.bad' DISCARDFILE 'DOWNLOAD':'kingdom_import.dis' LOGFILE 'DOWNLOAD':'kingdom_import.log' FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "'" MISSING FIELD VALUES ARE NULL ) LOCATION ('kingdom_import.csv')) REJECT LIMIT UNLIMITED; |
Step #3 : Create a Procedure to ensure an all or nothing transaction
The procedure ensures that an all or nothing transaction occurs to both tables. Inside the procedure you have two MERGE
statements.
The first MERGE
statement uses a LEFT JOIN
to ensure that any new kingdom_name
will be added to the kingdom
table. The kingdom_name
and population
columns are the natural key in this model. The second MERGE
statement uses an INNER JOIN
to ensure that knight
rows are only inserted when they belong to an existing kingdom_name
. Naturally, the primary key capture occurs in this statement and it maps the primary key to the foreign key column in the knight
table.
The complete procedure code follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 | -- Create a procedure to wrap the transaction. CREATE OR REPLACE PROCEDURE upload_kingdom IS BEGIN -- Set save point for an all or nothing transaction. SAVEPOINT starting_point; -- Insert or update the table, which makes this rerunnable when the file hasn't been updated. MERGE INTO kingdom target USING (SELECT DISTINCT k.kingdom_id , kki.kingdom_name , kki.population FROM kingdom_knight_import kki LEFT JOIN kingdom k ON kki.kingdom_name = k.kingdom_name AND kki.population = k.population) SOURCE ON (target.kingdom_id = SOURCE.kingdom_id) WHEN MATCHED THEN UPDATE SET kingdom_name = SOURCE.kingdom_name WHEN NOT MATCHED THEN INSERT VALUES ( kingdom_s1.nextval , SOURCE.kingdom_name , SOURCE.population); -- Insert or update the table, which makes this rerunnable when the file hasn't been updated. MERGE INTO knight target USING (SELECT k.kingdom_id , kki.knight_name , kki.allegiance_start_date AS start_date , kki.allegiance_end_date AS end_date FROM kingdom_knight_import kki INNER JOIN kingdom k ON kki.kingdom_name = k.kingdom_name AND kki.population = k.population) SOURCE ON (target.kingdom_allegiance_id = SOURCE.kingdom_id) WHEN MATCHED THEN UPDATE SET allegiance_start_date = SOURCE.start_date , allegiance_end_date = SOURCE.end_date WHEN NOT MATCHED THEN INSERT VALUES ( knight_s1.nextval , SOURCE.knight_name , SOURCE.kingdom_id , SOURCE.start_date , SOURCE.end_date); -- Save the changes. COMMIT; EXCEPTION WHEN OTHERS THEN ROLLBACK TO starting_point; RETURN; END; / |
Step #4 : Test the Process
You can test it by calling the procedure. Rerunning it will demonstrate that it doesn’t violate any of the rules.
EXECUTE upload_kingdom; |
As always, I hope this is useful to somebody besides me.