MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Merge Statement for ETL

with 2 comments

While working through examples for my students on uploading data, I thought it would be interesting to demonstrate how to create a re-runnable upload. Especially when chatting with a friend who was unaware that you could use joins inside the source element of a MERGE statement. Naturally, the MERGE statement seemed like the best approach in an Oracle database because with my criteria:

  • The source file would not include any surrogate key values.
  • The source file would have denormalized record sets with data that should belong to parent and child tables, technically unnormalized form (UNF).
  • Primary and foreign key values would be determined on load to the tables.
  • There could be a one-to-many relationship between the parent and child tables in the original source.
  • Subsequent data sets may replicate data already seeded or not in the tables.
  • Avoid any complex PL/SQL structures.

Step #1 : Create a Virtual Directory

You can create a virtual directory without a physical directory but it won’t work when you try to access it. Therefore, you should create the physical directory first. Assuming you’ve created a C:\Data\Download file directory on the Windows platform, you can then create a virtual directory and grant permissions to the student user as the SYS privileged user. The syntax for these steps is:

CREATE DIRECTORY download AS 'C:\Data\Download';
GRANT READ, WRITE ON DIRECTORY download TO student;

If you want more detail on these steps, check this older post on the blog.

Step #2 : Create the Target and External Tables

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
-- Conditionally drop tables and sequences.
BEGIN
  FOR i IN (SELECT TABLE_NAME
            FROM   user_tables
            WHERE  TABLE_NAME IN ('KINGDOM','KNIGHT','KINGDOM_KNIGHT_IMPORT')) LOOP 
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT sequence_name
            FROM   user_sequences
            WHERE  sequence_name IN ('KINGDOM_S1','KNIGHT_S1')) LOOP 
    EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name;
  END LOOP;
END;
/
 
-- Create normalized kingdom table.
CREATE TABLE kingdom
( kingdom_id    NUMBER
, kingdom_name  VARCHAR2(20)
, population    NUMBER);
 
-- Create a sequence for the kingdom table.
CREATE SEQUENCE kingdom_s1;
 
-- Create normalized knight table.
CREATE TABLE knight
( knight_id             NUMBER
, knight_name           VARCHAR2(24)
, kingdom_allegiance_id NUMBER
, allegiance_start_date DATE
, allegiance_end_date   DATE);
 
-- Create a sequence for the knight table.
CREATE SEQUENCE knight_s1;
 
-- Create external import table.
CREATE TABLE kingdom_knight_import
( kingdom_name          VARCHAR2(20)
, population            NUMBER
, knight_name           VARCHAR2(24)
, allegiance_start_date DATE
, allegiance_end_date   DATE)
  ORGANIZATION EXTERNAL
  ( TYPE oracle_loader
    DEFAULT DIRECTORY download
    ACCESS PARAMETERS
    ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
      BADFILE     'DOWNLOAD':'kingdom_import.bad'
      DISCARDFILE 'DOWNLOAD':'kingdom_import.dis'
      LOGFILE     'DOWNLOAD':'kingdom_import.log'
      FIELDS TERMINATED BY ','
      OPTIONALLY ENCLOSED BY "'"
      MISSING FIELD VALUES ARE NULL )
    LOCATION ('kingdom_import.csv'))
REJECT LIMIT UNLIMITED;

Step #3 : Create a Procedure to ensure an all or nothing transaction

The procedure ensures that an all or nothing transaction occurs to both tables. Inside the procedure you have two MERGE statements.

The first MERGE statement uses a LEFT JOIN to ensure that any new kingdom_name will be added to the kingdom table. The kingdom_name and population columns are the natural key in this model. The second MERGE statement uses an INNER JOIN to ensure that knight rows are only inserted when they belong to an existing kingdom_name. Naturally, the primary key capture occurs in this statement and it maps the primary key to the foreign key column in the knight table.

The complete procedure code follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
-- Create a procedure to wrap the transaction.
CREATE OR REPLACE PROCEDURE upload_kingdom IS 
BEGIN
  -- Set save point for an all or nothing transaction.
  SAVEPOINT starting_point;
 
  -- Insert or update the table, which makes this rerunnable when the file hasn't been updated.  
  MERGE INTO kingdom target
  USING (SELECT   DISTINCT
                  k.kingdom_id
         ,        kki.kingdom_name
         ,        kki.population
         FROM     kingdom_knight_import kki LEFT JOIN kingdom k
         ON       kki.kingdom_name = k.kingdom_name
         AND      kki.population = k.population) SOURCE
  ON (target.kingdom_id = SOURCE.kingdom_id)
  WHEN MATCHED THEN
  UPDATE SET kingdom_name = SOURCE.kingdom_name
  WHEN NOT MATCHED THEN
  INSERT VALUES
  ( kingdom_s1.nextval
  , SOURCE.kingdom_name
  , SOURCE.population);
 
  -- Insert or update the table, which makes this rerunnable when the file hasn't been updated.  
  MERGE INTO knight target
  USING (SELECT   k.kingdom_id
         ,        kki.knight_name
         ,        kki.allegiance_start_date AS start_date
         ,        kki.allegiance_end_date AS end_date
         FROM     kingdom_knight_import kki INNER JOIN kingdom k
         ON       kki.kingdom_name = k.kingdom_name
         AND      kki.population = k.population) SOURCE
  ON (target.kingdom_allegiance_id = SOURCE.kingdom_id)
  WHEN MATCHED THEN
  UPDATE SET allegiance_start_date = SOURCE.start_date
  ,          allegiance_end_date = SOURCE.end_date
  WHEN NOT MATCHED THEN
  INSERT VALUES
  ( knight_s1.nextval
  , SOURCE.knight_name
  , SOURCE.kingdom_id
  , SOURCE.start_date
  , SOURCE.end_date);
 
  -- Save the changes.
  COMMIT;
 
EXCEPTION
  WHEN OTHERS THEN
    ROLLBACK TO starting_point;
    RETURN;
END;
/

Step #4 : Test the Process

You can test it by calling the procedure. Rerunning it will demonstrate that it doesn’t violate any of the rules.

EXECUTE upload_kingdom;

As always, I hope this is useful to somebody besides me.

Written by maclochlainn

February 23rd, 2010 at 2:28 am