Multi-row Merge in MySQL
After I wrote the post for students on the multiple-row MERGE statement for an upload through an external table in Oracle, I decided to check how it might be done with MySQL, more or less because I try to keep track of how things are done in several databases.
MySQL’s equivalent to a MERGE statement is an INSERT statement with an ON DUPLICATE KEY clause, which I blogged about a while back. You may also use the REPLACE INTO statement when you want to merge more than one row. When I first wrote this, I thought an INSERT statement with an ON DUPLICATE KEY clause didn’t support a subquery, but I found that I was wrong. Fortunately, somebody posted a comment to remind me about this, and now both solutions are here for anybody who would like them.
The workaround with a VALUES clause was to write a stored procedure with two cursor loops, explicitly pass the values from the cursor to local variables, and then put the local variables in the VALUES clause. The version with a subquery follows it in Step #5. On balance, Oracle’s MERGE statement (shown here) is clearly far superior to MySQL’s approach.
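As a quick illustration of the two statements mentioned above, here is a minimal sketch. The demo_kingdom table and its unique key on kingdom_name are assumptions made only for this example; they are not part of the upload that follows. The VALUES() function used in the UPDATE clause still works but is deprecated in recent MySQL 8.0 releases.

```sql
-- Hypothetical demo table. The UNIQUE key on kingdom_name is an assumption
-- for this sketch; it is the key that ON DUPLICATE KEY reacts to.
DROP TABLE IF EXISTS demo_kingdom;
CREATE TABLE demo_kingdom
( kingdom_id    INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, kingdom_name  VARCHAR(20) NOT NULL
, population    INT UNSIGNED
, UNIQUE KEY uq_demo_kingdom (kingdom_name)) ENGINE=INNODB;

-- First multi-row merge: both rows are new, so both are inserted.
INSERT INTO demo_kingdom (kingdom_name, population)
VALUES ('Narnia', 77600), ('Camelot', 15200)
ON DUPLICATE KEY UPDATE population = VALUES(population);

-- Second multi-row merge: Narnia collides on the unique key and is updated
-- in place (it keeps its kingdom_id); Avalon is new and is inserted.
INSERT INTO demo_kingdom (kingdom_name, population)
VALUES ('Narnia', 42100), ('Avalon', 900)
ON DUPLICATE KEY UPDATE population = VALUES(population);

-- REPLACE INTO also resolves the collision, but it deletes the old row and
-- inserts a new one, so Camelot receives a new kingdom_id.
REPLACE INTO demo_kingdom (kingdom_name, population)
VALUES ('Camelot', 15300);

SELECT kingdom_id, kingdom_name, population
FROM   demo_kingdom
ORDER BY kingdom_name;
```

The difference matters when other tables hold foreign keys to the surrogate key: the ON DUPLICATE KEY form preserves the existing key value, while REPLACE INTO does not.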
Demonstration
Here are the steps to accomplish an import/upload with the INSERT statement and ON DUPLICATE KEY clause. In this example, you upload data from a flat file, or Comma Separated Value (CSV) file, to a denormalized table (actually in unnormalized form). This type of file upload transfers information that doesn’t have surrogate key values. You have to create those in the scope of the transformation to the normalized tables.
Step #1 : Position your CSV file in the physical directory
Copy the following contents into a file named kingdom_mysql_import.csv in the C:\Data\Download directory or folder. If you have Windows UAC enabled in Windows Vista or 7, you should disable it before performing this step. The trailing commas are meaningful in MySQL and avoid problems when reading CSV files.
Narnia, 77600,'Peter the Magnificent',12720320,12920609,
Narnia, 77600,'Edmund the Just',12720320,12920609,
Narnia, 77600,'Susan the Gentle',12720320,12920609,
Narnia, 77600,'Lucy the Valiant',12720320,12920609,
Narnia, 42100,'Peter the Magnificent',15310412,15310531,
Narnia, 42100,'Edmund the Just',15310412,15310531,
Narnia, 42100,'Susan the Gentle',15310412,15310531,
Narnia, 42100,'Lucy the Valiant',15310412,15310531,
Camelot, 15200,'King Arthur',06310310,06861212,
Camelot, 15200,'Sir Lionel',06310310,06861212,
Camelot, 15200,'Sir Bors',06310310,06351212,
Camelot, 15200,'Sir Bors',06400310,06861212,
Camelot, 15200,'Sir Galahad',06310310,06861212,
Camelot, 15200,'Sir Gawain',06310310,06861212,
Camelot, 15200,'Sir Tristram',06310310,06861212,
Camelot, 15200,'Sir Percival',06310310,06861212,
Camelot, 15200,'Sir Lancelot',06700930,06821212,
Step #2 : Connect as the student user
Connect as the student user, disconnecting first if you’re logged in as another user. The connection syntax that protects your password is:
mysql -ustudent -p
Connect to the sampledb database, like so:
mysql> USE sampledb;
Step #3 : Run the script that creates the tables
Copy the following into a create_mysql_kingdom_upload.sql file within a directory of your choice. Then, run it as the student account.
-- This enables dropping tables with foreign key dependencies.
-- It is specific to the InnoDB Engine.
SET FOREIGN_KEY_CHECKS = 0;

-- Conditionally drop objects.
SELECT 'KINGDOM' AS "Drop Table";
DROP TABLE IF EXISTS kingdom;
SELECT 'KNIGHT' AS "Drop Table";
DROP TABLE IF EXISTS knight;
SELECT 'KINGDOM_KNIGHT_IMPORT' AS "Drop Table";
DROP TABLE IF EXISTS kingdom_knight_import;

-- Create normalized kingdom table.
SELECT 'KINGDOM' AS "Create Table";
CREATE TABLE kingdom
( kingdom_id    INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, kingdom_name  VARCHAR(20)
, population    INT UNSIGNED) ENGINE=INNODB;

-- Create normalized knight table.
SELECT 'KNIGHT' AS "Create Table";
CREATE TABLE knight
( knight_id              INT UNSIGNED PRIMARY KEY AUTO_INCREMENT
, knight_name            VARCHAR(24)
, kingdom_allegiance_id  INT UNSIGNED
, allegiance_start_date  DATE
, allegiance_end_date    DATE
, CONSTRAINT fk_kingdom FOREIGN KEY (kingdom_allegiance_id)
  REFERENCES kingdom (kingdom_id)) ENGINE=INNODB;

-- Create external import table in memory only - disappears after rebooting the mysqld service.
SELECT 'KINGDOM_KNIGHT_IMPORT' AS "Create Table";
CREATE TABLE kingdom_knight_import
( kingdom_name           VARCHAR(20)
, population             INT UNSIGNED
, knight_name            VARCHAR(24)
, allegiance_start_date  DATE
, allegiance_end_date    DATE) ENGINE=MEMORY;

-- Re-enable foreign key checks so the session enforces the constraint again.
SET FOREIGN_KEY_CHECKS = 1;
Step #4 : Load the data into your target upload table
There are a number of things that could go wrong, but when you choose LOCAL there generally aren’t any problems. Run the following statement from the student account while using the sampledb database, and check whether or not you can access the kingdom_mysql_import.csv file.
LOAD DATA LOCAL INFILE 'c:/Data/Download/kingdom_mysql_import.csv'
INTO TABLE kingdom_knight_import
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\r\n';
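Since a number of things can go wrong with the load, a few quick checks may help before moving on. This is only a sketch of what I would look at, not a required step; the rows_loaded alias is made up for the example.

```sql
-- Run this immediately after LOAD DATA to see any conversion or
-- truncation messages the load produced.
SHOW WARNINGS;

-- The sample file above holds 17 rows, so 17 is the count to expect.
SELECT COUNT(*) AS rows_loaded
FROM   kingdom_knight_import;

-- Spot-check that the YYYYMMDD strings were converted to dates.
SELECT kingdom_name
,      population
,      knight_name
,      allegiance_start_date
,      allegiance_end_date
FROM   kingdom_knight_import
LIMIT  5;

-- If the load was rejected outright, check whether the server
-- permits LOCAL loads at all.
SHOW VARIABLES LIKE 'local_infile';
```

If the count is zero or the dates look wrong, the usual suspects are the file path and the line terminator, which is '\r\n' here because the file was created on Windows.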
Step #5 : Create the upload procedure
Copy the following into a create_mysql_upload_procedure.sql file within a directory of your choice, then run it as the student account. Unlike Oracle’s MERGE statement, this first version uses an INSERT statement with a VALUES clause and an ON DUPLICATE KEY clause, which requires actual values rather than a source query. That leaves few options other than a stored procedure. As you can see from the code, the syntax is complex and the implementation is much more verbose than Oracle’s equivalent PL/SQL.
As you look at the structure required to achieve this simple thing, the long-standing complaint about PL/SQL being a verbose language comes to mind. Stored procedures are relatively new to MySQL, and they’re quite a bit more verbose than PL/SQL.
-- Conditionally drop the procedure.
SELECT 'UPLOAD_KINGDOM' AS "Drop Procedure";
DROP PROCEDURE IF EXISTS upload_kingdom;

-- Reset the execution delimiter to create a stored program.
DELIMITER $$

-- The parentheses after the procedure name must be there or the MODIFIES SQL DATA raises a compile-time exception.
CREATE PROCEDURE upload_kingdom() MODIFIES SQL DATA
BEGIN

  /* Declare local variables. */
  DECLARE lv_kingdom_id            INT UNSIGNED;
  DECLARE lv_kingdom_name          VARCHAR(20);
  DECLARE lv_population            INT UNSIGNED;
  DECLARE lv_knight_id             INT UNSIGNED;
  DECLARE lv_knight_name           VARCHAR(24);
  DECLARE lv_kingdom_allegiance_id INT UNSIGNED;
  DECLARE lv_allegiance_start_date DATE;
  DECLARE lv_allegiance_end_date   DATE;

  /* Declare handler variables. */
  DECLARE duplicate_key INT DEFAULT 0;
  DECLARE foreign_key   INT DEFAULT 0;
  DECLARE fetched       INT DEFAULT 0;

  /* Cursors must come after variables and before event handlers. */

  /* Declare a SQL cursor with a left join on the natural key. */
  DECLARE kingdom_cursor CURSOR FOR
    SELECT DISTINCT
           k.kingdom_id
    ,      kki.kingdom_name
    ,      kki.population
    FROM   kingdom_knight_import kki LEFT JOIN kingdom k
    ON     kki.kingdom_name = k.kingdom_name
    AND    kki.population = k.population;

  /* Declare a SQL cursor with a join on the natural key. */
  DECLARE knight_cursor CURSOR FOR
    SELECT kn.knight_id
    ,      kki.knight_name
    ,      k.kingdom_id
    ,      kki.allegiance_start_date AS start_date
    ,      kki.allegiance_end_date AS end_date
    FROM   kingdom_knight_import kki INNER JOIN kingdom k
    ON     kki.kingdom_name = k.kingdom_name
    AND    kki.population = k.population LEFT JOIN knight kn
    ON     k.kingdom_id = kn.kingdom_allegiance_id
    AND    kki.knight_name = kn.knight_name
    AND    kki.allegiance_start_date = kn.allegiance_start_date
    AND    kki.allegiance_end_date = kn.allegiance_end_date;

  /* Event handlers must always be last in the declaration section. */

  /* Declare a duplicate key handler and a foreign key handler. */
  DECLARE CONTINUE HANDLER FOR 1062 SET duplicate_key = 1;
  DECLARE CONTINUE HANDLER FOR 1216 SET foreign_key = 1;

  /* Declare a not found record handler to close a cursor loop. */
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fetched = 1;

  /* ---------------------------------------------------------------------- */

  /* Start transaction context. */
  START TRANSACTION;

  /* Set savepoint. */
  SAVEPOINT both_or_none;

  /* Open a local cursor. */
  OPEN kingdom_cursor;
  cursor_kingdom: LOOP

    FETCH kingdom_cursor
    INTO  lv_kingdom_id
    ,     lv_kingdom_name
    ,     lv_population;

    /* Place the catch handler for no more rows found
       immediately after the fetch operation. */
    IF fetched = 1 THEN LEAVE cursor_kingdom; END IF;

    INSERT INTO kingdom
    VALUES
    ( lv_kingdom_id
    , lv_kingdom_name
    , lv_population )
    ON DUPLICATE KEY
    UPDATE kingdom_name = lv_kingdom_name;

  END LOOP cursor_kingdom;
  CLOSE kingdom_cursor;

  /* Reset the continue handler to zero. */
  SET fetched = 0;

  /* Open a local cursor. */
  OPEN knight_cursor;
  cursor_knight: LOOP

    /* Fetch records until they're all read, and a NOT FOUND SET is returned. */
    FETCH knight_cursor
    INTO  lv_knight_id
    ,     lv_knight_name
    ,     lv_kingdom_allegiance_id
    ,     lv_allegiance_start_date
    ,     lv_allegiance_end_date;

    /* Place the catch handler for no more rows found
       immediately after the fetch operation. */
    IF fetched = 1 THEN LEAVE cursor_knight; END IF;

    INSERT INTO knight
    VALUES
    ( lv_knight_id
    , lv_knight_name
    , lv_kingdom_allegiance_id
    , lv_allegiance_start_date
    , lv_allegiance_end_date )
    ON DUPLICATE KEY
    UPDATE knight_name = lv_knight_name;

  END LOOP cursor_knight;
  CLOSE knight_cursor;

  /* Reset the continue handler to zero. */
  SET fetched = 0;

  /* ---------------------------------------------------------------------- */

  /* This acts as an exception handling block. */
  IF duplicate_key = 1 OR foreign_key = 1 THEN

    /* This undoes all DML statements to this point in the procedure. */
    ROLLBACK TO SAVEPOINT both_or_none;

  ELSE

    /* This commits the writes. */
    COMMIT;

  END IF;

END;
$$

-- Reset the delimiter to the default.
DELIMITER ;
Here’s the better option with an embedded query:
-- Conditionally drop the procedure.
SELECT 'UPLOAD_KINGDOM' AS "Drop Procedure";
DROP PROCEDURE IF EXISTS upload_kingdom;

-- Reset the execution delimiter to create a stored program.
DELIMITER $$

-- The parentheses after the procedure name must be there or the MODIFIES SQL DATA raises a compile-time exception.
CREATE PROCEDURE upload_kingdom() MODIFIES SQL DATA
BEGIN

  /* Declare handler variables. */
  DECLARE duplicate_key INT DEFAULT 0;
  DECLARE foreign_key   INT DEFAULT 0;

  /* Declare a duplicate key handler and a foreign key handler. */
  DECLARE CONTINUE HANDLER FOR 1062 SET duplicate_key = 1;
  DECLARE CONTINUE HANDLER FOR 1216 SET foreign_key = 1;

  /* ---------------------------------------------------------------------- */

  /* Start transaction context. */
  START TRANSACTION;

  /* Set savepoint. */
  SAVEPOINT both_or_none;

  /* Using subqueries update the targets. */
  /* Merge the kingdoms first, because the knight merge joins to them. */
  INSERT INTO kingdom
  ( SELECT DISTINCT
           k.kingdom_id
    ,      kki.kingdom_name
    ,      kki.population
    FROM   kingdom_knight_import kki LEFT JOIN kingdom k
    ON     kki.kingdom_name = k.kingdom_name
    AND    kki.population = k.population )
  ON DUPLICATE KEY
  UPDATE kingdom_id = k.kingdom_id;

  /* Merge the knights, resolving the foreign key through the kingdom table. */
  INSERT INTO knight
  ( SELECT kn.knight_id
    ,      kki.knight_name
    ,      k.kingdom_id
    ,      kki.allegiance_start_date AS start_date
    ,      kki.allegiance_end_date AS end_date
    FROM   kingdom_knight_import kki INNER JOIN kingdom k
    ON     kki.kingdom_name = k.kingdom_name
    AND    kki.population = k.population LEFT JOIN knight kn
    ON     k.kingdom_id = kn.kingdom_allegiance_id
    AND    kki.knight_name = kn.knight_name
    AND    kki.allegiance_start_date = kn.allegiance_start_date
    AND    kki.allegiance_end_date = kn.allegiance_end_date )
  ON DUPLICATE KEY
  UPDATE knight_id = kn.knight_id;

  /* ---------------------------------------------------------------------- */

  /* This acts as an exception handling block. */
  IF duplicate_key = 1 OR foreign_key = 1 THEN

    /* This undoes all DML statements to this point in the procedure. */
    ROLLBACK TO SAVEPOINT both_or_none;

  ELSE

    /* This commits the writes. */
    COMMIT;

  END IF;

END;
$$

-- Reset the delimiter to the default.
DELIMITER ;
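Both versions of the procedure lean on the same trick: a LEFT JOIN from the import table to the target table on the natural key returns a NULL surrogate key for rows that aren't in the target yet. Inserting that NULL into the AUTO_INCREMENT primary key generates a new value, while a non-NULL key collides with the primary key and takes the ON DUPLICATE KEY branch. The query below is a sketch for previewing that behavior on the kingdom table; the merge_action column is a label I made up for illustration.

```sql
-- Preview of the kingdom merge source. A NULL kingdom_id means the row
-- would be inserted; an existing kingdom_id means it would be updated.
SELECT DISTINCT
       k.kingdom_id
,      kki.kingdom_name
,      kki.population
,      CASE
         WHEN k.kingdom_id IS NULL THEN 'insert'
         ELSE 'update'
       END AS merge_action   -- hypothetical label for this sketch
FROM   kingdom_knight_import kki LEFT JOIN kingdom k
ON     kki.kingdom_name = k.kingdom_name
AND    kki.population = k.population;
```

Run before the first call to the procedure, every row should be marked for insert. Run after it, every row should be marked for update.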
Step #6 : Run the upload procedure
You can run the upload by calling the stored procedure built by the script. The procedure ensures that records are inserted into or updated in their respective tables.
CALL upload_kingdom;
Step #7 : Test the results of the upload procedure
You can test whether or not it worked by running the following queries.
-- Check the kingdom table.
SELECT * FROM kingdom;

-- Check the knight table.
SELECT * FROM knight;
It should display the following information:
+------------+--------------+------------+
| kingdom_id | kingdom_name | population |
+------------+--------------+------------+
|          1 | Narnia       |      77600 |
|          2 | Narnia       |      42100 |
|          3 | Camelot      |      15200 |
+------------+--------------+------------+

+-----------+-------------------------+-----------------------+-----------------------+---------------------+
| knight_id | knight_name             | kingdom_allegiance_id | allegiance_start_date | allegiance_end_date |
+-----------+-------------------------+-----------------------+-----------------------+---------------------+
|         1 | 'Peter the Magnificent' |                     1 | 1272-03-20            | 1292-06-09          |
|         2 | 'Edmund the Just'       |                     1 | 1272-03-20            | 1292-06-09          |
|         3 | 'Susan the Gentle'      |                     1 | 1272-03-20            | 1292-06-09          |
|         4 | 'Lucy the Valiant'      |                     1 | 1272-03-20            | 1292-06-09          |
|         5 | 'Peter the Magnificent' |                     2 | 1531-04-12            | 1531-05-31          |
|         6 | 'Edmund the Just'       |                     2 | 1531-04-12            | 1531-05-31          |
|         7 | 'Susan the Gentle'      |                     2 | 1531-04-12            | 1531-05-31          |
|         8 | 'Lucy the Valiant'      |                     2 | 1531-04-12            | 1531-05-31          |
|         9 | 'King Arthur'           |                     3 | 0631-03-10            | 0686-12-12          |
|        10 | 'Sir Lionel'            |                     3 | 0631-03-10            | 0686-12-12          |
|        11 | 'Sir Bors'              |                     3 | 0631-03-10            | 0635-12-12          |
|        12 | 'Sir Bors'              |                     3 | 0640-03-10            | 0686-12-12          |
|        13 | 'Sir Galahad'           |                     3 | 0631-03-10            | 0686-12-12          |
|        14 | 'Sir Gawain'            |                     3 | 0631-03-10            | 0686-12-12          |
|        15 | 'Sir Tristram'          |                     3 | 0631-03-10            | 0686-12-12          |
|        16 | 'Sir Percival'          |                     3 | 0631-03-10            | 0686-12-12          |
|        17 | 'Sir Lancelot'          |                     3 | 0670-09-30            | 0682-12-12          |
+-----------+-------------------------+-----------------------+-----------------------+---------------------+
You can rerun the procedure to check that it doesn’t alter any information. Then, you could add a new knight to test the insertion portion.
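One way to run that test is sketched below. It assumes the import table still holds the 17 loaded rows (it's a MEMORY table, so that means the mysqld service hasn't restarted since Step #4). The knight 'Sir Kay' and his dates are made up for the test, and the kingdoms and knights aliases are only labels.

```sql
-- Rerun the merge; nothing new is in the import table, so the counts
-- should stay at 3 kingdoms and 17 knights.
CALL upload_kingdom();

SELECT (SELECT COUNT(*) FROM kingdom) AS kingdoms
,      (SELECT COUNT(*) FROM knight)  AS knights;

-- Add a hypothetical new knight for an existing kingdom to the import table.
INSERT INTO kingdom_knight_import
VALUES ('Camelot', 15200, 'Sir Kay', '0631-03-10', '0686-12-12');

-- Merge again; only the new knight should be inserted.
CALL upload_kingdom();

SELECT knight_id
,      knight_name
,      kingdom_allegiance_id
FROM   knight
WHERE  knight_name = 'Sir Kay';
```

If the merge behaves as described, the final query returns a single row that points to Camelot’s kingdom_id, and the other 17 knights keep their original knight_id values.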