MySQL Merge gone Awry
Sometimes it gets tiresome when people take shots at Oracle, MySQL, SQL Server 2008, or PostgreSQL. When I went to the MySQL for Database Administrators, the instructor mentioned a number of times how many Oracle people he’d get in his class. It was said almost as if all the Oracle customers were migrating to MySQL, which I don’t think is the case. If I’m wrong just post a comment.
After writing SQL since 1985 (IBM SQL/DS), I’ve come to appreciate a number of the features in the Oracle database. This probably makes sense because I worked there for over eight years, and probably would still be there if I hadn’t left to teach at University. I’m constantly amazed as I explore and look for points of commonality across SQL dialects and PL/SQL dialects. At least, I think it’s fair to call T-SQL stored programs and MySQL stored programs PL/SQL dialects.
This blog post is about the MERGE
statement and it’s close MySQL cousin, the ON DUPLICATE KEY UPDATE
statement (that only works with single row INSERT
statements). I’ve recently added a post demonstrating the REPLACE INTO
command syntax. You may also be interested in a newer blog post about importing external data inside a MySQL procedure with cursor loops. Upfront, I have to vote for Oracle’s MERGE
statement because it’s more complete as an implementation. By the way, T-SQL supports the same MERGE
syntax. They’re so exact it wasn’t worth taking the space to show the syntax for the SQL Server 2008 Express product.
Oracle’s Merge Statement
The downside of Oracle’s MERGE
statement is that it takes a LOT of typing. The upside from my perspective is that it enforces that you must use the primary key column. The same is not true with the MySQL syntax.
Here’s a quick example that you can cut and paste into your environment for Oracle Database 11g or remove the Oracle specific FROM dual
and it’ll run in SQL Server. You can also see how to leverage joins and imports with the MERGE
statement in this more recent blog post.
-- Conditionally drop the table and sequence. BEGIN FOR i IN (SELECT NULL FROM user_tables WHERE TABLE_NAME = 'SYSTEM_USER') LOOP EXECUTE IMMEDIATE 'DROP TABLE system_user CASCADE CONSTRAINTS'; END LOOP; FOR i IN (SELECT NULL FROM user_sequences WHERE sequence_name = 'SYSTEM_USER_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE system_user_s1'; END LOOP; END; / -- Create the table. CREATE TABLE system_user ( system_user_id NUMBER CONSTRAINT pk_su PRIMARY KEY , system_user_name VARCHAR2(20) CONSTRAINT nn_su_1 NOT NULL , system_user_group_id NUMBER CONSTRAINT nn_su_2 NOT NULL , system_user_type NUMBER CONSTRAINT nn_su_3 NOT NULL , first_name VARCHAR2(20) , middle_name VARCHAR2(10) , last_name VARCHAR2(20) , created_by NUMBER CONSTRAINT nn_su_4 NOT NULL , creation_date DATE CONSTRAINT nn_su_5 NOT NULL , last_updated_by NUMBER CONSTRAINT nn_su_6 NOT NULL , last_update_date DATE CONSTRAINT nn_su_7 NOT NULL); -- Create the sequence with a default start value of 1. CREATE SEQUENCE system_user_s1; -- Insert new row. INSERT INTO system_user VALUES ( system_user_s1.nextval , 'SYSADMIN' , 1 , 1 , NULL , NULL , NULL , 1 , SYSDATE - 1 , 1 , SYSDATE - 1); -- Insert new or merge into existing row. MERGE INTO system_user target USING (SELECT 1 AS system_user_id , 'SYSADMIN' AS system_user_name , 1 AS system_user_group_id , 1 AS system_user_type , 'Samuel' AS first_name , 'the' AS middle_name , 'Lamanite' AS last_name , 1 AS created_by , SYSDATE AS creation_date , 1 AS last_updated_by , SYSDATE AS last_update_date FROM dual) SOURCE ON (target.system_user_id = SOURCE.system_user_id) WHEN MATCHED THEN UPDATE SET first_name = 'Samuel' , middle_name = 'the' , last_name = 'Lamanite' , last_updated_by = 1 , last_update_date = SYSDATE WHEN NOT MATCHED THEN INSERT ( target.system_user_id , target.system_user_name , target.system_user_group_id , target.system_user_type , target.first_name , target.middle_name , target.last_name , target.created_by , target.creation_date , target.last_updated_by , target.last_update_date ) VALUES ( SOURCE.system_user_id , SOURCE.system_user_name , SOURCE.system_user_group_id , SOURCE.system_user_type , SOURCE.first_name , SOURCE.middle_name , SOURCE.last_name , SOURCE.created_by , SOURCE.creation_date , SOURCE.last_updated_by , SOURCE.last_update_date ); |
MySQL On Duplicate Key Update Statement
The downside of MySQL’s ON DUPLICATE KEY UPDATE
statement is that it takes it lets you use an override signature on the INSERT
that then creates a new row when it shouldn’t. I logged a bug when I discovered that behavior earlier today but they don’t quite agree that it’s a bug. The upside is that the typing is MUCH shorter provided you remember to use the auto increment key column in the SELECT
clause.
Here’s a quick example of what not to do! The next one shows you what you should do. You can see how to tie together MySQL subroutines with an INSERT ... ON DUPLICATE KEY
statement in this more recent blog entry.
-- Conditionally drop the table. DROP TABLE IF EXISTS system_user; -- Create the table. CREATE TABLE system_user ( system_user_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , system_user_name CHAR(20) NOT NULL , system_user_group_id INT NOT NULL , system_user_type INT NOT NULL , first_name CHAR(20) , middle_name CHAR(10) , last_name CHAR(20) , created_by INT NOT NULL , creation_date DATE NOT NULL , last_updated_by INT NOT NULL , last_update_date DATE NOT NULL); -- Insert new row. INSERT INTO system_user ( system_user_name , system_user_group_id , system_user_type , created_by , creation_date , last_updated_by , last_update_date ) VALUES ('SYSADMIN' , 1 , 1 , 1 , DATE_SUB(NOW(),INTERVAL 1 DAY) , 1 , DATE_SUB(NOW(),INTERVAL 1 DAY)); -- Insert new or merge into existing row. INSERT INTO system_user ( system_user_name , system_user_group_id , system_user_type , first_name , middle_name , last_name , created_by , creation_date , last_updated_by , last_update_date ) VALUES ('SYSADMIN' , 1 , 1 ,'Samuel' ,'the' ,'Lamanite' , 1 , NOW() , 1 , NOW()) ON DUPLICATE KEY UPDATE first_name = 'Samuel' , middle_name = 'the' , last_name = 'Lamanite' , last_updated_by = 1 , last_update_date = UTC_DATE(); |
This script ends up inserting two rows when only one should be present. Why did that happen? Great question! You can fix this by adding a unique key defined for the columns that make up the natural key for the SYSTEM_USER
table. As noted by a comment below, the unique key must only include columns that are NOT NULL
constrained. Here’s the results without such a unique key:
mysql> SELECT * FROM system_user\G *************************** 1. ROW *************************** system_user_id: 1 system_user_name: SYSADMIN system_user_group_id: 1 system_user_type: 1 first_name: NULL middle_name: NULL last_name: NULL created_by: 1 creation_date: 2009-05-24 last_updated_by: 1 last_update_date: 2009-05-24 *************************** 2. ROW *************************** system_user_id: 2 system_user_name: SYSADMIN system_user_group_id: 1 system_user_type: 1 first_name: Samuel middle_name: NULL last_name: Lamanite created_by: 1 creation_date: 2009-05-25 last_updated_by: 1 last_update_date: 2009-05-25 2 ROWS IN SET (0.02 sec) |
The correct way to do this in MySQL is shown in the next example.
-- Conditionally drop table. DROP TABLE IF EXISTS system_user; -- Create table. CREATE TABLE system_user ( system_user_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , system_user_name CHAR(20) NOT NULL , system_user_group_id INT NOT NULL , system_user_type INT NOT NULL , first_name CHAR(20) , middle_name CHAR(10) , last_name CHAR(20) , created_by INT NOT NULL , creation_date DATE NOT NULL , last_updated_by INT NOT NULL , last_update_date DATE NOT NULL); -- Insert new row. INSERT INTO system_user ( system_user_name , system_user_group_id , system_user_type , created_by , creation_date , last_updated_by , last_update_date ) VALUES ('SYSADMIN' , 1 , 1 , 1 , DATE_SUB(UTC_DATE(),INTERVAL 1 DAY) , 1 , DATE_SUB(UTC_DATE(),INTERVAL 1 DAY)); -- Insert new or merge into existing row. INSERT INTO system_user VALUES ( 1 ,'SYSADMIN' , 1 , 1 ,'Samuel' ,'the' ,'Lamanite' , 1 , NOW() , 1 , NOW()) ON DUPLICATE KEY UPDATE first_name = 'Samuel' , middle_name = 'the' , last_name = 'Lamanite' , last_updated_by = 1 , last_update_date = NOW(); |
This ensures that the auto increment column values are matched. It returns what you’d expect, a single row inserted into or updated in the table.
mysql> SELECT * FROM system_user\G *************************** 1. ROW *************************** system_user_id: 1 system_user_name: SYSADMIN system_user_group_id: 1 system_user_type: 1 first_name: Samuel middle_name: NULL last_name: Lamanite created_by: 1 creation_date: 2009-05-24 last_updated_by: 1 last_update_date: 2009-05-25 1 ROW IN SET (0.00 sec) |
I hope this saves somebody from an insertion anomaly.