MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

MySQL Merge gone Awry

with 13 comments

Sometimes it gets tiresome when people take shots at Oracle, MySQL, SQL Server 2008, or PostgreSQL. When I went to the MySQL for Database Administrators, the instructor mentioned a number of times how many Oracle people he’d get in his class. It was said almost as if all the Oracle customers were migrating to MySQL, which I don’t think is the case. If I’m wrong just post a comment.

After writing SQL since 1985 (IBM SQL/DS), I’ve come to appreciate a number of the features in the Oracle database. This probably makes sense because I worked there for over eight years, and probably would still be there if I hadn’t left to teach at University. I’m constantly amazed as I explore and look for points of commonality across SQL dialects and PL/SQL dialects. At least, I think it’s fair to call T-SQL stored programs and MySQL stored programs PL/SQL dialects.

This blog post is about the MERGE statement and it’s close MySQL cousin, the ON DUPLICATE KEY UPDATE statement (that only works with single row INSERT statements). I’ve recently added a post demonstrating the REPLACE INTO command syntax. You may also be interested in a newer blog post about importing external data inside a MySQL procedure with cursor loops. Upfront, I have to vote for Oracle’s MERGE statement because it’s more complete as an implementation. By the way, T-SQL supports the same MERGE syntax. They’re so exact it wasn’t worth taking the space to show the syntax for the SQL Server 2008 Express product.

Oracle’s Merge Statement

The downside of Oracle’s MERGE statement is that it takes a LOT of typing. The upside from my perspective is that it enforces that you must use the primary key column. The same is not true with the MySQL syntax.

Here’s a quick example that you can cut and paste into your environment for Oracle Database 11g or remove the Oracle specific FROM dual and it’ll run in SQL Server. You can also see how to leverage joins and imports with the MERGE statement in this more recent blog post.

-- Conditionally drop the table and sequence.
BEGIN
  FOR i IN (SELECT NULL
            FROM   user_tables
            WHERE  TABLE_NAME = 'SYSTEM_USER') LOOP
    EXECUTE IMMEDIATE 'DROP TABLE system_user CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT NULL
            FROM   user_sequences
            WHERE  sequence_name = 'SYSTEM_USER_S1') LOOP
    EXECUTE IMMEDIATE 'DROP SEQUENCE system_user_s1';
  END LOOP;
END;
/
 
-- Create the table.
CREATE TABLE system_user
( system_user_id        NUMBER        CONSTRAINT pk_su   PRIMARY KEY
, system_user_name      VARCHAR2(20)  CONSTRAINT nn_su_1 NOT NULL
, system_user_group_id  NUMBER        CONSTRAINT nn_su_2 NOT NULL
, system_user_type      NUMBER        CONSTRAINT nn_su_3 NOT NULL
, first_name            VARCHAR2(20)
, middle_name           VARCHAR2(10)
, last_name             VARCHAR2(20)
, created_by            NUMBER        CONSTRAINT nn_su_4 NOT NULL
, creation_date         DATE          CONSTRAINT nn_su_5 NOT NULL
, last_updated_by       NUMBER        CONSTRAINT nn_su_6 NOT NULL
, last_update_date      DATE          CONSTRAINT nn_su_7 NOT NULL);
 
-- Create the sequence with a default start value of 1.
CREATE SEQUENCE system_user_s1;
 
-- Insert new row.
INSERT INTO system_user
VALUES
( system_user_s1.nextval
, 'SYSADMIN'
, 1
, 1
, NULL
, NULL
, NULL
, 1
, SYSDATE - 1
, 1
, SYSDATE - 1);
 
-- Insert new or merge into existing row.
MERGE INTO system_user target
USING (SELECT   1 AS system_user_id
       ,       'SYSADMIN' AS system_user_name
       ,        1 AS system_user_group_id
       ,        1 AS system_user_type
       ,       'Samuel' AS first_name
       ,        'the' AS middle_name
       ,       'Lamanite' AS last_name
       ,        1 AS created_by
       ,        SYSDATE AS creation_date
       ,        1 AS last_updated_by
       ,        SYSDATE AS last_update_date
       FROM     dual) SOURCE
ON (target.system_user_id = SOURCE.system_user_id)
WHEN MATCHED THEN
  UPDATE SET first_name = 'Samuel'
  ,          middle_name = 'the'
  ,          last_name = 'Lamanite'
  ,          last_updated_by = 1
  ,          last_update_date = SYSDATE
WHEN NOT MATCHED THEN
  INSERT
  ( target.system_user_id
  , target.system_user_name
  , target.system_user_group_id
  , target.system_user_type
  , target.first_name
  , target.middle_name
  , target.last_name
  , target.created_by
  , target.creation_date
  , target.last_updated_by
  , target.last_update_date )  
  VALUES
  ( SOURCE.system_user_id
  , SOURCE.system_user_name
  , SOURCE.system_user_group_id
  , SOURCE.system_user_type
  , SOURCE.first_name
  , SOURCE.middle_name
  , SOURCE.last_name
  , SOURCE.created_by
  , SOURCE.creation_date
  , SOURCE.last_updated_by
  , SOURCE.last_update_date );

MySQL On Duplicate Key Update Statement

The downside of MySQL’s ON DUPLICATE KEY UPDATE statement is that it takes it lets you use an override signature on the INSERT that then creates a new row when it shouldn’t. I logged a bug when I discovered that behavior earlier today but they don’t quite agree that it’s a bug. The upside is that the typing is MUCH shorter provided you remember to use the auto increment key column in the SELECT clause.

Here’s a quick example of what not to do! The next one shows you what you should do. You can see how to tie together MySQL subroutines with an INSERT ... ON DUPLICATE KEY statement in this more recent blog entry.

-- Conditionally drop the table.
DROP TABLE IF EXISTS system_user;
 
-- Create the table.
CREATE TABLE system_user
( system_user_id              INT UNSIGNED  PRIMARY KEY AUTO_INCREMENT
, system_user_name            CHAR(20)      NOT NULL
, system_user_group_id        INT           NOT NULL
, system_user_type            INT           NOT NULL
, first_name                  CHAR(20)
, middle_name                 CHAR(10)
, last_name                   CHAR(20)
, created_by                  INT           NOT NULL
, creation_date               DATE          NOT NULL
, last_updated_by             INT           NOT NULL
, last_update_date            DATE          NOT NULL);
 
-- Insert new row.
INSERT INTO system_user
( system_user_name
, system_user_group_id
, system_user_type
, created_by
, creation_date
, last_updated_by
, last_update_date )
VALUES
('SYSADMIN'
, 1
, 1
, 1
, DATE_SUB(NOW(),INTERVAL 1 DAY)
, 1
, DATE_SUB(NOW(),INTERVAL 1 DAY));
 
-- Insert new or merge into existing row.
INSERT INTO system_user
( system_user_name
, system_user_group_id
, system_user_type
, first_name
, middle_name
, last_name
, created_by
, creation_date
, last_updated_by
, last_update_date )
VALUES
('SYSADMIN'
, 1
, 1
,'Samuel'
,'the'
,'Lamanite'
, 1
, NOW()
, 1
, NOW())
ON DUPLICATE KEY 
UPDATE first_name = 'Samuel'
,      middle_name = 'the'
,      last_name = 'Lamanite'
,      last_updated_by = 1
,      last_update_date = UTC_DATE();

This script ends up inserting two rows when only one should be present. Why did that happen? Great question! You can fix this by adding a unique key defined for the columns that make up the natural key for the SYSTEM_USER table. As noted by a comment below, the unique key must only include columns that are NOT NULL constrained. Here’s the results without such a unique key:

mysql> SELECT * FROM system_user\G
*************************** 1. ROW ***************************
      system_user_id: 1
    system_user_name: SYSADMIN
system_user_group_id: 1
    system_user_type: 1
          first_name: NULL
         middle_name: NULL
           last_name: NULL
          created_by: 1
       creation_date: 2009-05-24
     last_updated_by: 1
    last_update_date: 2009-05-24
*************************** 2. ROW ***************************
      system_user_id: 2
    system_user_name: SYSADMIN
system_user_group_id: 1
    system_user_type: 1
          first_name: Samuel
         middle_name: NULL
           last_name: Lamanite
          created_by: 1
       creation_date: 2009-05-25
     last_updated_by: 1
    last_update_date: 2009-05-25
2 ROWS IN SET (0.02 sec)

The correct way to do this in MySQL is shown in the next example.

-- Conditionally drop table.
DROP TABLE IF EXISTS system_user;
 
-- Create table.
CREATE TABLE system_user
( system_user_id              INT UNSIGNED  PRIMARY KEY AUTO_INCREMENT
, system_user_name            CHAR(20)      NOT NULL
, system_user_group_id        INT           NOT NULL
, system_user_type            INT           NOT NULL
, first_name                  CHAR(20)
, middle_name                 CHAR(10)
, last_name                   CHAR(20)
, created_by                  INT           NOT NULL
, creation_date               DATE          NOT NULL
, last_updated_by             INT           NOT NULL
, last_update_date            DATE          NOT NULL);
 
-- Insert new row.
INSERT INTO system_user
( system_user_name
, system_user_group_id
, system_user_type
, created_by
, creation_date
, last_updated_by
, last_update_date )
VALUES
('SYSADMIN'
, 1
, 1
, 1
, DATE_SUB(UTC_DATE(),INTERVAL 1 DAY)
, 1
, DATE_SUB(UTC_DATE(),INTERVAL 1 DAY));
 
-- Insert new or merge into existing row.
INSERT INTO system_user
VALUES
( 1
,'SYSADMIN'
, 1
, 1
,'Samuel'
,'the'
,'Lamanite'
, 1
, NOW()
, 1
, NOW())
ON DUPLICATE KEY
UPDATE first_name = 'Samuel'
,      middle_name = 'the'
,      last_name = 'Lamanite'
,      last_updated_by = 1
,      last_update_date = NOW();

This ensures that the auto increment column values are matched. It returns what you’d expect, a single row inserted into or updated in the table.

mysql> SELECT * FROM system_user\G
*************************** 1. ROW ***************************
      system_user_id: 1
    system_user_name: SYSADMIN
system_user_group_id: 1
    system_user_type: 1
          first_name: Samuel
         middle_name: NULL
           last_name: Lamanite
          created_by: 1
       creation_date: 2009-05-24
     last_updated_by: 1
    last_update_date: 2009-05-25
1 ROW IN SET (0.00 sec)

I hope this saves somebody from an insertion anomaly.

Written by maclochlainn

May 25th, 2009 at 6:03 pm

Posted in MySQL,Oracle,sql