MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Load XML Local Infile

with 9 comments

Having downloaded and tested some basics of the MySQL 5.5 Release Candidate, I started checking out the new features. While testing the new LOAD XML LOCAL INFILE feature, I discovered that there are restrictions governing the configuration of source XML files.

You must restrict the XML file to a list of tag names that correspond to column names within a tag defined as <row>. The tag names are case sensitive to your column names. You can replace the <row> tag name with any name of your choosing provided you append a clause that maps rows to your substitution XML tag name.

You can’t convert a file that has multiple child XML tags with the same name. Any attempt simply loads the last tag name found in the row hierarchy. Therefore, you should ensure that all source files have a unique list of case-sensitive child tags that map to the column definitions of the import table.

Either of the following table definition provides for lowercase column names. The first one uses nothing to delimit the column names.

CREATE TABLE CHARACTER
( ROLE CHAR(30) NOT NULL
, actor CHAR(30) NOT NULL
, part CHAR(20) NOT NULL
, film CHAR(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

If you run SHOW CREATE TABLE character; you would see the more conventional definition below.

CREATE TABLE `character`
( `role` CHAR(30) NOT NULL
, `actor` CHAR(30) NOT NULL
, `part` CHAR(20) NOT NULL
, `film` CHAR(100) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

The following source file format supports the table definition because the XML tags are lowercase names. The source file wouldn’t work if the column names where uppercase or mixed case names.

<row>
  <role>Indiana Jones</role>
  <actor>Harrison Ford</actor>
  <part>protagonist</part>
  <film>Indiana Jones and Raiders of the Lost Ark</film>
  <film>Indiana Jones and the Temple of Doom</film>
  <film>Indiana Jones and the Last Crusade</film>
</row>

When the XML tags and column names match in case, you can load the file with the following syntax:

LOAD XML LOCAL FILE 'C:/Data/XML/character.html'
INTO TABLE `character`;

If you name the <row> tag <character>, you need to add a clause as noted below.

LOAD XML LOCAL FILE 'C:/Data/XML/character.html'
INTO TABLE `character`
ROWS IDENTIFIED BY '<character>';

If you have two film tags in a row tag, MySQL 5.5 doesn’t raise an error or warning. It simply loads the last <film> value. If you create tables with column names that don’t match on a case-sensitive basis, you’ll receive a 1263 warning message. You can see the warning message with the following command.

SHOW warnings;

The warning message only occurs when a column is not null constrained and the column name fails to match an XML tag attribute in the source file. No error or warning is raised when a column isn’t not null constrained under the same scenario. You can test it and then show warnings.

You should see something like this:

+---------+------+---------------------------------------------------------------------------------+
| Level   | Code | Message                                                                         |
+---------+------+---------------------------------------------------------------------------------+
| Warning | 1263 | COLUMN SET TO DEFAULT VALUE; NULL supplied TO NOT NULL COLUMN 'Role' at ROW 1   |
| Warning | 1263 | COLUMN SET TO DEFAULT VALUE; NULL supplied TO NOT NULL COLUMN 'Actor' at ROW 1  |
| Warning | 1263 | COLUMN SET TO DEFAULT VALUE; NULL supplied TO NOT NULL COLUMN 'Part' at ROW 1   |
| Warning | 1263 | COLUMN SET TO DEFAULT VALUE; NULL supplied TO NOT NULL COLUMN 'Film' at ROW 1   |
+---------+------+---------------------------------------------------------------------------------+

When you define a table with the AUTO_INCREMENT set to a value other than 1, the LOAD XML LOCAL FILE command resets the sequence to 1 before loading. I’d recommend you import into a table without an auto incrementing column and then use the REPLACE INTO to set the surrogate key values of auto incremented columns.

Written by maclochlainn

September 26th, 2010 at 1:56 am

Posted in MySQL,xml