Convert XML to CSV
There are several XSLT Processors that you can download. I like the Apache XML Project – Xalan. You can read an overview or download it. You’ll need to have the Java 5 SDK loaded on your machine because this utility is written in Java and called by it (in this example).
The command line instructions are a bit out of date. After you download the software, you’ll need to source your Java %PATH%
(or $PATH
) and %CLASSPATH%
(or $CLASSPATH
) environment variables. You can set your environment with the following references.
set PATH=C:\Sun\SDK\jdk\bin;%PATH% set CLASSPATH=C:\JavaDev\Java5\ojdbc5.jar;. |
After sourcing the local SDK, you need to source the XML Processor utility with these Java Archives:
set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xalan.jar;. set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\serializer.jar set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xercesImpl.jar set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xml-apis.jar set CLASSPATH=%CLASSPATH%;C:\xalan-j_2_7_1\xsltc.jar |
If you mistype the *.jar
files in your %CLASSPATH%
(or $CLASSPATH
) variable, you’ll raise the following error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/xalan/xslt/Process |
Now that the environment is configured, you can do the following steps.
1. Create an XML file like this:
<?xml version="1.0" encoding="ISO-8859-1"?> <character> <name> <role>Indiana Jones</role> <actor>Harrison Ford</actor> <part>protagonist</part> <film>Indiana Jones and Raiders of the Lost Ark</film> <film>Indiana Jones and the Temple of Doom</film> <film>Indiana Jones and the Last Crusade</film> <film>Indiana Jones and the Kingdom of the Crystal Skull</film> </name> <name> <role>Wilhelmina Scott</role> <actor>Kate Capshaw</actor> <part>support</part> <film>Indiana Jones and the Temple of Doom</film> </name> <name> <role>Marion Ravenwood</role> <actor>Karen Allen</actor> <part>support</part> <film>Indiana Jones and Raiders of the Lost Ark</film> <film>Indiana Jones and the Kingdom of the Crystal Skull</film> </name> <name> <role>Elsa Schneider</role> <actor>Alison Doody</actor> <part>support</part> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Short Round</role> <actor>Jonathan Ke Quan</actor> <part>support</part> <film>Indiana Jones and the Temple of Doom</film> </name> <name> <role>Sallah</role> <actor>Jonn Rhys-Davies</actor> <part>support</part> <film>Indiana Jones and Raiders of the Lost Ark</film> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Professor Henry Jones</role> <actor>Sean Connery</actor> <part>support</part> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Henry "Mutt" Williams</role> <actor>Shia LaBeouf</actor> <part>support</part> <film>Indiana Jones and the Kingdom of the Crystal Skull</film> </name> <name> <role>Marcus Brody</role> <actor>Denholm Elliott</actor> <part>support</part> <film>Indiana Jones and Raiders of the Lost Ark</film> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Amrish Puri</role> <actor>Mola Ram</actor> <part>antagonist</part> <film>Indiana Jones and the Temple of Doom</film> </name> <name> <role>Rene Belloq</role> <actor>Belloq</actor> <part>antagonist</part> <film>Indiana Jones and Raiders of the Lost Ark</film> </name> <name> <role>Walter Donovan</role> <actor>Julian Glover</actor> <part>antagonist</part> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Colonel Vogel</role> <actor>Michael Bryne</actor> <part>antagonist</part> <film>Indiana Jones and the Last Crusade</film> </name> <name> <role>Irina Spalko</role> <actor>Cate Blanchett</actor> <part>antagonist</part> <film>Indiana Jones and the Kingdom of the Crystal Skull</film> </name> </character> |
2. Create an XSL file to transform the XML. This one actually filters the file by movie, checks for non-antogonists, and sorts the data. It also provides a couple tricks, like entering the decimal commas, apostrophes, and line returns. An apostrophe is a decimal '
, a comma is a decimal ,
, and a line return is a 
.
<?xml version="1.0" encoding="ISO-8859-1"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:template match="/"> <!-- This loops through the branch when a sibling meets a condition. --> <xsl:for-each select="character/name[film='Indiana Jones and the Last Crusade']"> <!-- Sorts based on the value in the "role" element. --> <xsl:sort select="role" /> <!-- Eliminates anyone that has a "part" element value of "antagonist". --> <xsl:if test="part != 'antagonist'"> <!-- An apostrophe before and after with a line return. --> <xsl:text>'</xsl:text> <xsl:value-of select="role"/> <!-- An apostrophe followed by a comma --> <xsl:text>',</xsl:text> <xsl:text>'</xsl:text> <xsl:value-of select="actor"/> <xsl:text>',</xsl:text> <xsl:text>'</xsl:text> <xsl:value-of select="film"/> <!-- An apostrophe followed by a line return --> <xsl:text>' </xsl:text> </xsl:if> </xsl:for-each> </xsl:template> </xsl:stylesheet> |
3. Run the XSLT Processor (Xalan) to convert it to a comma separated values file:
C:\> java org.apache.xalan.xslt.Process -IN RaidersXML.xml -XSL RaidersXML.xsl -TEXT > FilteredRaidersXML.csv |
4. Test the contents (use cat on Linux or Unix):
C:\> type FilteredRaidersXML.csv |
This should display:
'Elsa Schneider','Alison Doody','Indiana Jones and the Last Crusade' 'Indiana Jones','Harrison Ford','Indiana Jones and Raiders of the Lost Ark' 'Marcus Brody','Denholm Elliott','Indiana Jones and Raiders of the Lost Ark' 'Professor Henry Jones','Sean Connery','Indiana Jones and the Last Crusade' 'Sallah','Jonn Rhys-Davies','Indiana Jones and Raiders of the Lost Ark' |
An earlier post has the details for uploading and downloading from Oracle external tables. You can check it here.
5. Create the virtual directory as a privileged user:
SQL> CREATE DIRECTORY download AS ‘C:\Download’ |
6. As the privileged user grant the read only privilege to the schema that will read the data:
SQL> GRANT READ ON DIRECTORY download TO importer; |
7. Alternatively, if you want to write log or exception files, grant read and write like:
SQL> GRANT READ, WRITE ON DIRECTORY download TO importer; |
8. Connect to the database and create a read-only external table that doesn’t log success, failures or bad rows:
CREATE TABLE movie ( ROLE VARCHAR2(30) , actor VARCHAR2(30) , movie VARCHAR2(60)) ORGANIZATION EXTERNAL ( TYPE oracle_loader DEFAULT DIRECTORY download ACCESS PARAMETERS ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII NOBADFILE NODISCARDFILE NOLOGFILE FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "'" MISSING FIELD VALUES ARE NULL ) LOCATION ('FilteredRaidersXML.csv')) REJECT LIMIT 0; |
9. Query the external table movie to verify that it works.
SELECT * FROM movie; |
If you don’t like this solution, you can always default back to the Oracle XML Developer Guide for Oracle 11g, Chapter 35 for an alternative. Thanks to Kevin for reminding me that Oracle APEX lets you upload XML data through the interface: Utilities -> Data Load/Unload -> Load XML Data. Chapter 35 also contains a link to Chapter 3 which directs you to information about configuring and using Oracle XDB. Oracle XDB is the foundation component of Oracle APEX.
If you’d like to upload it to MySQL, you can check this other blog post to discover how to do it. Hope this was helpful. If I missed anything, please let me know.
As an alternative, if you have Apex installed (10g) or enabled (11g), you have the option to load data in either XML or CSV format.
Kevin
23 Jun 08 at 12:33 am
I tried your example and got this error
-bash-3.00$ java org.apache.xalan.xslt.Process -IN test.xml -XSL filter.xsl -TEXT
[Fatal Error] filter.xsl:2:25: Open quote is expected for attribute “version” associated with an element type “xsl:stylesheet”.
file:///export/home/ywu/xml/filter.xsl; Line #2; Column #25; org.xml.sax.SAXParseException: Open quote is expected for attribute “version” associated with an element type “xsl:stylesheet”.
(Location of error unknown)XSLT Error (javax.xml.transform.TransformerException): No xml-stylesheet PI found in: test.xml
Exception in thread “main” java.lang.RuntimeException: No xml-stylesheet PI found in: test.xml
at org.apache.xalan.xslt.Process.doExit(Process.java:1155)
at org.apache.xalan.xslt.Process.main(Process.java:1128)
Yong
18 Sep 08 at 11:12 pm
Yong,
Unfortunately, when I copied the code in as plain text there were two outcomes. The first is that people can cut and paste. The second was that the apostrophes weren’t purely ASCII apostrophes because WordPress converted them to specialized characters.
While you could replace the extended character set apostrophes in an editor and run the sample code, it was tedious. Your error appears to indicate that the quotes aren’t matched.
I’ve updated all the code to now work with a direct copy into your console or file. It’s one of the reasons why I moved the blog off wordpress.com because they don’t provide that control for plugins.
maclochlainn
18 Sep 08 at 11:39 pm
If you run into problems while creating the external table, try replacing the “REJECT LIMIT 0” part by “REJECT LIMIT UNLIMITED”
Mathew P
3 Dec 09 at 11:28 pm
I was able to transform the files using bothe the command line and in TextPad, however the OUT file has appended to the beginning of the file. Any insight as to why?
Thanks
Rochaus
24 Mar 10 at 7:06 am
I can’t get the string that is appearing in my out file to appear in this post?
Rochaus
24 Mar 10 at 7:08 am
I’m not sure but I think you’re on step #3, did you use a single > or two. Two would append to the file.
maclochlainn
24 Mar 10 at 9:25 pm
Which string? Can you point to a part of the blog entry that’s hung you up, or show what you’ve got?
maclochlainn
24 Mar 10 at 9:25 pm
[…] There are several XSLT Processors that you can download. I like the Apache XML Project – Xalan. […]
Fix Xsltc.jar Errors - Windows XP, Vista, 7 & 8
9 Sep 14 at 8:06 pm
[…] an XML file to a CSV file to upload into MySQL. They were asking the question based on an old Convert XML to CSV blog post from 2008. Amazing though that is, I had to explain the process no longer requires manual […]
Oxygen XML Editor | MacLochlainns Weblog
11 Mar 21 at 1:15 pm