Update to Oracle External File Basics
While an earlier entry discusses CSV uploads to Oracle external tables, I neglected to mention some things. You can also upload position specific files, override the directory for log files, and override the file extensions.
There are actually two syntax methods for uploading position specific file. Only one is covered in the Oracle 11g Database Utilities manual. It makes you wonder if Oracle supports one or both.
You can override default log, bad, or discard file extensions when you enclose the relative file name in single quotes. You can also specify a virtual directory that differs from your upload (data import) directory.
You may get a surprise if you think virtual directories are case insensitive while defining external tables. They’re not. You must enter overriding virtual directories in uppercase only! You’ll raise this exception stack if you attempt anything else:
ERROR at line 1: ORA-29913: error IN executing ODCIEXTTABLEOPEN callout ORA-29400: DATA cartridge error KUP-04080: directory object download NOT found |
The full syntax for virtual directories and extensions is in this note.
Convert XML Files to Comma-delimited Files without Programming
A couple students snagged some data through Google searches but they found it was formatted as an XML file. They wanted to know the easiest way to convert it and load it into an Oracle database. I created this quick reference which leverages Apache Xalan Project command-line tool and Oracle external tables. You must source files in the %CLASSPATH% (or $CLASSPATH) correctly order as shown in the example or you will raise a "NoClassDefFoundError: org/apache/xalan/xslt/Process" error.
Briefly, XML supports two paradigms. One is the single-pass compiler (or streams approach). Another is a data structure (or tree approach), which typically requires programming skills.
XSLT Processors typically implement a streams approach. You call the XSLT Processor by passing two arguments. One is an XML source document and the other XSL style sheet.
While this is often done in your web browser or an application, you can leverage the technology to manually convert an XML file transfer into a comma-delimited file. Comma-delimited files are also known as comma separated value (CSV) files. After you convert the XML file you can use an Oracle external table to read it into the database. Naturally, the Oracle XDK offers more features and complexity.
How to relate table, virtual directory, and external file names
I was trying to automate cleaning up external files when I discovered that there isn’t an administrative view in Oracle Database 11g to link table, virtual directory, and external file. Reflecting on that discovery in Oracle 11g, I realized that limits the concept of a push paradigm with an external file. So, I wrote one.
Oracle External Table Basics
External files are great tools for reading data into and writing data out of an Oracle database. You have two options for reading data into the database. One uses SQL*Loader and the other uses Oracle Data Pump. You have only one option to write data from the database into an external table file. That’s Oracle Data Pump.
I thought this was pretty straightforward when recommending it as a solution. Given the questions that I got back, it appears that it isn’t. Actually, I couldn’t find an example for how you import data through an external table by using Oracle Data Pump. I only checked the Oracle Database Utilities 11g documentation, but maybe its somewhere else.
The “Creating an external table that uses SQL*Loader” page demonstrates how you can create an input or read-only external table. The “Creating an external table that uses Oracle Data Pump” shows you how to create read-write external tables.
Exceptions covered in the Articles
The two referenced pages should help you understand the basics and resolve these error messages (at least on an Oracle Database 11g where I tested them):
Exception stack raised by Oracle SQL*Loader when you provide file extensions for log, bad, or discarded file names without enclosing them in single quotes:
ORA-29913: error IN executing ODCIEXTTABLEOPEN callout ORA-29400: DATA cartridge error KUP-00554: error encountered while parsing access parameters KUP-01005: syntax error: found "dot": expecting one OF: "badfile, byteordermark, characterset, colon, column, data, delimited, discardfile, disable_directory_link_check, fields, fixed, load, logfile, language, nodiscardfile, nobadfile, nologfile, date_cache, processing, readsize, string, skip, territory, vari" KUP-01007: at line 2 COLUMN 20 |
You can also enclose a different Oracle virtual directory by using ‘virtual_directory’:’name.extension’ syntax.
Exception stack raised by Oracle Data Pump when you fail to enumerate columns in the source query:
ERROR at line 6: ORA-30656: COLUMN TYPE NOT supported ON external organized TABLE |
Exception stack raised by Oracle Data Pump when you try to rebuild the external table without previously dropping the external file:
CREATE TABLE item_export * ERROR at line 1: ORA-29913: error IN executing ODCIEXTTABLEOPEN callout ORA-29400: DATA cartridge error KUP-11012: file item_export.dmp IN C:\DATA\Download already EXISTS |
How you can read an external directory list from SQL
A post last week in the SQL & PL/SQL Forum caught my eye because it referenced an old post by Tom Kyte. That post shows you how to read an external file system directory using Java library wrapped by a PL/SQL program unit. The problem I have with the solution is that it writes the data to a table, and then it reads the file list from the table. This type of design requires cleaning up the table after running the function or procedure.
An improvement on Tom’s old solution would be to return the list as a SQL collection data type. A few searches on the Internet and of the Oracle documentation didn’t unearth an example. The referenced code and instructions show you how to implement the necessary pieces with a PL/SQL wrapper function.
PL/SQL NDS Reference Cursor with Record Collection
Somebody got to this blog looking for a way to write a Native Dynamic SQL (NDS) statement that returned a system reference cursor. I created an image file to show how to do that. I borrowed the code from Chapter 11 of Oracle Database 11g PL/SQL Programming. The only problem with this is that it’s a PL/SQL only solution typically. You can write it as a function or procedure, and then use a Java or PHP program to display the reference cursor in a web page. You can check this previous post for a PHP example.
Leap year, last day of the month the easy way?
You could write an ugly SQL statement full of CASE statements to derive the leap year moving forward or backward in time. You could likewise write an uglier statement that allows for moving back from a 31 day month to a 30 day month or forward from a 30 day month to a 31 day month.
However, you can use the add_months(date_in,number_of_months), like:
Leap year back to a non-leap year (system clock set to 29-FEB-2008):
SQL\> SELECT add_months(SYSDATE,-12) FROM dual; |
From the last day of May to the last day of April (system clock set to 31-MAY-2008):
SQL\> SELECT add_months(SYSDATE,-1) FROM dual; |
From the last day of April to the last day of May (system clock set to 30-APR-2008):
SQL\> SELECT add_months(SYSDATE,1) FROM dual; |
You can read more here in the Oracle SQL Reference.
Leap year not welcome by the TO_YMINTERVAL function
Playing around with illustrating date math, I noticed that Oracle 11g still hasn’t fixed the TO_YMINTERVAL function. The function should let you to subtract months, years, or a combination of month and years from any date. However, it doesn’t like leap years or going from the last day of a longer month to the last day of a shorter month.
You can add or subtract a year from a date in the middle of the month:
SELECT TO_DATE('15-JAN-08') - TO_YMINTERVAL('01-00') FROM dual; |
You can also add or subtract a month from a date in the middle of the month:
SELECT TO_DATE('15-FEB-08') - TO_YMINTERVAL('00-01') FROM dual; |
You can’t subtract a year from the last day of February in a leap year. There is also no logic to let you add a year to 28-FEB-07 and get 29-FEB-08 (like a Boolean for last_day). Likewise, you can’t add a month to the last day of a month where the next month has fewer days in it; or subtract a month from the last day of a month where the last day is greater than the prior months last day without raising an ORA-01839 error.
Here’s what happens subtracting a year from this years leap day:
SELECT TO_DATE('29-FEB-08') - TO_YMINTERVAL('01-00') FROM dual * ERROR at line 1: ORA-01839: DATE NOT valid FOR MONTH specified |
You can’t complain about it though, it’s apparently compliant with the SQL92 standard according to metalink.
Result cache functions disallow nested table input parameters
If you implement street address as a nested table (or collection), the problem is printing an address book using only a SQL statement. The problem comes from matching up the multiple line return from the TABLE function with the rest of the row. That’s not a problem when you denormalized the list into known columns.
A PL/SQL function can convert the list into a scalar value. It’s an easy way to leverage a nested table. You can find the full code here. You might think that this is a neat opportunity to use a result cache function. I did, but the behavior isn’t presently supported. On compilation of a function with the RESULT_CACHE clause and a scalar collection input parameter, you raise a PLS-00999 error.
The message tells you:
RESULT_CACHE IS disallowed ON subprograms WITH IN parameter OF (OR containing) nested TABLE TYPE |
Unnatural outcome of natural joins
You may like the standard syntax of listing tables as comma-delimited lists, rather than using an inner, left, right, full, natural, and cross join syntax. I do! The former method eliminates matching result sets and generally is less tedious.
The natural join is my least favorite of these because it can create problems long after the query is written. The problem is an outcome of its design. A natural join matches rows from one table to another by using all matching column names. This query makes a simplifying assumption: Nobody includes a column of the same name in two tables where they may be joined that isn’t a foreign key.
The following is quick example that creates two tables. The VEHICLE table uses the VEHICLE_ID column as a primary key. The WORK_ORDER table uses the VEHICLE_ID column as a foreign key.
CREATE TABLE vehicle ( vehicle_id NUMBER , vin VARCHAR2(20)); CREATE TABLE work_order ( work_order_id NUMBER , vehicle_id NUMBER); INSERT INTO vehicle VALUES (1,'VIN_NUMBER1'); INSERT INTO vehicle VALUES (2,'VIN_NUMBER2'); INSERT INTO work_order VALUES (1, 1); INSERT INTO work_order VALUES (2, 2); INSERT INTO work_order VALUES (3, 1); |
The following queries resolve the relationship by using an explicit join:
SELECT * FROM vehicle v, work_order wo WHERE v.vehicle_id = wo.vehicle_id; SELECT * FROM vehicle v JOIN work_order wo ON v.vehicle_id = wo.vehicle_id; SELECT * FROM vehicle v JOIN work_order wo USING(vehicle_id); |
A natural query also resolves the relationship by implicitly finding the columns that share column names, as follows:
SELECT * FROM vehicle v NATURAL JOIN work_order wo; |
All of these return a three row result set. The unnatural part of the natural query arrives during maintenance programming when somebody adds a VIN column name to the WORK_ORDER table (let’s say to simplify the ETL to the data warehouse). The code using a natural join isn’t invalidated but its runtime behavior is altered because it now joins both the VEHICLE_ID and VIN columns. Continuing the example, you’d change the table by:
ALTER TABLE work_order ADD (vin VARCHAR2(20)); |
The explicit queries still work after the change. The implicit (natural) query now returns no rows because the VIN column contains all null values, and it is automatically added to the implicit join statement. After updating the VIN column, the query resolves when the data matches but not when the VIN column contains different data. What if the VIN column in WORK_ORDER only contains the make and model portion of the vehicle identification number while the VIN column in VEHICLE contains the complete unique vehicle identifier. You would receive no rows selected from the natural query.
It appears the best thing to do, is avoid natural joins.
I subsequently wrote a post about SQL join semantics. You can find it here.
