MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Preprocessing External Tables

without comments

A question that comes up now and again is there a way in Oracle Database 11g Express Edition to mimic some behavior in the Oracle Standard or Enterprise editions. Many of these questions arise because developers want to migrate a behavior they’ve implemented in Java to the Express Edition. Sometimes the answer is no but many times the answer is yes. The yes answers come with a how.

This article answers the question: “How can I read an operating systems’ file directory with out an embedded Java Virtual Machine (JVM)?” These developers have read or implemented logic like that found in my earlier “Using DBMS_JAVA to Read External Files” article. The answer is simple. You need to use a preprocessing script inside an external table. That’s what you will learn in this article, but if you’re not familiar with external tables you should read this other “External Tables” article.

External tables let you access plain text files with SQL*Loader or Oracle’s proprietary Data Pump files. You typically create external tables with Oracle Data Pump when you’re moving large data sets between database instances.

External tables use Oracle’s virtual directories. An Oracle virtual directory is an internal reference in the data dictionary. A virtual directory maps a unique directory name to a physical directory on the local operating system. Virtual directories were simple before Oracle Database 12c gave us the multitenant architecture. In a multitenant database there are two types of virtual directories. One services the schemas of the Container Database (CDB) and it’s in the CDB’s SYS schema. The other services the schemas of a Pluggable Database (PDB) and it’s in the ADMIN schema for the PDB.

You can create a CDB virtual database as SYSTEM user with the following syntax in Windows:

SQL> CREATE DIRECTORY upload AS 'C:\Data\Upload';

or, like this in Linux or Unix:

SQL> CREATE DIRECTORY upload AS '/u01/app/oracle';

There are some subtle differences between these two statements. Windows directories or folders start with a logical drive letter, like C:\, D:\, and so forth. Linux and Unix directories start with a mount point like /u01.

As you can read in the “External Tables” article, you need to change the ownership of external files and directories to the oracle user and, default, oracle user’s default dba group. Likewise, you should change the privilege of the containing directory to 755 (owner has read, write, and execute privileges; and group and others have read and execute privileges.

The balance of this article is broken into two pieces configuring a working external table with preprocessing and troubleshooting cartridge errors.

External Tables with Preprocessing Example

There are xxx database steps to creating this example. The first database step requires you create three virtual directories. The syntax for the three statements is:

SQL> CREATE DIRECTORY upload AS '/u01/app/oracle/upload';
SQL> CREATE DIRECTORY LOG AS '/u01/app/oracle/log';
SQL> CREATE DIRECTORY preproc AS '/u01/app/oracle/preproc';

The upload directory hosts the files you want to discover for upload. The log directory hosts the log files for the external tables. The preproc directory hosts the executable program, which generates a list of files currently in the upload directory.

After creating the virtual directories or before creating them, you should create the physical directories in the Linux operating system. The virtual directories can only point to something when it actually exists. Moreover, they work like Oracle’s synonyms that point to other objects in the database. The physical files need to be in a directory tree that is navigable by the oracle user and the oracle user and it’s default primary dba group needs to own them.

You can use the following command to change ownership when you’re the root user:

# chown –R oracle:dba /u01/app/oracle

The second database step requires that you grant privileges on the virtual directories to the student user. You can do that with the following syntax:

SQL> GRANT read ON DIRECTORY upload;
SQL> GRANT read, WRITE ON DIRECTORY LOG;
SQL> GRANT read, EXECUTE ON DIRECTORY preproc;

The upload directory requires read-only privileges. The log directory requires read and write privileges. The read privileges let it find files and the write privilege lets it append to log files when they already exist. The preproc directory requires read and execute privileges. The read privilege is the same as that explained earlier. The execute privilege lets you run the preprocessing program file.

The third database step requires creating an external file with preprocessing. The following script creates the sample table:

SQL> CREATE TABLE directory_list
  2  ( file_name  VARCHAR2(60))
  3  ORGANIZATION EXTERNAL
  4  ( TYPE oracle_loader
  5    DEFAULT DIRECTORY preproc
  6    ACCESS PARAMETERS
  7    ( RECORDS DELIMITED BY NEWLINE CHARACTERSET US7ASCII
  8	 PREPROCESSOR preproc:'list2dir.sh'
  9	 BADFILE     'LOG':'dir.bad'
 10	 DISCARDFILE 'LOG':'dir.dis'
 11	 LOGFILE     'LOG':'dir.log'
 12	 FIELDS TERMINATED BY ','
 13	 OPTIONALLY ENCLOSED BY "'"
 14	 MISSING FIELD VALUES ARE NULL)
 15    LOCATION ('list2dir.sh'))
 16 REJECT LIMIT UNLIMITED;

Line 5 designates the default directory as preproc because the location of the executable file should be in the preproc directory. Line 8 designates that there is a preprocessing step, and it identifies the virtual directory and physical file name inside single quotes. Line 15 identifies the source file for the external table, which is an executable program.

Next, you need to create the bash file to get and return a directory list. Before you write that file, you need to understand that preprocessing script files don’t inherit a $PATH environment variable from Oracle.

That probably means you might have tried to create a simple bash shell command like the following in a list2dir.sh file.

ls /u01/app/oracle/upload | find . -type f | ls *csv | sed -e 's/\.\///'

When you test this file by calling it from SQL, like this:

SQL> SELECT * FROM directory_list;

It raises the following exception stack:

SELECT * FROM directory_list
*
ERROR AT line 1:
ORA-29913: error IN executing ODCIEXTTABLEFETCH callout
ORA-29400: data cartridge error
KUP-04095: preprocessor command /u01/app/oracle/preprocess/list2dir.sh
encountered error "/u01/app/oracle/preprocess/list2dir.sh: line 1: ls: No such file or directory

The reason isn’t immediately clear to some developers. The significant error is:

ls: No such file or directory

The error message indicates that a call through Oracle’s OCI call interface cannot find the location of the ls program. That occurs because there is no $PATH variable set a list of values that points to the /usr/bin directory where you find the ls program. You need to prepend /usr/bin before the ls, find, and sed programs.

/usr/bin/ls /u01/app/oracle/upload | /usr/bin/find . -type f | /usr/bin/ls *csv | /usr/bin/sed -e 's/\.\///'

Create a list2dir.sh file in the /u01/app/oracle/preproc directory with the preceding command line. Then, make sure oracle is the owner with a primary dba group and the privileges are 755 on the file. The command to set the privileges is:

# chmod –R 755 /u01/app/oracle/preproc.sh

Having completed that Linux operating system step you should probably put some files in the upload directory. You can create empty files with the touch command at the linux command line for this example.

The fourth database step lets you query the external table, which runs the preprocessing program and returns its results as values in the table:

SQL> CREATE * FROM directory_list;

It should return something like this:

FILE_NAME
------------------------------
character.csv
transaction_upload2.csv
transaction_upload.csv

As always, this is written to help those solve problems.

Written by maclochlainn

November 11th, 2018 at 10:54 pm