Archive for the ‘Database Design’ Category
Data Engineer?
Students often ask me about data engineering. I try to explain some of the aspects, and how the tasks can be organized but I never laid out all the titles. I really like this illustration (click on image for larger size) from the Gartner Group because it does that. You can download the full “What Are the Essential Roles for Data and Analytics” paper here).
An excerpt from Gartner’s paper:
Data Engineer
Data engineering is the practice of making the appropriate data available to various data consumers (including data scientists, data and business analysts, citizen integrators, and line-of-business users). It is a discipline that involves collaboration across business and IT units. This key discipline requires skilled data engineers to support both IT and business teams.
Data engineers are primarily responsible for building, managing and operationalizing data pipelines in support of key D&A use cases. They are also primarily responsible for leading the tedious (and often complex) task of:
- Curating datasets and data pipelines created by nontechnical users (e.g., through self-service data preparation tools), data scientists or even IT resources.
- Operationalizing data delivery for production-level deployments.
I hope the summary is helpful and Gartner’s paper interesting.
Oracle PLS-00103 Gotcha
Teaching PL/SQL can be fun and sometimes challenging when you need to troubleshoot a student error. Take the Oracle PLS-00103 error can be very annoying when it return like this:
24/5 PLS-00103: Encountered the symbol "LV_CURRENT_DATE" WHEN expecting one OF the following: language |
Then, you look at the code and see:
22 23 24 25 | , pv_user_id NUMBER ) IS /* Declare local constants. */ lv_current_date DATE := TRUNC(SYSDATE); |
Obviously, there’s nothing wrong on the line number that the error message pointed. Now, here’s where it gets interesting because of a natural human failing. The student thought they had something wrong with declaring the variable and tested as stand alone procedure and anonymous block. Naturally, they were second guessing what they knew about the PL/SQL.
That’s when years of experience with PL/SQL kicks in to solve the problem. The trick is recognizing two things:
- The error message points to the first line of code in a package body.
- The error is pointing to the first character on the line after the error.
That meant that the package body was incorrectly defined. A quick check to the beginning of the package body showed:
1 2 3 4 5 6 | CREATE OR REPLACE PACKAGE account_creation AS PROCEDURE insert_contact ( pv_first_name VARCHAR2 , pv_middle_name VARCHAR2 := NULL |
The student failed to designate the package as an implementation by omitting the keyword BODY from line 2. The proper definition of the package body should be:
1 2 3 4 5 6 | CREATE OR REPLACE PACKAGE BODY account_creation AS PROCEDURE insert_contact ( pv_first_name VARCHAR2 , pv_middle_name VARCHAR2 := NULL |
That’s the resolution for the error message. I wrote this because I checked if they should have been able to find a helpful article with a google search. I discovered that there wasn’t an answer like this that came up after 10 minutes of various searches.
As always, I hope this helps those writing PL/SQL.
PostgreSQL Trigger 1
This entry covers how to write a statement logging trigger for PostgreSQL. It creates two tables: avenger and avenger_log; one avenger_t1 trigger, and a testing INSERT statement.
It was written to help newbies know how and what to return from a function written for a statement-level trigger. They often get stuck on the following when they try to return true. The term non-composite is another way to describe the tuple inserted.
psql:basics_postgres.sql: 59: ERROR: cannot return non-composite value from function returning composite type CONTEXT: PL/pgSQL function write_avenger_t1() line 15 at RETURN |
The avenger table:
/* Conditionally drop table. */ DROP TABLE IF EXISTS avenger; /* Create table. */ CREATE TABLE avenger ( avenger_id SERIAL , avenger_name VARCHAR(30) , first_name VARCHAR(20) , last_name VARCHAR(20)); |
Seed the avenger table:
/* Seed the avenger table with data. */ INSERT INTO avenger ( first_name, last_name, avenger_name ) VALUES ('Anthony', 'Stark', 'Iron Man') ,('Thor', 'Odinson', 'God of Thunder') ,('Steven', 'Rogers', 'Captain America') ,('Bruce', 'Banner', 'Hulk') ,('Clinton', 'Barton', 'Hawkeye') ,('Natasha', 'Romanoff', 'Black Widow') ,('Peter', 'Parker', 'Spiderman') ,('Steven', 'Strange', 'Dr. Strange') ,('Scott', 'Lange', 'Ant-man'); |
The avenger_log table:
/* Conditionally drop table. */ DROP TABLE IF EXISTS avenger_log; /* Create table. */ CREATE TABLE avenger_log ( avenger_log_id SERIAL , trigger_name VARCHAR(30) , trigger_timing VARCHAR(6) , trigger_event VARCHAR(6) , trigger_type VARCHAR(12)); |
The INSERT statement that tests the trigger:
DROP FUNCTION IF EXISTS avenger_t1_function; CREATE FUNCTION avenger_t1_function() RETURNS TRIGGER AS $$ BEGIN /* Insert a row into the avenger_log table. * Also, see PostrgreSQL 39.9 Trigger Procedures. */ INSERT INTO avenger_log ( trigger_name , trigger_timing , trigger_event , trigger_type ) VALUES ( UPPER(TG_NAME) , TG_WHEN , TG_OP , TG_LEVEL ); /* A statement trigger doesn't use a composite type or tuple, * it should simply return an empty composite type or void. */ RETURN NULL; END; $$ LANGUAGE plpgsql; |
The avenger_t1 statement trigger:
CREATE TRIGGER avenger_t1 BEFORE INSERT ON avenger EXECUTE FUNCTION avenger_t1_function(); |
The INSERT statement:
INSERT INTO avenger ( first_name, last_name, avenger_name ) VALUES ('Hope', 'van Dyne', 'Wasp'); |
The results logged to the avenger_log table from a query:
avenger_log_id | trigger_name | trigger_timing | trigger_event | trigger_type ----------------+--------------+----------------+---------------+-------------- 1 | AVENGER_T1 | BEFORE | INSERT | STATEMENT (1 row) |
As always, I hope this helps those looking for a solution.
Is SQL Programming
Is SQL, or Structured Query Language, a programming language? That’s a great question! A question that many answer with emphasis: “No, SQL is not a programming language!” There are some who answer yes; and they usually qualify that answer with something like: “SQL is a programming language designed to communicate with relational databases.”
It strikes me that those saying “yes” are saying that SQL is only a collection of interface methods to read from and write to a database engine. Those saying SQL is not a programming language often qualify that a programming language must have conditional logic and iterative structures, which don’t exist in SQL.
There’s a third group that are fence sitters. They decline to say whether SQL is a programming language, but they also say individuals who only write SQL aren’t programmers. That’s a bit harsh from my perspective.
Before determining whether SQL is a programming language let’s define a programming language. Let’s define a programming language as a collection of lexical units, or building blocks, that build program units. Lexical units are typically organized as delimiters, identifiers, literals, and comments:
- Delimiters include single or double quotes to identify strings and operators that let you assign and compare values.
- Identifiers are reserved words, keywords, predefined identifiers (like data type names), user-defined variables, subroutines, or types.
- Literals are typically numbers and strings, where some strings qualify as dates because they implement a default format mask to convert strings to dates or date-times.
- Comments are simply delimited text that the program ignores but the programmer uses.
That means a programming language must let you define a variable, assign a value to a variable, iterate across a set of values, and make conditional statements. SQL meets these four conditions, but it does, as a set-programming language, qualify all variables as lists of tuples. Though it is possible to have variables with zero to many elements and one to many members in any given tuple. That means you can assign a literal value to to a one-element list with a single-member tuple, like you would a string or integer to a variable of that type.
As Kris Köhntopp commented, computer science defines a programming language as Turing Complete. As his comment qualifies and the Wikipedia page explains: “Turing completeness in declarative SQL is implemented through recursive common table expressions. Unsurprisingly, procedural extensions to SQL (PLSQL, etc.) are also Turing-complete.” While PostgreSQL introduces recursive query syntax through CTEs, it recently added the search and cycle feature in PostgreSQL 14. The recursive query feature has existed in the Oracle database since Oracle 8, but their documentation calls them hierarchical queries. I wrote a quick a tutorial on hierarchical queries in 2008.
For clarity, define and declare are two words that give grief to some newbies. Let’s qualify what they mean. Declare means to give a variable a name and data type. Define means to declare a variable and assign it a value. Another word for assigning a variable is initializing it. Unassigned variables are automatically assigned a default value or a null dependent on the programming language.
Let’s first declare a local variable, assign it to variable, and display the variable. The following example uses Node.js to define the input variable, assign the input variable to the display variable, and then print the display variable to console. Node.js requires that you assign an empty string to the display variable to define it as a string otherwise its type would be undefined, which is common behavior in dynamically typed languages.
/* Declare the display variable as a string. */ var display = "" /* Define the input variable. */ var input = "Hello World!" /* Assign the input variable contents to the display variable. */ display = input /* Print the display variable contents to console. */ console.log(display) |
It prints:
Hello World! |
Let’s write the same type of program in MySQL. Like the Node.js, there are implementation differences. The biggest difference in MySQL or other relational databases occurs because SQL is a declarative set-based language. That means every variable is a collection of a record structure . You can only mimic a scalar or primitive data type variable by creating a record structure with a single member.
In the case below, there are four processing steps:
- The ‘Hello World!’ literal value is assigned to an input variable.
- The SELECT-list (or comma-delimited set of values in the SELECT clause) is assigned like a tuple to the struct collection variable by treating the query of the literal value as an expression.
- The FROM clause returns the struct collection as the data set or as a derived table.
- The topmost SELECT clause evaluates the struct collection row-by-row, like a loop, and assigns the input member to a display variable.
The query is:
SELECT struct.input AS display FROM (SELECT 'Hello World!' AS input) struct; |
Since the struct collection contains only one element, it displays the original literal value one time, like
+--------------+ | display | +--------------+ | Hello World! | +--------------+ 1 row in set (0.00 sec) |
Let’s update the SQL syntax to the more readable, ANSI 1999 and forward, syntax with a Common Table Expression (CTE). CTEs are implemented by the WITH clause.
WITH struct AS (SELECT 'Hello World!' AS input) SELECT struct.input AS display FROM struct; |
The best thing about CTE values they run one-time and are subsequently available anywhere in your query, subqueries, or correlated subqueries. In short, there’s never an excuse to write a subquery twice in the same query.
Let’s look at loops and if-statements. Having established that we can assign a literal to a variable, re-assign the value from one variable to another, and then display the new variable, let’s assign a set of literal values to an array variable. As before, let’s use Node.js to structure the initial problem.
The program now assigns an array of strings to the input variable, uses a for-loop to read the values from the input array, and uses an if-statement with a regular expression evaluation. The if-statement determines which of the array value meets the condition by using a negating logical expression. That’s because the search() function returns a 0 or greater value when the needle value is found in the string and a -1 when not found. After validating that the needle variable value is found in an input string, the input value is assigned to the display variable.
/* Declare the display variable as a string. */ var display = "" /* Declare a lookup variable. */ var needle = "Goodbye" /* Define the input variable as an array of strings. */ var input = ["Hello World!" ,"Goodbye, Cruel World!" ,"Good morning, too early ..."] /* Read through an array and assign the value that meets * the condition to the display variable. */ for (i = 0; i < input.length; i++) if (!(input[i].search(needle) < 0)) display = input[i] /* Print the display variable contents to console. */ console.log(display) |
Then, it prints the display value:
Goodbye, Cruel World! |
To replicate the coding approach in a query, there must be two CTEs. The needle CTE assigns a literal value of ‘goodbye’ to a one-element collection of a single-member tuple variable. The struct CTE creates a collection of strings by using the UNION ALL operator to append three unique tuples instead of one tuple as found in the early example.
The needle CTE returns a one-element collection of a single-member tuple variable. The struct CTE returns a three-element collection of a single-member tuple, which mimics an array of strings. The needle and struct CTEs return distinct variables with different data types. A cross join operation between the two CTEs puts their results together into the same context. It returns a Cartesian product that:
- Adds a single-row tuples to each row of the query’s result set or derived table.
- Adds a multiple-tuples to each row of the query’s result set or derived table by creating copies of each row (following the Cartesian set theory which multiplies rows and adds columns).
In this case, the Cartesian join adds a one-element needle CTE value to each element, or row, returned by the multiple-element struct CTE and produces the following derived table:
+-----------------------------+---------+ | display | lookup | +-----------------------------+---------+ | Hello World! | goodbye | | Goodbye, cruel world! | goodbye | | Good morning, too early ... | goodbye | +-----------------------------+---------+ 3 rows in set (0.00 sec) |
The following query reads through the CTE collection like a loop and filters out any invalid input values. It uses the MySQL regular expression like function in the WHERE clause, which acts as a conditional or if-statement.
WITH needle AS (SELECT 'goodbye' AS lookup) , struct AS (SELECT 'Hello World!' AS input UNION ALL SELECT 'Goodbye, cruel world!' AS input UNION ALL SELECT 'Good morning, too early ...' AS input) SELECT struct.input AS display FROM struct CROSS JOIN needle WHERE REGEXP_LIKE(struct.input, CONCAT('^.*',needle.lookup,'.*$'),'i'); |
It returns the one display value that meets the criteria:
+-----------------------+ | display | +-----------------------+ | Goodbye, cruel world! | +-----------------------+ 1 row in set (0.00 sec) |
The comparisons of the imperative programming approach in Node.js and declarative programming approach should have established that SQL has all the elements of a programming language. That is, SQL has variable declaration and assignment and both iterative and conditional statements. SQL also has different styles for implementing variable declaration and the examples covered subqueries and CTEs with cross joins placing variables in common scope.
Comparative Approaches:
Next, let’s examine a problem that a programmer might encounter when they think SQL only queries or inserts, updates, or deletes single rows. With that perspective of SQL there’s often a limited perspective on how to write queries. Developers with this skill set level typically write only basic queries, which may include inner and outer joins and some aggregation statements.
Let’s assume the following for this programming assignment:
- A sale table as your data source, and
- A requirement to display the type, number, pre-tax sale amount, and percentage by type.
The sale table definition:
+------------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +------------+--------------+------+-----+---------+----------------+ | sale_id | int unsigned | NO | PRI | NULL | auto_increment | | item_desc | varchar(20) | YES | | NULL | | | unit_price | decimal(8,2) | YES | | NULL | | | serial_no | varchar(10) | YES | | NULL | | +------------+--------------+------+-----+---------+----------------+ |
A basic Node.js program may contain a SQL query that returns the item_desc and unit_price columns while counting the number of serial_no rows and summing the unit_price amounts (that assumes no discount sales, after all its Apple). That type of query leaves calculating the total amount of sales and percentage by type to the Node.js program.
const mysql = require('mysql') const connection = mysql.createConnection({ host: 'localhost', user: 'student', password: 'student', database: 'studentdb' }) connection.connect((err) => { if (err) throw err else { console.log('Connected to MySQL Server!\n') connection.query("SELECT s.item_desc " + ", s.unit_price " + ", COUNT(s.serial_no) AS quantity_sold " + ", SUM(s.unit_price) AS sales " + "FROM sale s " + "GROUP BY s.item_desc " + ", s.unit_price", function (err, result) { if (err) throw err else { // Prints the index value in the RowDataPacket. console.log(result) connection.end() }})} }) |
This program would return a JSON structure, like:
[ RowDataPacket { item_desc: 'MacBook Pro 16', unit_price: 2499, quantity_sold: 16, sales: 39984 }, ... RowDataPacket { item_desc: 'MacBook Air M1', unit_price: 999, quantity_sold: 22, sales: 21978 } ] |
While the remaining JavaScript code isn’t difficult to write, it’s unnecessary effort if the developer knew SQL well enough to program in it. The developer could simply re-write the query like the following and return the percentage by type value in the base JSON structure.
WITH sales AS (SELECT SUM(unit_price) AS total FROM sale) SELECT s.item_desc , s.unit_price , COUNT(s.serial_no) AS quantity_sold , SUM(s.unit_price) AS sales , CONCAT(FORMAT((s.unit_price * COUNT(s.serial_no))/sales.total * 100,2),'%') AS percentage FROM sale s CROSS JOIN sales GROUP BY s.item_desc , s.unit_price , sales.total; |
The query uses the sales CTE to calculate and define a tuple with the total sales and adds a derived column calculating the percentage by type of device. It’s probably important to note that aggregation rules require you add the sales.total CTE tuple to the group by clause.
The new query returns this JSON list:
[ RowDataPacket { item_desc: 'MacBook Pro 16', unit_price: 2499, quantity_sold: 16, sales: 39984, percentage: '17.70%' }, ... RowDataPacket { item_desc: 'MacBook Air M1', unit_price: 999, quantity_sold: 22, sales: 21978, percentage: '9.73%' } ] |
The developer would get a complete JSON list when the new query replaces the old. It also would eliminate the need to write additional JavaScript to calculate the percentage by type of device.
Conclusions:
Leveraging the programming power of SQL is frequently possible in many frontend and backend programming solutions. However, the programming power of SQL is infrequently found in programming solutions. That leaves me to ask: “Is it possible that the almost systemic failure to leverage the programming capabilities of SQL is a result of biases by instructors and mentors to their own limited skill sets?” That likely might be true if their instructors and mentors held the belief that: “No, SQL is not a programming language!”
Candidly, folks that write SQL at the programming level almost always have concurrent mastery in two or more imperative programming languages. They’re probably the ones who say, “SQL is a programming language designed to communicate with relational databases.”
Who are those pesky fence sitters? You remember those, don’t you. They’re the ones who declined to take a position on whether SQL is a programming language. Are they the developers who are still learning, and those without an entrenched, preconceived, or learned bias? Or, do they wonder if SQL is Turing complete?
Logging Table Function
It is interesting when somebody remembers a presentation from 10 years ago. They asked if it was possible in PL/pgSQL to write an autonomous procedure to log data when calling a table view function. The answer is two fold. PL/pgSQL doesn’t support autonomous functions or procedures like the Oracle database but it doesn’t need to because unless you invoke a transaction it auto commits writes.
Logging table functions are important for security auditing and compliance management against laws, like SOX, HIPAA, and FERPA. All too many systems lack the basic ability to audit who queries records without raising an error and blocking the access. That means the bad actor or actress gains the ability to probe the system for weaknesses before determining an attack vector. It’s often better to capture the unauthorized access and take direct action to protect both the the data and systems.
While the example lets an unauthorized person access the information in the first version of the student_query, it blocks access by reporting no rows returned in the latter. Both versions of the query log the data and thereby collect the evidence necessary to act against the hack.
This blog post shows you how to write it and test it. Follow the following steps:
- Create the necessary tables and data to work with a logging PL/pgSQL table view function:
/* Conditionally drop and create table. */ DROP TABLE IF EXISTS student; CREATE TABLE student ( student_id SERIAL , first_name VARCHAR(20) , last_name VARCHAR(20) , hogwarts_house VARCHAR(10)); /* Conditionally drop and create table. */ DROP TABLE IF EXISTS logger; CREATE TABLE logger ( logger_id SERIAL , app_user VARCHAR(30) , queried_student VARCHAR(30) , query_time TIMESTAMP ); /* Insert one record into table. */ INSERT INTO student ( first_name, last_name, hogwarts_house ) VALUES ( 'Harry', 'Potter', 'Gryffindor' ) ,( 'Hermione', 'Granger', 'Gryffindor' ) ,( 'Ronald', 'Weasily', 'Gryffindor' ) ,( 'Draco', 'Malfoy', 'Slytherin' ) ,( 'Vincent', 'Crabbe', 'Slytherin' ) ,( 'Susan', 'Bones', 'Hufflepuff' ) ,( 'Hannah', 'Abbott', 'Hufflepuff' ) ,( 'Luna', 'Lovegood', 'Ravenclaw' ) ,( 'Cho', 'Chang', 'Ravenclaw' ) ,( 'Gilderoy', 'Lockhart', 'Ravenclaw' );
- While not necessary if you’re very familiar with PL/pgSQL, it may be helpful to review:
- The SET command that lets you assign a value to a session-level variable, which you can later use in a PL/pgSQL block.
- The SELECT-INTO statement in a DO-block.
Here’s a test script that demonstrates both:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
/* Set a session-level variable. */ SET credential.app_user = 'Draco Malfoy'; /* Secure the value from a session-level variable. */ SELECT current_setting('credential.app_user'); /* DO $$ DECLARE input VARCHAR(30) := 'Hermione'; output VARCHAR(30); BEGIN /* Sample for partial name construction of full name. */ SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name INTO output FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||input||'%'; /* Show result of local assignment via a query. */ RAISE NOTICE '[%][%]', current_setting('credential.app_user'), output; END; $$;
There’s an important parsing trick to this sample program. It uses the LIKE operator rather than the SIMILAR TO operator because the parser fails to recognize the SIMILAR TO operator.
The DO-block returns the following output:
NOTICE: [Draco Malfoy][Hermione Granger]
- This creates the student_query logging table function, which takes a partial portion of a students first and last name to return the student information. While the example only returns the name and the Hogwarts House it lays a foundation for a more complete solution.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
CREATE OR REPLACE FUNCTION student_query (partial_name VARCHAR) RETURNS TABLE ( first_naem VARCHAR(20) , last_name VARCHAR(20) , hogwarts_house VARCHAR(10) ) AS $$ DECLARE queried VARCHAR; by_whome VARCHAR; BEGIN /* Query separately because embedding in insert statement fails. */ SELECT CONCAT(s.first_name, ' ', s.last_name) AS student_name FROM student s INTO queried WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%'; /* Log the query with the credentials of the user. */ INSERT INTO logger ( app_user , queried_student , query_time ) VALUES ( current_setting('credential.app_user') , queried , NOW()); /* Return the result set without disclosing the query was recorded. */ RETURN QUERY SELECT s.first_name , s.last_name , s.hogwarts_house FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%'; END; $$ LANGUAGE plpgsql;
- You can test the function by calling it, like this:
SELECT * FROM student_query('Hermione');
It displays:
first_naem | last_name | hogwarts_house ------------+-----------+---------------- Hermione | Granger | Gryffindor (1 row)
You can check the logging table and discover who looked up another student’s records.
SELECT * FROM logger;
It displays:
logger_id | app_user | queried_student | query_time -----------+--------------+------------------+---------------------------- 1 | Draco Malfoy | Hermione Granger | 2022-05-29 22:51:50.398987 (1 row)
- Assuming you’ve built an authorized_user function that returns a Boolean, you can add a call to it in the WHERE clause. For simplicity, let’s implement the function to deny all users, like:
1 2 3 4 5 6 7 8 9 10
CREATE OR REPLACE FUNCTION authorized_user (user_name VARCHAR) RETURNS BOOLEAN AS $$ DECLARE lv_retval BOOLEAN := FALSE; BEGIN RETURN lv_retval; END; $$ LANGUAGE plpgsql;
You can now replace the query on lines 28 through 32 with the new one below. The added clause on line 33 denies access to unauthorized users because there aren’t any.
28 29 30 31 32 33
SELECT s.first_name , s.last_name , s.hogwarts_house FROM student s WHERE CONCAT(s.first_name, ' ', s.last_name) LIKE '%'||partial_name||'%' AND authorized_user(current_setting('credential.app_user'));
While it returns:
first_naem | last_name | hogwarts_house ------------+-----------+---------------- (0 rows)
The logger table shows two entries. One for the query that returned a value and one for the version that didn’t.
logger_id | app_user | queried_student | query_time -----------+--------------+------------------+---------------------------- 1 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:39.82063 2 | Draco Malfoy | Hermione Granger | 2022-05-29 23:23:40.736945 (2 rows)
In both cases the bad actor Draco Malfoy’s unauthorized access is captured and he was denied any information without alerting him to the security precaution in a logging table function.
As always, I hope this helps those looking for this type of solution.
PL/pgSQL List to Struct
This blog post addresses how to convert a list of values into a structure (in C/C++ its a struct, in Java its an ArrayList, and PL/pgSQL it’s an array of a type). The cast_strings function converts a list of strings into a record data structure. It calls the verify_date function to identify a DATE data type and uses regular expressions to identify numbers and strings.
You need to build the struct type below first.
CREATE TYPE struct AS ( xnumber DECIMAL , xdate DATE , xstring VARCHAR(100)); |
The cast_strings function is defined below:
CREATE FUNCTION cast_strings ( pv_list VARCHAR(10)[] ) RETURNS struct AS $$ DECLARE /* Declare a UDT and initialize an empty struct variable. */ lv_retval STRUCT := (null, null, null); BEGIN /* Loop through list of values to find only the numbers. */ FOR i IN 1..ARRAY_LENGTH(pv_list,1) LOOP /* Order if statements by evaluation. */ CASE /* Check for a value with only digits. */ WHEN lv_retval.xnumber IS NULL AND REGEXP_MATCH(pv_list[i],'^[0-9]+$') IS NOT NULL THEN lv_retval.xnumber := pv_list[i]; /* Check for a valid date. */ WHEN lv_retval.xdate IS NULL AND verify_date(pv_list[i]) IS NOT NULL THEN lv_retval.xdate := pv_list[i]; /* Check for a string with characters, whitespace, and digits. */ WHEN lv_retval.xstring IS NULL AND REGEXP_MATCH(pv_list[i],'^[A-Za-z 0-9]+$') IS NOT NULL THEN lv_retval.xstring := pv_list[i]; ELSE NULL; END CASE; END LOOP; /* Print the results. */ RETURN lv_retval; END; $$ LANGUAGE plpgsql; |
There are two test cases for the cast_strings function. One uses a DO-block and the other a query.
- The first use-case checks with a DO-block:
DO $$ DECLARE lv_list VARCHAR(11)[] := ARRAY['86','1944-04-25','Happy']; lv_struct STRUCT; BEGIN /* Pass the array of strings and return a record type. */ lv_struct := cast_strings(lv_list); /* Print the elements returned. */ RAISE NOTICE '[%]', lv_struct.xnumber; RAISE NOTICE '[%]', lv_struct.xdate; RAISE NOTICE '[%]', lv_struct.xstring; END; $$;
It should return:
psql:verify_pg.SQL:263: NOTICE: [86] psql:verify_pg.SQL:263: NOTICE: [1944-04-25] psql:verify_pg.SQL:263: NOTICE: [Happy]
The program returns a structure with values converted into their appropriate data type.
- The second use-case checks with a query:
WITH get_struct AS (SELECT cast_strings(ARRAY['99','2015-06-14','Agent 99']) AS mystruct) SELECT (mystruct).xnumber , (mystruct).xdate , (mystruct).xstring FROM get_struct;
It should return:
xnumber | xdate | xstring ---------+------------+---------- 99 | 2015-06-14 | Agent 99 (1 row)
The query defines a call to the cast_strings function with a valid set of values and then displays the elements of the returned structure.
As always, I hope this helps those looking for how to solve this type of problem. Just a quick reminder that this was written and tested in PostgreSQL 14.
PL/SQL List to Struct
Every now and then, I get questions from folks about how to tune in-memory elements of their PL/SQL programs. This blog post address one of those core issues that some PL/SQL programmers avoid.
Specifically, it addresses how to convert a list of values into a structure (in C/C++ its a struct, in Java its an ArrayList, and PL/SQL it’s a table of scalar or object types). Oracle lingo hides the similarity by calling either an Attribute Definition Type (ADT) or User-Defined Type (UDT). The difference in the Oracle space is that an ADT deals with a type defined in DBMS_STANDARD package, which is more or less like a primitive type in Java.
Oracle does this for two reasons:
- They handle lists of standard types in a difference C++ class than they do UDT types.
- They rigidly adhere to Interface Definition Language (IDL) principles.
The cast_strings function converts a list of strings into a record data structure. It lets the list of strings have either a densely or sparsely populated list of values, and it calls the verify_date function to identify a DATE data type and regular expressions to identify numbers and strings.
You need to build a UDT object type and lists of both ADT and UDT data types.
/* Create a table of strings. */ CREATE OR REPLACE TYPE tre AS TABLE OF VARCHAR2(20); / /* Create a structure of a date, number, and string. */ CREATE OR REPLACE TYPE struct IS OBJECT ( xdate DATE , xnumber NUMBER , xstring VARCHAR2(20)); / /* Create a table of tre type. */ CREATE OR REPLACE TYPE structs IS TABLE OF struct; / |
The cast_strings function is defined below:
CREATE OR REPLACE FUNCTION cast_strings ( pv_list TRE ) RETURN struct IS /* Declare a UDT and initialize an empty struct variable. */ lv_retval STRUCT := struct( xdate => NULL , xnumber => NULL , xstring => NULL); BEGIN /* Loop through list of values to find only the numbers. */ FOR i IN 1..pv_list.LAST LOOP /* Ensure that a sparsely populated list can't fail. */ IF pv_list.EXISTS(i) THEN /* Order if number evaluation before string evaluation. */ CASE WHEN lv_retval.xnumber IS NULL AND REGEXP_LIKE(pv_list(i),'^[[:digit:]]*$') THEN lv_retval.xnumber := pv_list(i); WHEN verify_date(pv_list(i)) THEN IF lv_retval.xdate IS NULL THEN lv_retval.xdate := pv_list(i); ELSE lv_retval.xdate := NULL; END IF; WHEN lv_retval.xstring IS NULL AND REGEXP_LIKE(pv_list(i),'^[[:alnum:]]*$') THEN lv_retval.xstring := pv_list(i); ELSE NULL; END CASE; END IF; END LOOP; /* Print the results. */ RETURN lv_retval; END; / |
There are three test cases for this function:
- The first use-case checks whether the input parameter is a sparsely or densely populated list:
DECLARE /* Declare an input variable of three or more elements. */ lv_list TRE := tre('Berlin','25','09-May-1945','45'); /* Declare a variable to hold the compound type values. */ lv_struct STRUCT; BEGIN /* Make the set sparsely populated. */ lv_list.DELETE(2); /* Test the cast_strings function. */ lv_struct := cast_strings(lv_list); /* Print the values of the compound variable. */ dbms_output.put_line(CHR(10)); dbms_output.put_line('xstring ['||lv_struct.xstring||']'); dbms_output.put_line('xdate ['||TO_CHAR(lv_struct.xdate,'DD-MON-YYYY')||']'); dbms_output.put_line('xnumber ['||lv_struct.xnumber||']'); END; /
It should return:
xstring [Berlin] xdate [09-MAY-1945] xnumber [45]
The program defines two numbers and deletes the first number, which is why it prints the second number.
- The second use-case checks with a list of only one element:
SELECT TO_CHAR(xdate,'DD-MON-YYYY') AS xdate , xnumber , xstring FROM TABLE(structs(cast_strings(tre('catch22','25','25-Nov-1945'))));
It should return:
XDATE XNUMBER XSTRING -------------------- ---------- -------------------- 25-NOV-1945 25 catch22
The program returns a structure with values converted into their appropriate data type.
- The third use-case checks with a list of two elements:
SELECT TO_CHAR(xdate,'DD-MON-YYYY') AS xdate , xnumber , xstring FROM TABLE(structs(cast_strings(tre('catch22','25','25-Nov-1945')) ,cast_strings(tre('31-APR-2017','1918','areodromes'))));
It should return:
XDATE XNUMBER XSTRING -------------------- ---------- -------------------- 25-NOV-1945 25 catch22 1918 areodromes
The program defines calls the cast_strings with a valid set of values and an invalid set of values. The invalid set of values contains a bad date in the set of values.
As always, I hope this helps those looking for how to solve this type of problem.
PL/SQL CASE Not Found
I was working on some test cases for my students and changing the behavior of a verify_date function that I wrote years ago to validate and returns valid dates when they’re passed as strings. The original program returned today’s date when the date was invalid.
The new function returns a BOOLEAN value of false by default and true when the string validates as a date. Unfortunately, I introduced a mistake that didn’t use to exist in Oracle 11g, which was the version when I wrote the original function.
The test cases in Oracle 21c raises the following error when an invalid date is passed to the CASE statement by the cast_strings function that calls the new verify_date function:
FROM TABLE(structs(cast_strings(tre('31-APR-2017','1917','dirk')))) * ERROR AT line 2: ORA-06592: CASE NOT found WHILE executing CASE statement ORA-06512: AT "C##STUDENT.VERIFY_DATE", line 30 ORA-06512: AT "C##STUDENT.CAST_STRINGS", line 18 |
As you can see, the test case uses ’31-APR-2017′ as an incorrect date to verify the use-case. The error occurred because the ELSE clause in the CASE statement wasn’t provided. Previously, the ELSE clause was optional and setting the lv_retval return variable to FALSE in the DECLARE block made it unnecessary.
The fixed code follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 | CREATE OR REPLACE FUNCTION verify_date ( pv_date_in VARCHAR2) RETURN BOOLEAN IS /* Local variable to ensure case-insensitive comparison. */ lv_date_in VARCHAR2(11); /* Local return variable. */ lv_date BOOLEAN := FALSE; BEGIN /* Convert string input to uppercase month. */ lv_date_in := UPPER(pv_date_in); /* Check for a DD-MON-RR or DD-MON-YYYY string. */ IF REGEXP_LIKE(lv_date_in,'^[0-9]{2,2}-[ADFJMNOS][ACEOPU][BCGLNPRTVY]-([0-9]{2,2}|[0-9]{4,4})$') THEN /* Case statement checks for 28 or 29, 30, or 31 day month. */ CASE /* Valid 31 day month date value. */ WHEN SUBSTR(lv_date_in,4,3) IN ('JAN','MAR','MAY','JUL','AUG','OCT','DEC') AND TO_NUMBER(SUBSTR(pv_date_in,1,2)) BETWEEN 1 AND 31 THEN lv_date := TRUE; /* Valid 30 day month date value. */ WHEN SUBSTR(lv_date_in,4,3) IN ('APR','JUN','SEP','NOV') AND TO_NUMBER(SUBSTR(pv_date_in,1,2)) BETWEEN 1 AND 30 THEN lv_date := TRUE; /* Valid 28 or 29 day month date value. */ WHEN SUBSTR(lv_date_in,4,3) = 'FEB' THEN /* Verify 2-digit or 4-digit year. */ IF (LENGTH(pv_date_in) = 9 AND MOD(TO_NUMBER(SUBSTR(pv_date_in,8,2)) + 2000,4) = 0 OR LENGTH(pv_date_in) = 11 AND MOD(TO_NUMBER(SUBSTR(pv_date_in,8,4)),4) = 0) AND TO_NUMBER(SUBSTR(pv_date_in,1,2)) BETWEEN 1 AND 29 THEN lv_date := TRUE; ELSE /* Not a leap year. */ IF TO_NUMBER(SUBSTR(pv_date_in,1,2)) BETWEEN 1 AND 28 THEN lv_date := TRUE; END IF; END IF; ELSE NULL; END CASE; END IF; /* Return date. */ RETURN lv_date; EXCEPTION WHEN VALUE_ERROR THEN RETURN lv_date; END; / |
The new ELSE clause in on lines 31 and 32, and the converted function works. I also added a local lv_date_in variable to hold an uppercase version of an input string to: ensure a case-insensitive comparison of the month value, and avoid a having to pass the input as an IN OUT mode parameter. Typically, I leave off exception handlers because mistyping or copying for newer programmers becomes easier, but in this case I added an exception handler for strings that are larger than 11-characters.
As always, I hope this helps those looking for a solution to a coding problem.
PL/SQL List Function
Students wanted to see how to write PL/SQL functions that accept, process, and return lists of values. I thought it would be cool to also demonstrate coupling of loop behaviors and wrote the example using the 12-Days of Christmas lyrics.
The twelve_days function accepts two different collections. One is an Attribute Data Type (ADT) and the other a User-Defined Type (UDT). An ADT is based on a scalar data type, and a UDT is based on an object type. Object types are basically data structures, and they support both positional and named notation for variable assignments.
The twelve_days function returns a list of string, which is an ADT of the VARCHAR2 data type. Creating the ADT types is easy and a single step, like:
/* Create a days object type. */ CREATE OR REPLACE TYPE days IS TABLE OF VARCHAR2(8); / /* Create a string object type. */ CREATE OR REPLACE TYPE song IS TABLE OF VARCHAR2(36); / |
Creating the UDT is more complex and requires two steps. You need to create the UDT object type, or structure, and then the list based on the UDT object type, like:
/* Create a lyric object type. */ CREATE OR REPLACE TYPE lyric IS OBJECT ( DAY VARCHAR2(8) , gift VARCHAR2(24)); / /* Create a lyrics object type. */ CREATE OR REPLACE TYPE lyrics IS TABLE OF LYRIC; / |
Now, you can create the twelve_days function that uses these ADT and UDT types, like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | CREATE OR REPLACE FUNCTION twelve_days ( pv_days DAYS , pv_gifts LYRICS ) RETURN song IS /* Initialize the collection of lyrics. */ lv_retval SONG := song(); /* Local procedure to add to the song. */ PROCEDURE ADD ( pv_input VARCHAR2 ) IS BEGIN lv_retval.EXTEND; lv_retval(lv_retval.COUNT) := pv_input; END ADD; BEGIN /* Read forward through the days. */ FOR i IN 1..pv_days.COUNT LOOP ADD('On the ' || pv_days(i) || ' day of Christmas'); ADD('my true love sent to me:'); /* Read backward through the lyrics based on the ascending value of the day. */ FOR j IN REVERSE 1..i LOOP IF i = 1 THEN ADD('-'||'A'||' '||pv_gifts(j).gift); ELSE ADD('-'||pv_gifts(j).DAY||' '||pv_gifts(j).gift); END IF; END LOOP; /* A line break by verse. */ ADD(CHR(13)); END LOOP; /* Return the song's lyrics. */ RETURN lv_retval; END; / |
You may notice the local add procedure on lines 10 thru 15. It lets you perform the two tasks required for populating an element in a SQL object type list in one line in the main body of the twelve_days function.
The add procedure first uses the EXTEND function to allocate space before assigning the input value to the newly allocated element in the list. Next, you can call the function inside the following SQL query:
SELECT column_value AS "12-Days of Christmas" FROM TABLE(twelve_days(days('first','second','third','fourth' ,'fifth','sixth','seventh','eighth' ,'nineth','tenth','eleventh','twelfth') ,lyrics(lyric(DAY => 'and a', gift => 'Partridge in a pear tree') ,lyric(DAY => 'Two', gift => 'Turtle doves') ,lyric(DAY => 'Three', gift => 'French hens') ,lyric(DAY => 'Four', gift => 'Calling birds') ,lyric(DAY => 'Five', gift => 'Golden rings' ) ,lyric(DAY => 'Six', gift => 'Geese a laying') ,lyric(DAY => 'Seven', gift => 'Swans a swimming') ,lyric(DAY => 'Eight', gift => 'Maids a milking') ,lyric(DAY => 'Nine', gift => 'Ladies dancing') ,lyric(DAY => 'Ten', gift => 'Lords a leaping') ,lyric(DAY => 'Eleven',gift => 'Pipers piping') ,lyric(DAY => 'Twelve',gift => 'Drummers drumming')))); |
It will print:
12-Days of Christmas ------------------------------------ On the first day of Christmas my true love sent to me: -A Partridge in a pear tree On the second day of Christmas my true love sent to me: -Two Turtle doves -and a Partridge in a pear tree On the third day of Christmas my true love sent to me: -Three French hens -Two Turtle doves -and a Partridge in a pear tree ... redacted for space ... On the twelfth day of Christmas my true love sent to me: -Twelve Drummers drumming -Eleven Pipers piping -Ten Lords a leaping -Nine Ladies dancing -Eight Maids a milking -Seven Swans a swimming -Six Geese a laying -Five Golden rings -Four Calling birds -Three French hens -Two Turtle doves -and a Partridge in a pear tree |
As always, I hope the example helps those looking for a solution to this type of problem.
Oracle ODBC DSN
As I move forward with trying to build an easy to use framework for data analysts who use multiple database backends and work on Windows OS, here’s a complete script that lets you run any query stored in a file to return a CSV file. It makes the assumption that you opted to put the user ID and password in the Windows ODBC DSN, and only provides the ODBC DSN name to make the connection to the ODBC library and database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 | # A local function for verbose reporting. function Get-Message ($param, $value = $null) { if (!($value)) { Write-Host "Evaluate swtich [" $param "]" } else { Write-Host "Evaluate parameter [" $param "] and [" $value "]" } } # Read SQLStatement file and minimally parse it. function Get-SQLStatement ($sqlStatement) { # Set localvariable for return string value. $statement = "" # Read a file line-by-line. foreach ($line in Get-Content $sqlStatement) { # Use regular expression to replace multiple whitespace. $line = $line -replace '\s+', ' ' # Add a whitespace to avoid joining keywords from different lines; # and remove trailing semicolons which are unneeded. if (!($line.endswith(";"))) { $statement += $line + " " } else { $statement += $line.trimend(";") } } # Returned minimally parsed statement. return $statement } # Set default type of SQL statement value to a query. $stmt = "select" # Set a variable to hold a SQL statement from a file. $query = "" # Set default values for SQL input and output files. $outFile = "output.csv" $sqlFile = "query.sql" # Set default path to: %USERPROFILE%\AppData\Local\Temp folder, but ir # the tilde (~) in lieu of the %USERPROFILE% environment variable value. $path = "~\AppData\Local\Temp" # Set a verbose switch. $verbose = $false # Wrap the Parameter call to avoid a type casting warning. try { param ( [Parameter(Mandatory)][hashtable]$args ) } catch {} # Check for switches and parameters with arguments. for ($i = 0; $i -lt $args.count; $i += 1) { if (($args[$i].startswith("-")) -and ($args[$i + 1].startswith("-"))) { if ($args[$i] = "-v") { $verbose = $true } # Print to verbose console. if ($verbose) { Get-Message $args[$i] }} elseif ($args[$i].startswith("-")) { # Print to verbose console. if ($verbose) { Get-Message $args[$i] $args[$i + 1] } # Evaluate and take action on parameters and values. if ($args[$i] -eq "-o") { $outfile = $args[$i + 1] } elseif ($args[$i] -eq "-q") { $sqlFile = $args[$i + 1] } elseif ($args[$i] -eq "-p") { $path = $args[$i + 1] } } } # Set a PowerShell Virtual Drive. New-PSDrive -Name folder -PSProvider FileSystem -Description 'Forder Location' ` -Root $path | Out-Null # Remove the file only when it exists. if (Test-Path folder:$outFile) { Remove-Item -Path folder:$outFile } # Read SQL file into minimally parsed string. if (Test-Path folder:$sqlFile) { $query = Get-SQLStatement $sqlFile } # Set a ODBC DSN connection string. $ConnectionString = 'DSN=OracleGeneric' # Set an Oracle Command Object for a query. $Connection = New-Object System.Data.Odbc.OdbcConnection; $Connection.ConnectionString = $ConnectionString # Attempt connection. try { $Connection.Open() # Create a SQL command. $Command = $Connection.CreateCommand(); $Command.CommandText = $query; # Attempt to read SQL command. try { $row = $Command.ExecuteReader(); # Read while records are found. while ($row.Read()) { # Initialize output for each row. $output = "" # Navigate across all columns (only two in this example). for ($column = 0; $column -lt $row.FieldCount; $column += 1) { # Mechanic for comma-delimit between last and first name. if ($output.length -eq 0) { $output += $row[$column] } else { $output += ", " + $row[$column] } } # Write the output from the database to a file. Add-Content -Value $output -Path folder:$outFile } } catch { Write-Error "Message: $($_.Exception.Message)" Write-Error "StackTrace: $($_.Exception.StackTrace)" Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)" } finally { # Close the reader. $row.Close() } } catch { Write-Error "Message: $($_.Exception.Message)" Write-Error "StackTrace: $($_.Exception.StackTrace)" Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)" } finally { $Connection.Close() } |
You can use a command-line call like this:
powershell ./OracleContact.ps1 -v -o output.csv -q script.sql -p . |
It produces the following verbose output to the console:
Evaluate swtich [ -v ] Evaluate parameter [ -o ] and [ output.csv ] Evaluate parameter [ -q ] and [ script.sql ] Evaluate parameter [ -p ] and [ . ] |
You can suppress printing to the console by eliminating the -v switch from the parameter list.
As always, I hope this helps those looking for a solution to less tedious interactions with the Oracle database.