MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Archive for the ‘Database Design’ Category

Transaction Management

without comments

Transaction Management

Learning Outcomes

  • Learn how to use Multiversion Concurrency Control (MVCC).
  • Learn how to manage ACID-compliant transactions.
  • Learn how to use:

    • SAVEPOINT Statement
    • COMMIT Statement
    • ROLLBACK Statement

Lesson Material

Transaction Management involves two key components. One is Multiversion Concurrency Control (MVCC) so one user doesn’t interfere with another user. The other is data transactions. Data transactions packag SQL statements in the scope of an imperative language that uses Transaction Control Language (TCL) to extend ACID-compliance from single SQL statements to groups of SQL statements.

Multiversion Concurrency Control (MVCC)

Multiversion Concurrency Control (MVCC) uses database snapshots to provide transactions with memory-persistent copies of the database. This means that users, via their SQL statements, interact with the in-memory copies of data rather than directly with physical data. MVCC systems isolate user transactions from each other and guarantee transaction integrity by preventing dirty transactions, writes to the data that shouldn’t happen and that make the data inconsistent. Oracle Database 12c prevents dirty writes by its MVCC and transaction model.

Transaction models depend on transactions, which are ACID-compliant blocks of code. Oracle Database 12c provides an MVCC architecture that guarantees that all changes to data are ACID-compliant, which ensures the integrity of concurrent operations on data—transactions.

ACID-compliant transactions meet four conditions:

Atomic
They complete or fail while undoing any partial changes.
Consistent
They change from one state to another the same way regardless of whether
the change is made through parallel actions or serial actions.
Isolated
Partial changes are never seen by other users or processes in the concurrent system.
Durable
They are written to disk and made permanent when completed.

Oracle Database 12c manages ACID-compliant transactions by writing them to disk first, as redo log files only or as both redo log files and archive log files. Then it writes them to the database. This multiple-step process with logs ensures that Oracle database’s buffer cache (part of the instance memory) isn’t lost from any completed transaction. Log writes occur before the acknowledgement-of-transactions process occurs.

The smallest transaction in a database is a single SQL statement that inserts, updates, or deletes rows. SQL statements can also change values in one or more columns of a row in a table. Each SQL statement is by itself an ACID-compliant and MVCC-enabled transaction when managed by a transaction-capable database engine. The Oracle database is always a transaction-capable system. Transactions are typically a collection of SQL statements that work in close cooperation to accomplish a business objective. They’re often grouped into stored programs, which are functions, procedures, or triggers. Triggers are specialized programs that audit or protect data. They enforce business rules that prevent unauthorized changes to the data.

SQL statements and stored programs are foundational elements for development of business applications. They contain the interaction points between customers and the data and are collectively called the application programming interface (API) to the database. User forms (typically web forms today) access the API to interact with the data. In well-architected business application software, the API is the only interface that the form developer interacts with.

Database developers, such as you and I, create these code components to enforce business rules while providing options to form developers. In doing so, database developers must guard a few things at all cost. For example, some critical business logic and controls must prevent changes to the data in specific tables, even changes in API programs. That type of critical control is often written in database triggers. SQL statements are events that add, modify, or delete data. Triggers guarantee that API code cannot make certain additions, modifications, or deletions to critical resources, such as tables. Triggers can run before or after SQL statements. Their actions, like the SQL statements themselves, are temporary until the calling scope sends an instruction to commit the work performed.

A database trigger can intercept values before they’re placed in a column, and it can ensure that only certain values can be inserted into or updated in a column. A trigger overrides an INSERT or UPDATE statement value that violates a business rule and then it either raises an error and aborts the transaction or changes the value before it can be inserted or updated into the table. Chapter 12 offers examples of both types of triggers in Oracle Database 12c.
MVCC determines how to manage transactions. MVCC guarantees how multiple users’ SQL statements interact in an ACID compliant manner. The next two sections qualify how data transactions work and how MVCC locks and isolates partial results from data transactions.

Data Transaction

Data Manipulation Language (DML) commands are the SQL statements that transact against the data. They are principally the INSERT, UPDATE, and DELETE statements. The INSERT statement adds new rows in a table, the UPDATE statement modifies columns in existing rows, and the DELETE statement removes a row from a table.

The Oracle MERGE statement transacts against data by providing a conditional insert or update feature. The MERGE statement lets you add new rows when they don’t exist or change column values in rows that do exist.

Inserting data seldom encounters a conflict with other SQL statements because the values become a new row or rows in a table. Updates and deletes, on the other hand, can and do encounter conflicts with other UPDATE and DELETE statements. INSERT statements that encounter conflicts occur when columns in a new row match a preexisting row’s uniquely constrained columns. The insertion is disallowed because only one row can contain the unique column set.

These individual transactions have two phases in transactional databases such as Oracle. The first phase involves making a change that is visible only to the user in the current session. The user then has the option of committing the change, which makes it permanent, or rolling back the change, which undoes the transaction. Developers use Transaction Control Language (TCL) commands to confirm or cancel transactions. The COMMIT statement confirms or makes permanent any change, and the ROLLBACK statement cancels or undoes any change.

A generic transaction lifecycle for a two-table insert process implements a business rule that specifies that neither INSERT statement works unless they both work. Moreover, if the first INSERT statement fails, the second INSERT statement never runs; and if the second INSERT statement fails, the first INSERT statement is undone by a ROLLBACK statement to a SAVEPOINT.

After a failed transaction is unwritten, good development practice requires that you write the failed event(s) to an error log table. The write succeeds because it occurs after the ROLLBACK statement but before the COMMIT statement.

A SQL statement followed by a COMMIT statement is called a transaction process, or a two-phase commit (2PC) protocol. ACID-compliant transactions use a 2PC protocol to manage one SQL statement or collections of SQL statements. In a 2PC protocol model, the INSERT, UPDATE, MERGE, or DELETE DML statement starts the process and submits changes. These DML statements can also act as events that fire database triggers assigned to the table being changed.

Transactions become more complex when they include database triggers because triggers can inject an entire layer of logic within the transaction scope of a DML statement. For example, database triggers can do the following:

  • Run code that verifies, changes, or repudiates submitted changes
  • Record additional information after validation in other tables (they can’t write to the table being changed—or, in database lexicon, “mutated”
  • Throw exceptions to terminate a transaction when the values don’t meet business rules

As a general rule, triggers can’t contain a COMMIT or ROLLBACK statement because they run inside the transaction scope of a DML statement. Oracle databases give developers an alternative to this general rule because they support autonomous transactions. Autonomous transactions run outside the transaction scope of the triggering DML statement. They can contain a COMMIT statement and act independently of the calling scope statement. This means an autonomous trigger can commit a transaction when the calling transaction fails.

As independent statements or collections of statements add, modify, and remove rows, one statement transacts against data only by locking rows: the SELECT statement. A SELECT statement typically doesn’t lock rows when it acts as a cursor in the scope of a stored program. A cursor is a data structure that contains rows of one-to-many columns in a stored program. This is also known as a list of record structures.

Cursors act like ordinary SQL queries, except they’re managed by procedure programs row by row. There are many examples of procedural programming languages. PL/SQL and SQL/PSM programming languages are procedural languages designed to run inside the database. C, C++, C#, Java, Perl, and PHP are procedural languages that interface with the database through well-defined interfaces, such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).

Cursors can query data two ways. One way locks the rows so that they can’t be changed until the cursor is closed; closing the cursor releases the lock. The other way doesn’t lock the rows, which allows them to be changed while the program is working with the data set from the cursor. The safest practice is to lock the rows when you open the cursor, and that should always be the case when you’re inserting, updating, or deleting rows that depend on the values in the cursor not changing until the transaction lifecycle of the program unit completes.

Loops use cursors to process data sets. That means the cursors are generally opened at or near the beginning of program units. Inside the loop the values from the cursor support one to many SQL statements for one to many tables.

Stored and external programs create their operational scope inside a database connection when they’re called by another program. External programs connect to a database and enjoy their own operational scope, known as a session scope. The session defines the programs’ operational scope. The operational scope of a stored program or external program defines the transaction scope. Inside the transaction scope, the programs interact with data in tables by inserting, updating, or deleting data until the operations complete successfully or encounter a critical failure. These stored program units commit changes when everything completes successfully, or they roll back changes when any critical instruction fails. Sometimes, the programs are written to roll back changes when any instruction fails.

In the Oracle Database, the most common clause to lock rows is the FOR UPDATE clause, which is appended to a SELECT statement. An Oracle database also supports a WAIT n seconds or NOWAIT option. The WAIT option is a blessing when you want to reply to an end user form’s request and can’t make the change quickly. Without this option, a change could hang around for a long time, which means virtually indefinitely to a user trying to run your application. The default value in an Oracle database is NOWAIT, WAIT without a timeout, or wait indefinitely.

You should avoid this default behavior when developing program units that interact with customers. The Oracle Database also supports a full table lock with the SQL LOCK TABLE command, but you would need to embed the command inside a stored or external program’s instruction set.

Written by maclochlainn

April 5th, 2022 at 2:20 pm

Oracle Unit Test

without comments

A unit test script may contain SQL or PL/SQL statements or it may call another script file that contains SQL or PL/SQL statements. Moreover, a script file is a way to bundle several activities into a single file because most unit test programs typically run two or more instructions as unit tests.

Unconditional Script File

You can write a simple unit test like the example program provided in the Lab 1 Help Section, which includes conditional logic. However, you can write a simpler script that is unconditional and raises exceptions when preconditions do not exist.

The following script file creates a one table and one_s sequence. The DROP TABLE and DROP SEQUENCE statements have the same precondition, which is that the table or sequence must previously exist.

-- Drop table one.
DROP TABLE one;
 
-- Crete table one.
CREATE TABLE one
( one_id    NUMBER
, one_text  VARCHAR2(10));
 
-- Drop sequence one_s.
DROP SEQUENCE one_s;
 
-- Create sequence one_s.
CREATE SEQUENCE one_s;

After writing the script file, you can save it in the lab2 subdirectory as the unconditional.sql file. After you login to the SQL*Plus environment from the lab2 subdirectory. You call the unconditional.sql script file from inside the SQL*Plus environment with the following syntax:

@unconditional.sql

It will display the following output, which raises an exception when the one table or one_s sequence does not already exist in the schema or database:

DROP TABLE one
           *
ERROR at line 1:
ORA-00942: table or view does not exist
 
Table created.
 
DROP SEQUENCE one_s
              *
ERROR at line 1:
ORA-02289: sequence does not exist
 
Sequence created.

An unconditional script raises exceptions when a precondition of the statement does not exist. The precondition is not limited to objects, like the table or sequence; and the precondition may be specific data in one or several rows of one or several tables. You can avoid raising conditional errors by writing conditional scripts.

Conditional Script File

A conditional script file contains statements that check for a precondition before running a statement, which effectively promotes their embedded statements to a lambda function. The following logic recreates the logic of the unconditional.sql script file as a conditional script file:

-- Conditionally drop a table and sequence.
BEGIN
  FOR i IN (SELECT   object_name
            ,        object_type
            FROM     user_objects
            WHERE    object_name IN ('ONE','ONE_S')
            ORDER BY object_type ) LOOP
    IF i.object_type = 'TABLE' THEN
      EXECUTE IMMEDIATE 'DROP TABLE '||i.object_name||' CASCADE CONSTRAINTS';
    ELSE
      EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.object_name;
    END IF;
  END LOOP;
END;
/
 
-- Crete table one.
CREATE TABLE one
( one_id    NUMBER
, one_text  VARCHAR2(10));
 
-- Create sequence one_s.
CREATE SEQUENCE one_s;

You can save this script in the lab2 subdirectory as conditional.sql and then unit test it in SQL*Plus. You must manually drop the one table and one_s sequence before running the conditional.sql script to test the preconditions.

You will see that the conditional.sql script does not raise an exception because the one table or one_s sequence is missing. It should generate output to the console, like this:

PL/SQL procedure successfully completed.
 
Table created.
 
Sequence created.

As a rule, you should always write conditional script files. Unconditional script files throw meaningless errors, which may cause your good code to fail a deployment test that requires error free code.

Written by maclochlainn

April 5th, 2022 at 1:59 pm

Selective Aggregation

without comments

Selective Aggregation

Learning Outcomes

  • Learn how to combine CASE operators and aggregation functions.
  • Learn how to selective aggregate values.
  • Learn how to use SQL to format report output.

Selective aggregation is the combination of the CASE operator and aggregation functions. Any aggregation function adds, sums, or averages the numbers that it finds; and when you embed the results of a CASE operator inside an aggregation function you get a selective result. The selectivity is determined by the WHEN clause of a CASE operator, which is more or less like an IF statement in an imperative programming language.

The prototype for selective aggregation is illustrated with a SUM function below:

SELECT   SUM(CASE
               WHEN left_operand = right_operand THEN result
               WHEN left_operand > right_operand THEN result
               WHEN left_operand IN (SET OF comma-delimited VALUES) THEN result
               WHEN left_operand IN (query OF results) THEN result
               ELSE alt_result
             END) AS selective_aggregate
FROM     some_table;

A small example let’s you see how selective aggregation works. You create a PAYMENT table and PAYMENT_S sequence for this example, as follows:

-- Create a PAYMENT table.
CREATE TABLE payment
( payment_id     NUMBER
, payment_date   DATE	      CONSTRAINT nn_payment_1 NOT NULL
, payment_amount NUMBER(20,2) CONSTRAINT nn_payment_2 NOT NULL
, CONSTRAINT pk_payment PRIMARY KEY (payment_id));
 
-- Create a PAYMENT_S sequence.
CREATE SEQUENCE payment_s;

After you create the table and sequence, you should insert some data. You can match the values below or choose your own values. You should just insert values for a bunch of rows.

After inserting 10,000 rows, you can get an unformatted total with the following query:

-- Query total amount.
SELECT   SUM(payment_amount) AS payment_total
FROM     payment;

It outputs the following:

PAYMENT_TOTAL
-------------
   5011091.75

You can nest the result inside the TO_CHAR function to format the output, like

-- Query total formatted amount.
SELECT   TO_CHAR(SUM(payment_amount),'999,999,999.00') AS payment_total
FROM     payment;

It outputs the following:

PAYMENT_TOTAL
---------------
   5,011,091.75

Somebody may suggest that you use a PIVOT function to rotate the data into a summary by month but the PIVOT function has limits. The pivoting key must be numeric and the column values will use only those numeric values.

-- Pivoted summaries by numeric monthly value.
SELECT   *
FROM    (SELECT EXTRACT(MONTH FROM payment_date) payment_month
         ,      payment_amount
         FROM   payment)
         PIVOT (SUM(payment_amount) FOR payment_month IN
                 (1,2,3,4,5,6,7,8,9,10,11,12));

It outputs the following:

	 1	    2	       3	  4	     5		6	   7	      8 	 9	   10	      11	 12
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
 245896.55  430552.36  443742.63  457860.27  470467.18	466370.71  415158.28  439898.72  458998.09  461378.56  474499.22  246269.18

You can use selective aggregation to get the results by a character label, like

SELECT   SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 1
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "JAN"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 2
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "FEB"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 3
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "MAR"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) IN (1,2,3)
             AND  EXTRACT(YEAR FROM payment_date) = 2019 THEN payment_amount
           END) AS "1FQ"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 4
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "APR"
FROM     payment;

It outputs the following:

       JAN	  FEB	     MAR	1FQ	   APR
---------- ---------- ---------- ---------- ----------
 245896.55  430552.36  443742.63 1120191.54  457860.27

You can format the output with a combination of the TO_CHAR and LPAD functions. The TO_CHAR allows you to add a formatting mask, complete with commas and two mandatory digits to the right of the decimal point. The reformatted query looks like

COL JAN FORMAT A13 HEADING "Jan"
COL FEB FORMAT A13 HEADING "Feb"
COL MAR FORMAT A13 HEADING "Mar"
COL 1FQ FORMAT A13 HEADING "1FQ"
COL APR FORMAT A13 HEADING "Apr"
SELECT   LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 1
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "JAN"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 2
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "FEB"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 3
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "MAR"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) IN (1,2,3)
             AND  EXTRACT(YEAR FROM payment_date) = 2019 THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "1FQ"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 4
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "APR"
FROM     payment;

It displays the formatted output:

Jan	      Feb	    Mar 	  1FQ		Apr
------------- ------------- ------------- ------------- -------------
   245,896.55	 430,552.36    443,742.63  1,120,191.54    457,860.27

INSERT Statement

without comments

INSERT Statement

Learning Outcomes

  • Learn how to use positional- and named-notation in INSERT statements.
  • Learn how to use the VALUES clause in INSERT statements.
  • Learn how to use subqueries in INSERT statements.

The INSERT statement lets you enter data into tables and views in two ways: via an INSERT statement with a VALUES clause and via an INSERT statement with a query. The VALUES clause takes a list of literal values (strings, numbers, and dates represented as strings), expression values (return values from functions), or variable values.

Query values are results from SELECT statements that are subqueries (covered earlier in this appendix). INSERT statements work with scalar, single-row, and multiple-row subqueries. The list of columns in the VALUES clause or SELECT clause of a query (a SELECT list) must map to the positional list of columns that defines the table. That list is found in the data dictionary or catalog. Alternatively to the list of columns from the data catalog, you can provide a named list of those columns. The named list overrides the positional (or default) order from the data catalog and must provide at least all mandatory columns in the table definition. Mandatory columns are those that are not null constrained.

Oracle databases differ from other databases in how they implement the INSERT statement. Oracle doesn’t support multiple-row inserts with a VALUES clause. Oracle does support default and override signatures as qualified in the ANSI SQL standards. Oracle also provides a multiple- table INSERT statement. This section covers how you enter data with an INSERT statement that is based on a VALUES clause or a subquery result statement. It also covers multiple-table INSERT statements.

The INSERT statement has one significant limitation: its default signature. The default signature is the list of columns that defines the table in the data catalog. The list is defined by the position and data type of columns. The CREATE statement defines the initial default signature, and the ALTER statement can change the number, data types, or ordering of columns in the default signature.

The default prototype for an INSERT statement allows for an optional column list that overrides the default list of columns. When you provide the column list you choose to implement named-notation, which is the right way to do it. Relying on the insertion order of the columns is a bad idea. An INSERT statement without a list of column names is a position-notation statement. Position-notation is bad because somebody can alter that order and previously written INSERT statements will break or put data in the wrong columns.

Like methods in OOPLs, an INSERT statement without the optional column list constructs an instance (or row) of the table using the default constructor. The override constructor for a row is defined by any INSERT statement when you provide an optional column list. That’s because it overrides the default constructor.

The generic prototype for an INSERT statement is confusing when it tries to capture both the VALUES clause and the result set from a query. Therefore, I’ve opted to provide two generic prototypes.

Insert by value

The first uses the VALUES clause:

INSERT
INTO table_name
[( column1, column2, column3, ...)] VALUES
( value1, value2, value3, ...);

Notice that the prototype for an INSERT statement with the result set from a query doesn’t use the VALUES clause at all. A parsing error occurs when the VALUES clause and query both occur in an INSERT statement.

The second prototype uses a query and excludes the VALUES clause. The subquery may return one to many rows of data. The operative rule is that all columns in the query return the same number of rows of data, because query results should be rectangles—rectangles made up of one to many rows of columns.

Insert by subquery

Here’s the prototype for an INSERT statement that uses a subquery:

INSERT
INTO table_name
[( column1, column2, column3, ...)]
( SELECT value1, value2, value3, ... FROM table_name WHERE ...);

A query, or SELECT statement, returns a SELECT list. The SELECT list is the list of columns, and it’s evaluated by position and data type. The SELECT list must match the definition of the table or the override signature provided.

Default signatures present a risk of data corruption through insertion anomalies, which occur when you enter bad data in tables. Mistakes transposing or misplacing values can occur more frequently with a default signature, because the underlying table structure can change. As a best practice, always use named notation by providing the optional list of values; this should help you avoid putting the right data in the wrong place.

The following subsections provide examples that use the default and override syntax for INSERT statements in Oracle databases. The subsections also cover multiple-table INSERT statements and a RETURNING INTO clause, which is an extension of the ANSI SQL standard. Oracle uses the RETURNING INTO clause to manage large objects, to return autogenerated identity column values, and to support some of the features of Oracle’s dynamic SQL. Note that Oracle also supports a bulk INSERT statement, which requires knowledge of PL/SQL.

Written by maclochlainn

April 5th, 2022 at 1:23 pm

MySQL CSV Output

without comments

Saturday, I posted how to use Microsoft ODBC DSN to connect to MySQL. Somebody didn’t like the fact that the PowerShell program failed to write a *.csv file to disk because the program used the Write-Host command to write to the content of the query to the console.

I thought that approach was a better as an example. However, it appears that it wasn’t because not everybody knows simple redirection. The original program can transfer the console output to a file, like:

powershell .\MySQLODBC.ps1 > output.csv

So, the first thing you need to do is add a parameter list, like:

2
3
4
param (
  [Parameter(Mandatory)][string]$fileName
)

Anyway, it’s trivial to demonstrate how to modify the PowerShell program to write to a disk. You should also create a virtual PowerShell drive before writing the file. That’s because you can change the physical directory anytime you want with minimal changes to rest of your code’s file references.

You can create a PowerShell virtual drive with the following command:

7
8
New-PSDrive -Name test -PSProvider FileSystem -Description 'Test area' `
            -Root C:\Data\cit225\mysql\test

but, it will write the following to console:

Name           Used (GB)     Free (GB) Provider      Root                                                                                 CurrentLocation
----           ---------     --------- --------      ----                                                                                 ---------------
test                0.00         28.74 FileSystem    C:\Data\cit225\mysql\test

You can suppress the console output with Microsoft’s version of redirection to the void (> /dev/null), which pipes (|) the standard out (stdout) to Out-Null, like:

7
8
New-PSDrive -Name test -PSProvider FileSystem -Description 'Test area' `
            -Root C:\Data\cit225\mysql\test | Out-Null

Since the program may run before an output file has been created, or after its been created and removed, you need to check whether the file exists before attempting to remove it. PowerShell provides the Test-Path command to check for the existence of a file and the Remove-Item command to remove a file, like:

11
12
if (Test-Path test:$fileName) {
  Remove-Item -Path test:$fileName }

Then, you simply replace the Write-Host call in the other program with the Add-Content command:

Add-Content -Value $output -Path test:$fileName

Now, the PowerShell script file writes the MySQL query’s output to an output.csv file. You can call the MySQLContact.ps1 script file with the following syntax:

powershell MySQLContact.ps1 output.csv

In case these changes don’t make sense outside the scope of the full script, here is the rewritten script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
# Define parameter list for mandatory file name.
param (
  [Parameter(Mandatory)][string]$fileName
)
 
# Define a PowerShell Virtual Drive.
New-PSDrive -Name test -PSProvider FileSystem -Description 'Test area' `
            -Root C:\Data\cit225\mysql\test | Out-Null
 
# Remove the file only when it exists.
if (Test-Path test:$fileName) {
  Remove-Item -Path test:$fileName }
 
# Define a ODBC DSN connection string.
$ConnectionString = 'DSN=MySQLODBC2'
 
# Define a MySQL Command Object for a non-query.
$Connection = New-Object System.Data.Odbc.OdbcConnection;
$Connection.ConnectionString = $ConnectionString
 
# Attempt connection.
try {
  $Connection.Open()
 
  # Create a SQL command.
  $Command = $Connection.CreateCommand();
  $Command.CommandText = "SELECT last_name " + 
                         ",      first_name " +
                         "FROM   contact " +
                         "ORDER BY 1, 2";
 
  # Attempt to read SQL command.
  try {
    $row = $Command.ExecuteReader();
 
    # Read while records are found.
    while ($row.Read()) {
      # Initialize output for each row.
      $output = ""
 
      # Navigate across all columns (only two in this example).
      for ($column = 0; $column -lt $row.FieldCount; $column += 1) {
        # Mechanic for comma-delimit between last and first name.  
        if ($output.length -eq 0) { 
          $output += $row[$column] }
        else {
          $output += ", " + $row[$column] }
      }
        # Write the output from the database to a file.
        Add-Content -Value $output -Path test:$fileName
    }
 
  } catch {
    Write-Error "Message: $($_.Exception.Message)"
    Write-Error "StackTrace: $($_.Exception.StackTrace)"
    Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
  } finally {
    # Close the reader.
    $row.Close() }
 
} catch {
  Write-Error "Message: $($_.Exception.Message)"
  Write-Error "StackTrace: $($_.Exception.StackTrace)"
  Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
} finally {
  $Connection.Close() }

While I understand you might want to go to this level of effort if you where building a formal cmdlet, I’m not convinced its worth the effort in an ordinary PowerShell script. However, I don’t like to leave a question unanswered.

Written by maclochlainn

April 4th, 2022 at 12:45 am

PostgreSQL+PowerShell

with one comment

This post explains and demonstrates how to install, configure, and use the psqlODBC (32-bit) and psqlODBC (64-bit) libraries to connect your Microsoft PowerShell programs to a locally installed PostgreSQL 14 database. It relies on you previously installing and configuring a PostgreSQL 14 database. This post is a step-by-step guide to installing PostgreSQL 14 on Windows 10, and this post shows you how to configure the PostgreSQL 14 database.

If you didn’t follow the instructions to get the psqlODBC libraries in the installation blog post, you will need to get those libraries, as qualified by Microsoft with the PostgreSQL Stack Builder.

You can launch PostgreSQL Stack Builder after the install by clicking on Start -> PostgreSQL -> Stack Builder. Choose to enable Stack Builder to change your system and install the psqlODBC libraries. After you’ve installed the psqlODBC library, use Windows search field to find the ODBC Data Sources dialog and run it as administrator.

There are six steps to setup, test, and save your ODBC Data Source Name (DSN). You can click on the images on the right to launch them in a more readable format or simply read the instructions.

PostgreSQL ODBC Setup Steps

  1. The Microsoft DSN (Data Source Name) dialog automatically elects the User DSN tab. Click on the System DSN tab.

  1. The view under the System DSN is exactly like the User DSN tab. Click the Add button to start the workflow.

  1. The Create New Data Source dialog requires you select the PostgreSQL ODBC Driver(UNICODE) option from the list and click the Finish button to proceed.

  1. The PostgreSQL Unicode ODBC Driver Setup dialog should complete the prompts as follows below and consistent with the PostgreSQL 14 Configuration blog. If you opt for localhost as the server value because you have a DCHP IP address, make sure you’ve configured your hosts file in the C:\Windows\System32\drivers\etc directory. You should enter the following two lines in the hosts file:

    127.0.0.1  localhost
    ::1        localhost

    These are the string values you should enter in the PostgreSQL Unicode ODBC Driver Setup dialog:

    Data Source: PostgreSQL35W
    Database:    videodb
    Server:      localhost
    User Name:   student
    Description: PostgreSQL
    SSL Mode:    disable
    Port:        5432
    Password:    student

    After you complete the entry, click the Test button.

  1. The Connection Test dialog should return a “Connection successful” message. Click the OK button to continue.

  1. The ODBC Data Source Administrator dialog should show the PostgreSQL35W System Data Source. Click the OK button to continue.

After you have created the System PostgreSQL ODBC Setup, it’s time to build a PowerShell Cmdlet (or, Commandlet). Some documentation and blog notes incorrectly suggest you need to write a connection string with a UID and password, like:

$ConnectionString = 'DSN=PostgreSQL35W;Uid=student;Pwd=student'

The UID and password is unnecessary in the connection string. As a rule, the UID and password are only necessary in the ODBC DSN, like:

$ConnectionString = 'DSN=PostgreSQL35W'

You can create a readcursor.ps1 Cmdlet like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
# Define a ODBC DSN connection string.
$ConnectionString = 'DSN=PostgreSQL35W'
 
# Define a MySQL Command Object for a non-query.
$Connection = New-Object System.Data.Odbc.OdbcConnection;
$Connection.ConnectionString = $ConnectionString
 
# Attempt connection.
try {
  $Connection.Open()
 
  # Create a SQL command.
  $Command = $Connection.CreateCommand();
  $Command.CommandText = "SELECT current_database();";
 
  # Attempt to read SQL command.
  try {
    $Reader = $Command.ExecuteReader();
 
    # Read while records are found.
    while ($Reader.Read()) {
      Write-Host "Current Database [" $Reader[0] "]"}
 
  } catch {
    Write-Error "Message: $($_.Exception.Message)"
    Write-Error "StackTrace: $($_.Exception.StackTrace)"
    Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
  } finally {
    # Close the reader.
    $Reader.Close() }
 
} catch {
  Write-Error "Message: $($_.Exception.Message)"
  Write-Error "StackTrace: $($_.Exception.StackTrace)"
  Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
} finally {
  $Connection.Close() }

Line 14 assigns a SQL query that returns a single row with one column as the CommandText of a Command object. Line 22 reads the zero position of a row or record set with only one column.

You call the readcursor.ps1 Cmdlet with the following syntax:

powershell .\readcursor.ps1

It returns:

Current Database [ videodb ]

A more realistic way to write a query would return multiple rows with a set of two or more columns. The following program queries a table with multiple rows of two columns, but the program logic can manage any number of columns.

# Define a ODBC DSN connection string.
$ConnectionString = 'DSN=PostgreSQL35W'
 
# Define a MySQL Command Object for a non-query.
$Connection = New-Object System.Data.Odbc.OdbcConnection;
$Connection.ConnectionString = $ConnectionString
 
# Attempt connection.
try {
  $Connection.Open()
 
  # Create a SQL command.
  $Command = $Connection.CreateCommand();
  $Command.CommandText = "SELECT last_name, first_name FROM contact ORDER BY 1, 2";
 
  # Attempt to read SQL command.
  try {
    $row = $Command.ExecuteReader();
 
    # Read while records are found.
    while ($row.Read()) {
      # Initialize output for each row.
      $output = ""
 
      # Navigate across all columns (only two in this example).
      for ($column = 0; $column -lt $row.FieldCount; $column += 1) {
        # Mechanic for comma-delimit between last and first name.  
        if ($output.length -eq 0) { 
          $output += $row[$column] }
        else {
          $output += ", " + $row[$column] }
        }
        # Write the output from the database.
        Write-Host $output
      }
 
  } catch {
    Write-Error "Message: $($_.Exception.Message)"
    Write-Error "StackTrace: $($_.Exception.StackTrace)"
    Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
  } finally {
    # Close the reader.
    $row.Close() }
 
} catch {
  Write-Error "Message: $($_.Exception.Message)"
  Write-Error "StackTrace: $($_.Exception.StackTrace)"
  Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
} finally {
  $Connection.Close() }

You call the readcontact.ps1 Cmdlet with the following syntax:

powershell .\readcontact.ps1

It returns an ordered set of comma-separated values, like

Clinton, Goeffrey
Gretelz, Simon
Moss, Wendy
Royal, Elizabeth
Smith, Brian
Sweeney, Ian
Sweeney, Matthew
Sweeney, Meaghan
Vizquel, Doreen
Vizquel, Oscar
Winn, Brian
Winn, Randi

As always, I hope this helps those looking for a complete concrete example of how to make Microsoft Powershell connect and query results from a PostgreSQL database.

Oracle’s Sparse Lists

without comments

Oracle’s PL/SQL Programming Language is really quite nice. I’ve written 8 books on it and still have fun coding in it. One nasty little detail about Oracle’s lists, introduced in Oracle 8 as PL/SQL Tables according their documentation, is they rely on sequential numeric indexes. Unfortunately, Oracle lists support a DELETE method, which can create gaps in the sequential indexes.

Oracle calls a sequence without gaps densely populated and a sequence with gaps sparsely populated. This can cause problems when PL/SQL code inadvertently removes elements at the beginning, end, or somewhere in the middle of the list. That’s because a program can then pass the sparsely populated list as a parameter to another stored function or procedure where the developer may traverse the list in a for-loop. That traversal may raise an exception in a for-loop, like this when it has gaps in the index sequence:

DECLARE
*
ERROR AT line 1:
ORA-01403: no data found
ORA-06512: AT line 20

Oracle’s myriad built-in libraries don’t offer a function to compact a sparsely populated list into a densely populated list. This post provides a compact stored procedure that converts a sparsely populated list to a densely populated list.

The first step to using the compact stored procedure requires that you create an object type in SQL, like this list of 20-character strings:

DROP TYPE list;
CREATE OR REPLACE
  TYPE list IS TABLE OF VARCHAR2(20);
/

Now, you can implement the compact stored procedure by passing the User-Defined Type as it’s sole parameter.

CREATE OR REPLACE
  PROCEDURE compact ( sparse IN OUT LIST ) IS
    /* Declare local variables. */
    iterator  NUMBER;           -- Leave iterator as null.
 
    /* Declare new list. */
    dense     LIST := list();
  BEGIN
    /*
      Initialize the iterator with the starting value, which is
      necessary because the first element of the original list
      could have been deleted in earlier operations. Setting the
      initial iterator value to the first numeric index value
      ensures you start at the lowest available index value.
    */
    iterator := sparse.FIRST;
 
    /* Convert sparsely populated list to densely populated. */
    WHILE (iterator <= sparse.LAST) LOOP
      dense.EXTEND;
      dense(dense.COUNT) := sparse(iterator);
      iterator := sparse.NEXT(iterator);
    END LOOP;
 
    /* Replace the input parameter with the compacted list. */
    sparse := dense;
  END;
/

Before we test the compact stored procedure, let’s create deleteElement stored procedure for our testing:

CREATE OR REPLACE
  PROCEDURE deleteElement ( sparse   IN OUT LIST
                          , element  IN     NUMBER ) IS
  BEGIN
    /* Delete a value. */
    sparse.DELETE(element);
  END;
/

Now, let’s use an anonymous block to test compacting a sparsely populated list into a densely populated list. The test program will remove the first, last, and one element in the middle before printing the sparsely populated list’s index and string values. This test will show you gaps in the remaining non-sequential index values.

After you see the gaps, the test program compacts the remaining list values into a new densely populated list. It then prints the new index values with the data values.

DECLARE
  /* Declare a four item list. */
  lv_strings  LIST := list('one','two','three','four','five','six','seven');
BEGIN
  /* Check size of list. */
  dbms_output.put_line('Print initial list size:  ['||lv_strings.COUNT||']');
  dbms_output.put_line('===================================');
 
  /* Delete a value. */
  deleteElement(lv_strings,lv_strings.FIRST);
  deleteElement(lv_strings,3);
  deleteElement(lv_strings,lv_strings.LAST);
 
  /* Check size of list. */
  dbms_output.put_line('Print modified list size: ['||lv_strings.COUNT||']');
  dbms_output.put_line('Print max index and size: ['||lv_strings.LAST||']['||lv_strings.COUNT||']');
  dbms_output.put_line('===================================');
  FOR i IN 1..lv_strings.LAST LOOP
    IF lv_strings.EXISTS(i) THEN
      dbms_output.put_line('List list index and item: ['||i||']['||lv_strings(i)||']');
    END IF;
  END LOOP;
 
  /* Call a procedure by passing current sparse collection and
     the procedure returns dense collection. */
  dbms_output.put_line('===================================');
  dbms_output.put_line('Compacting list.');
  compact(lv_strings);
  dbms_output.put_line('===================================');
 
  /* Print the new maximum index value and list size. */
  dbms_output.put_line('Print new index and size: ['||lv_strings.LAST||']['||lv_strings.COUNT||']');
  dbms_output.put_line('===================================');
  FOR i IN 1..lv_strings.COUNT LOOP
    dbms_output.put_line('List list index and item: ['||i||']['||lv_strings(i)||']');
  END LOOP;
  dbms_output.put_line('===================================');
END;
/

It produces output, like:

Print initial list size:  [7]
===================================
Print modified list size: [4]
Print max index and size: [6][4]
===================================
List list index and item: [2][two]
List list index and item: [4][four]
List list index and item: [5][five]
List list index and item: [6][six]
===================================
Compacting list.
===================================
Print new index and size: [4][4]
===================================
List list index and item: [1][two]
List list index and item: [2][four]
List list index and item: [3][five]
List list index and item: [4][six]
===================================

You can extend this concept by creating User-Defined Types with multiple attributes, which are essentially lists of tuples (to draw on Pythonic lingo).

Written by maclochlainn

October 4th, 2021 at 11:49 pm

MySQL Outer Joins

without comments

The students needed yet another example of LEFT JOIN, RIGHT JOIN, and FULL JOIN syntax (by combining a left and right join with the UNION set operator). To that end, I put this set of examples together.

The example also shows how to order the result set from a derived table with the UNION operator. It uses the WITH clause to build a Common Table Expression (CTE), which allows the query to order the UNION set operator’s product based on the left and right join queries. It uses a CASE statement to order the result sets. The left_table is the parent table and the right_table is the child table in the relationship, which means the right_table holds a left_id foreign key column that lets you connect matching rows in the left_table.

You build the little model with the following script:

-- -----------------------------------------------------------------
-- Drop the demonstration tables.
-- -----------------------------------------------------------------
DROP TABLE IF EXISTS left_table, right_table;
 
-- -----------------------------------------------------------------
-- Create left_table.
-- -----------------------------------------------------------------
CREATE TABLE left_table
( left_id        int unsigned primary key auto_increment
, leftstring     varchar(10));
 
-- -----------------------------------------------------------------
-- Create left_table.
-- -----------------------------------------------------------------
CREATE TABLE right_table
( right_id       int unsigned primary key auto_increment
, left_id        int unsigned
, rightstring         varchar(10));
 
-- -----------------------------------------------------------------
-- Insert five rows to the left table, which holds a 
-- left_id primary key column.
-- -----------------------------------------------------------------
INSERT INTO left_table (leftstring) values ('One');
INSERT INTO left_table (leftstring) values ('Two');
INSERT INTO left_table (leftstring) values ('Three');
INSERT INTO left_table (leftstring) values ('Four');
INSERT INTO left_table (leftstring) values ('Five');
 
-- -----------------------------------------------------------------
-- Delete row four to create a gap.
-- -----------------------------------------------------------------
DELETE FROM left_table where left_id = 4;
 
-- -----------------------------------------------------------------
--  Insert four rows, skipping a foreign key value for the 
--  left_id primary key value of 2.
-- -----------------------------------------------------------------
 
INSERT INTO right_table (rightstring,left_id) values ('One',1);
INSERT INTO right_table (rightstring,left_id) values ('Three',3);
INSERT INTO right_table (rightstring,left_id) values ('Four',4);
INSERT INTO right_table (rightstring,left_id) values ('Five',5);

Here are the join statements:

INNER JOIN

The INNER JOIN only returns those rows that match between a primary and foreign key column or set of columns.

SELECT l.left_id
,      l.leftstring
,      r.left_id
,      r.right_id
,      r.rightstring
FROM   left_table l INNER JOIN right_table r
ON     l.left_id = r.left_id;

It produces the following result set:

+---------+------------+---------+----------+-------------+
| left_id | leftstring | left_id | right_id | rightstring |
+---------+------------+---------+----------+-------------+
|       1 | One        |       1 |        1 | One         |
|       3 | Three      |       3 |        2 | Three       |
|       5 | Five       |       5 |        4 | Five        |
+---------+------------+---------+----------+-------------+
3 rows in set (0.00 sec)

LEFT OUTER JOIN

The LEFT OUTER JOIN only returns those rows that match between a primary and foreign key column or set of columns and any rows in the table on the lefthand side of the join that fail to match with any row on the righthand side of the join. The non-matching rows are also known as the right complement of the join.

1
2
3
4
5
6
7
SELECT l.left_id
,      l.leftstring
,      r.left_id
,      r.right_id
,      r.rightstring
FROM   left_table l LEFT JOIN right_table r
ON     l.left_id = r.left_id;

It produces the following result set:

+---------+------------+---------+----------+-------------+
| left_id | leftstring | left_id | right_id | rightstring |
+---------+------------+---------+----------+-------------+
|       1 | One        |       1 |        1 | One         |
|       2 | Two        |    NULL |     NULL | NULL        |
|       3 | Three      |       3 |        2 | Three       |
|       5 | Five       |       5 |        4 | Five        |
+---------+------------+---------+----------+-------------+
4 rows in set (0.00 sec)

Add the following line 8 to the query and you get only those rows in the lefthand table that have no child-related rows in the righthand table. These rows are sometimes called childless parent rows. More or less, the use case for this type of query is to find order headers without order lines.

6
7
8
FROM   left_table l LEFT JOIN right_table r
ON     l.left_id = r.left_id
WHERE  r.left_id IS NULL;

It produces the following result set:

+---------+------------+---------+----------+-------------+
| left_id | leftstring | left_id | right_id | rightstring |
+---------+------------+---------+----------+-------------+
|       2 | Two        |    NULL |     NULL | NULL        |
+---------+------------+---------+----------+-------------+
1 row in set (0.00 sec)

RIGHT OUTER JOIN

1
2
3
4
5
6
7
SELECT l.left_id
,      l.leftstring
,      r.left_id
,      r.right_id
,      r.rightstring
FROM   left_table l RIGHT JOIN right_table r
ON     l.left_id = r.left_id;

It produces the following result set:

+---------+------------+---------+----------+-------------+
| left_id | leftstring | left_id | right_id | rightstring |
+---------+------------+---------+----------+-------------+
|       1 | One        |       1 |        1 | One         |
|       3 | Three      |       3 |        2 | Three       |
|    NULL | NULL       |       4 |        3 | Four        |
|       5 | Five       |       5 |        4 | Five        |
+---------+------------+---------+----------+-------------+
4 rows in set (0.00 sec)

Add the following line 8 to the query and you get only those rows in the righthand table that have no parent-related rows in the lefthand table. These rows are sometimes called orphans because they have no parent row. More or less, the use case for this type of query is to find latent order lines after deleting the order header.

6
7
8
FROM   left_table l LEFT JOIN right_table r
ON     l.left_id = r.left_id
WHERE  l.left_id IS NULL;

It produces the following result set:

+---------+------------+---------+----------+-------------+
| left_id | leftstring | left_id | right_id | rightstring |
+---------+------------+---------+----------+-------------+
|    NULL | NULL       |       4 |        3 | Four        |
+---------+------------+---------+----------+-------------+
1 row in set (0.00 sec)

FULL OUTER JOIN

The full outer join doesn’t exist in MySQL, so you combine a LEFT OUTER JOIN and RIGHT OUTER JOIN with the UNION operator. The UNION operator eliminates the duplicate row from the intersection of the joins.

Here’s the full query:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
WITH cte AS
(SELECT l.left_id AS primary_left_id
,      l.leftstring
,      r.left_id  AS foreign_left_id
,      r.right_id
,      r.rightstring
FROM   left_table l LEFT JOIN right_table r
ON     l.left_id = r.left_id
UNION
SELECT l.left_id  AS primary_left_id
,      l.leftstring
,      r.left_id  AS foreign_left_id
,      r.right_id
,      r.rightstring
FROM   left_table l RIGHT JOIN right_table r
ON     l.left_id = r.left_id)
SELECT * FROM cte
ORDER BY
CASE
  WHEN 'One'   IN (leftstring, rightstring) THEN 1
  WHEN 'Two'   IN (leftstring, rightstring)  THEN 2
  WHEN 'Three' IN (leftstring, rightstring)  THEN 3
  WHEN 'Four'  IN (leftstring, rightstring)  THEN 4
  WHEN 'Five'  IN (leftstring, rightstring)  THEN 5
END;

It produces the following result set:

+-----------------+------------+-----------------+----------+-------------+
| primary_left_id | leftstring | foreign_left_id | right_id | rightstring |
+-----------------+------------+-----------------+----------+-------------+
|               1 | One        |               1 |        1 | One         |
|               2 | Two        |            NULL |     NULL | NULL        |
|               3 | Three      |               3 |        2 | Three       |
|            NULL | NULL       |               4 |        3 | Four        |
|               5 | Five       |               5 |        4 | Five        |
+-----------------+------------+-----------------+----------+-------------+
5 rows in set (0.00 sec)

Add the following lines 18 and 19 to the query and you get only those rows that are childless parent rows or orphaned child rows. More or less, the use case for this type of query is to find both order headers without order lines and order lines abandoned by deleted order headers.

17
18
19
SELECT * FROM cte
WHERE  primary_left_id IS NULL
OR     foreign_left_id IS NULL

It produces the following result set:

+-----------------+------------+-----------------+----------+-------------+
| primary_left_id | leftstring | foreign_left_id | right_id | rightstring |
+-----------------+------------+-----------------+----------+-------------+
|               2 | Two        |            NULL |     NULL | NULL        |
|            NULL | NULL       |               4 |        3 | Four        |
+-----------------+------------+-----------------+----------+-------------+
2 rows in set (0.00 sec)

The ORDER BY clause used is a variation on the more common choice of:

  WHEN leftstring = 'One'   OR rightstring = 'One'   THEN 1

The position of two string literals in any statement is a bad idea. Inverting the literal on the right and using a IN operator gives you a better and clearer WHEN statement:

  WHEN 'One'   IN (leftstring, rightstring) THEN 1

As always, I hope this helps those looking for syntax examples.

Written by maclochlainn

June 15th, 2021 at 10:36 pm

MySQL sakila Database

without comments

While I thought my instructions were clear, it appears there should have been more in my examples for using the MySQL MSI. A key thing that happened is that students opted not to install:

Samples and Examples 8.0.22

Unfortunately, they may not have read the Preface of Alan Beaulieu’s Learning SQL, 3rd Edition where he explains how to manually download the files from the MySQL web site. Here are those, very clear, instructions (pg. XV) with my additions in italics for the MySQL Shell:

First, you will need to launch the mysql command-line client or the mysqlsh command-line shell, and provide a password, and then perform the following steps:

  1. Go to https://dev.mysql.com/doc/index-other.html and download the files for the “sakila database” under the Example Database section.
  2. Put the files in the local directory such as C:\temp\sakila-db (used for the next two steps, but overwrite with your directory path).
  3. Type

    source c:\temp\sakila-db\sakila-schema.sql

    and press enter.

  4. Type

    source c:\temp\sakila-db\sakila-data.sql

    and press enter.

These instructions let you create the sakila database without rerunning the MSI to add a product. Naturally, you can avoid these steps by using the GUI approach provided in the MySQL MSI file.

As always, I hope this helps those looking for how to solve problems.

Written by maclochlainn

January 9th, 2021 at 11:22 am

MySQL Self-Join

with 4 comments

I’m switching to MySQL and leveraging Alan Beaulieu’s Learning SQL as a supporting reference for my Database Design and Development course. While reviewing Alan’s Chapter 5: Querying Multiple Tables, I found his coverage of using self-joins minimal.

In fact, he adds a prequel_film_id column to the film table in the sakila database and then a single row to demonstrate a minimal self-join query. I wanted to show them how to view a series of rows interconnected by a self-join, like the following:

SELECT   f.title AS film
,        fp.title AS prequel
FROM     film f LEFT JOIN film fp
ON       f.prequel_id = fp.film_id
WHERE    f.series_name = 'Harry Potter'
ORDER BY f.series_number;

It returns the following result set:

+----------------------------------------------+----------------------------------------------+
| film                                         | prequel                                      |
+----------------------------------------------+----------------------------------------------+
| Harry Potter and the Chamber of Secrets      | Harry Potter and the Sorcerer's Stone        |
| Harry Potter and the Prisoner of Azkaban     | Harry Potter and the Chamber of Secrets      |
| Harry Potter and the Goblet of Fire          | Harry Potter and the Prisoner of Azkaban     |
| Harry Potter and the Order of the Phoenix    | Harry Potter and the Goblet of Fire          |
| Harry Potter and the Half Blood Prince       | Harry Potter and the Order of the Phoenix    |
| Harry Potter and the Deathly Hallows: Part 1 | Harry Potter and the Half Blood Prince       |
| Harry Potter and the Deathly Hallows: Part 2 | Harry Potter and the Deathly Hallows: Part 1 |
+----------------------------------------------+----------------------------------------------+
7 rows in set (0.00 sec)

Then, I thought about what questions the students might ask. For example, why doesn’t the query return the first film that doesn’t have a prequel. So, I took the self-join to the next level to display the first film having no prequel, like this:

SELECT   f.title AS film
,        IFNULL(
           CASE
             WHEN NOT f.film_id = fp.film_id
             AND      f.prequel_id = fp.film_id THEN fp.title		   
           END,'None') AS prequel
FROM     film f LEFT JOIN film fp
ON       f.prequel_id = fp.film_id
WHERE    f.series_name = 'Harry Potter'
ORDER BY f.series_number;

The CASE operator in the SELECT-list filters the result set by eliminating rows erroneously returned. Without the CASE filter, the query would return the original Harry Potter and the Sorcerer’s Stone film matched agains a NULL and all of the other sequels. The CASE operator effectively limits the result set for the LEFT JOIN to only the following data:

+----------------------------------------------+----------------------------------------------+
| film                                         | prequel                                   |
+----------------------------------------------+----------------------------------------------+
| Harry Potter and the Sorcerer's Stone        | NULL                                         |
+----------------------------------------------+----------------------------------------------+

The IFNULL() built-in function lets you replace the NULL value returned as the prequel’s title value. The IFNULL() function substitutes a 'None' string literal for a NULL value. The query returns the following result set:

+----------------------------------------------+----------------------------------------------+
| film                                         | prequel                                      |
+----------------------------------------------+----------------------------------------------+
| Harry Potter and the Sorcerer's Stone        | None                                         |
| Harry Potter and the Chamber of Secrets      | Harry Potter and the Sorcerer's Stone        |
| Harry Potter and the Prisoner of Azkaban     | Harry Potter and the Chamber of Secrets      |
| Harry Potter and the Goblet of Fire          | Harry Potter and the Prisoner of Azkaban     |
| Harry Potter and the Order of the Phoenix    | Harry Potter and the Goblet of Fire          |
| Harry Potter and the Half Blood Prince       | Harry Potter and the Order of the Phoenix    |
| Harry Potter and the Deathly Hallows: Part 1 | Harry Potter and the Half Blood Prince       |
| Harry Potter and the Deathly Hallows: Part 2 | Harry Potter and the Deathly Hallows: Part 1 |
+----------------------------------------------+----------------------------------------------+
8 rows in set (0.01 sec)

Alan’s modification of the sakila.film table had the following two related design flaws:

  • It didn’t provide a way to guarantee the ordering of films with prequels because relational databases don’t guarantee ordered result sets unless you use an ORDER BY clause, which typically requires a column to order.
  • It didn’t provide a way to isolate a series of films.

I modified the film table differently by adding the series_name, series_number, and prequel_id columns. The series_name column lets you group results and the series_number column lets you order by a preserved sequence that you store as part of the data The prequel_id column lets you connect to the prequel film, much like the backward portion of a doubly linked list.

The new sakila.film table is:

+----------------------+---------------------------------------------------------------------+------+-----+-------------------+-----------------------------------------------+
| Field                | Type                                                                | Null | Key | Default           | Extra                                         |
+----------------------+---------------------------------------------------------------------+------+-----+-------------------+-----------------------------------------------+
| film_id              | smallint unsigned                                                   | NO   | PRI | NULL              | auto_increment                                |
| title                | varchar(255)                                                        | NO   | MUL | NULL              |                                               |
| description          | text                                                                | YES  |     | NULL              |                                               |
| release_year         | year                                                                | YES  |     | NULL              |                                               |
| language_id          | tinyint unsigned                                                    | NO   | MUL | NULL              |                                               |
| original_language_id | tinyint unsigned                                                    | YES  | MUL | NULL              |                                               |
| rental_duration      | tinyint unsigned                                                    | NO   |     | 3                 |                                               |
| rental_rate          | decimal(4,2)                                                        | NO   |     | 4.99              |                                               |
| length               | smallint unsigned                                                   | YES  |     | NULL              |                                               |
| replacement_cost     | decimal(5,2)                                                        | NO   |     | 19.99             |                                               |
| rating               | enum('G','PG','PG-13','R','NC-17')                                  | YES  |     | G                 |                                               |
| special_features     | set('Trailers','Commentaries','Deleted Scenes','Behind the Scenes') | YES  |     | NULL              |                                               |
| last_update          | timestamp                                                           | NO   |     | CURRENT_TIMESTAMP | DEFAULT_GENERATED on update CURRENT_TIMESTAMP |
| series_name          | varchar(20)                                                         | YES  |     | NULL              |                                               |
| series_number        | int unsigned                                                        | YES  |     | NULL              |                                               |
| prequel              | int unsigned                                                        | YES  |     | NULL              |                                               |
+----------------------+---------------------------------------------------------------------+------+-----+-------------------+-----------------------------------------------+
16 rows in set (0.21 sec)

After adding the three new columns, I inserted eight rows for the original Harry Potter films. You can use the following script in the MySQL client (mysql) to add the columns and insert the data to test the preceding queries:

-- Use sakila database.
USE sakila;
 
-- Add a prequel_id column to the sakila.film table.
ALTER TABLE film
ADD (series_name    varchar(20)),
ADD (series_number  int unsigned),
ADD (prequel_id     int unsigned);
 
-- Set primary to foreign key local variable.
SET @sv_film_id = 0;
 
-- Insert Harry Potter films in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Sorcerer''s Stone'
,'A film about a young boy who on his eleventh birthday discovers, he is the orphaned boy of two powerful wizards and has unique magical powers.'
, 2001
, 1
, NULL
, 3
, 0.99
, 152
, 19.99
,'PG'
,'Trailers'
,'2001-11-04'
,'Harry Potter'
, 1
, NULL );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 2nd film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Chamber of Secrets'
,'A film where Harry returning to Hogwarts, still famous and a hero, when strange things start to happen ... people are turning to stone and no-one knows what, or who, is doing it.'
, 2002
, 1
, NULL
, 3
, 0.99
, 160
, 19.99
,'PG'
,'Trailers'
,'2002-11-15'
,'Harry Potter'
, 2
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 3rd film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Prisoner of Azkaban'
,'A film where Harry, Ron, and Hermione return for their third year at Hogwarts and are forced to face escaped prisoner, Sirius Black.'
, 2004
, 1
, NULL
, 3
, 0.99
, 141
, 19.99
,'PG'
,'Trailers'
,'2004-06-04'
,'Harry Potter'
, 3
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 4th film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Goblet of Fire'
,'A film where where Harry Potter''s name emerges from the Goblet of Fire, and he becomes a competitor in a grueling battle for glory among three wizarding schools - the Triwizard Tournament.'
, 2005
, 1
, NULL
, 3
, 0.99
, 157
, 19.99
,'PG'
,'Trailers'
,'2005-11-18'
,'Harry Potter'
, 4
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 5th film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Order of the Phoenix'
,'A film where Lord Voldemort has returned, but the Ministry of Magic is doing everything it can to keep the wizarding world from knowing the truth.'
, 2007
, 1
, NULL
, 3
, 0.99
, 138
, 19.99
,'PG-13'
,'Trailers'
,'2007-07-12'
,'Harry Potter'
, 5
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 6th film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Half Blood Prince'
,'A film where Voldemort is tightening his grip on Hogwarts and it is no longer the safe haven it once was. Harry and Dumbledore work to find the key to unlock the Dark Lord''s defenses.'
, 2009
, 1
, NULL
, 3
, 0.99
, 153
, 19.99
,'PG'
,'Trailers'
,'2009-07-15'
,'Harry Potter'
, 6
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 7th film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Deathly Hallows: Part 1'
,'A film where Harry, Ron and Hermione set out on their perilous mission to track down and destroy the Horcruxes - the keys to Voldemort''s immortality.'
, 2010
, 1
, NULL
, 3
, 0.99
, 146
, 19.99
,'PG-13'
,'Trailers'
,'2010-11-19'
,'Harry Potter'
, 7
, @sv_film_id );
 
-- Assign the last generated primary key value to the local variable.
SET @sv_film_id := last_insert_id();
 
-- Insert 8th film in sakila.film table with classic values clause.
INSERT INTO film
( title
, description
, release_year
, language_id
, original_language_id
, rental_duration
, rental_rate
, length
, replacement_cost
, rating
, special_features
, last_update
, series_name
, series_number
, prequel_id )
VALUES
('Harry Potter and the Deathly Hallows: Part 2'
,'A film where Harry, Ron and Hermione set out on their perilous mission to track down and destroy the Horcruxes - the keys to Voldemort''s immortality.'
, 2011
, 1
, NULL
, 3
, 0.99
, 130
, 19.99
,'PG-13'
,'Trailers'
,'2011-07-15'
,'Harry Potter'
, 8
, @sv_film_id );

You can put the following commands into a SQL script file to revert the sakila.film table to its base configuration:

DELETE FROM film WHERE film_id > 1000;
ALTER TABLE film DROP COLUMN series_name;
ALTER TABLE film DROP COLUMN series_number;
ALTER TABLE film DROP COLUMN prequel_id;
ALTER TABLE film AUTO_INCREMENT = 1000;

As always, I hope this helps those looking for how to solve a new problem.

Written by maclochlainn

November 22nd, 2020 at 4:03 pm