MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Archive for the ‘sql’ Category

MySQL RegExp Default

with 4 comments

We had an interesting set of questions today regarding the REGEXP comparison operator in MySQL in both sections of Database Design and Development. The students wanted to know its default behavior.

To explore it, we built a little movie table so that we wouldn’t change their default sakila example database. The movie table looked like this:

CREATE TABLE movie
( movie_id     int unsigned primary key auto_increment
, movie_title  varchar(60)) auto_increment=1001;

Then, I inserted the following rows:

INSERT INTO movie 
( movie_title )
VALUES
 ('The King and I')
,('I')
,('The I Inside')
,('I am Legend');

Querying all results with this query:

SELECT * FROM movie;

It returns the following results:

+----------+----------------+
| movie_id | movie_title    |
+----------+----------------+
|     1001 | The King and I |
|     1002 | I              |
|     1003 | The I Inside   |
|     1004 | I am Legend    |
+----------+----------------+
4 rows in set (0.00 sec)

The following REGEXP returns all the rows because it looks for a case insensitive “I” anywhere in the string.

SELECT movie_title
FROM   movie
WHERE  movie_title REGEXP 'I';

The implicit regular expression is actually:

WHERE  movie_title REGEXP '^.*I.*$';

It looks for zero-to-many of any character before and after the “I“. You can get any string beginning with an “I” with the “^I“; and any string ending with an “I” with the “I$“. Interestingly, the “I.+$” should only match strings with one or more characters after the “I“, but it returns:

+----------------+
| movie_title    |
+----------------+
| The King and I |
| The I Inside   |
| I am Legend    |
+----------------+
3 rows in set (0.00 sec)

This caught me by surprise because I was lazy. As pointed out in the comment, it only appears to substitute a “.*“, or zero-to-many evaluation, for the “.+” because it’s a case-insensitive search. There’s another lowercase “i” in “The King and I,” and that means the regular expression returns true because that “i” has one-or-more following characters. If we convert it to a case-sensitive comparison with the keyword binary, it works as expected because it ignores the lowercase “i“.

WHERE  binary movie_title REGEXP '^.*I.*$';
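
For reference, both anchored variants work as described above (a quick sketch against the same movie table):

-- Match titles that begin with "I".
SELECT movie_title
FROM   movie
WHERE  movie_title REGEXP '^I';

-- Match titles that end with "I".
SELECT movie_title
FROM   movie
WHERE  movie_title REGEXP 'I$';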

This builds on my 10-year old post on Regular Expressions. As always, I hope these notes help others discover features and behaviors of the MySQL database, and Bill, thanks for catching my error.

Written by maclochlainn

April 29th, 2022 at 11:50 pm

Oracle ODBC DSN

without comments

As I move forward with trying to build an easy-to-use framework for data analysts who use multiple database backends and work on the Windows OS, here’s a complete script that lets you run any query stored in a file and return a CSV file. It makes the assumption that you opted to put the user ID and password in the Windows ODBC DSN, and it only provides the ODBC DSN name to make the connection to the ODBC library and database.

# A local function for verbose reporting.
function Get-Message ($param, $value = $null) {
  if (!($value)) {
    Write-Host "Evaluate swtich    [" $param "]" } 	  
  else {
    Write-Host "Evaluate parameter [" $param "] and [" $value "]" } 
}
 
# Read SQLStatement file and minimally parse it.
function Get-SQLStatement ($sqlStatement) {
  # Set a local variable for the return string value.
  $statement = ""
 
  # Read a file line-by-line.
  foreach ($line in Get-Content $sqlStatement) {
    # Use regular expression to replace multiple whitespace.
    $line = $line -replace '\s+', ' '
 
    # Add a whitespace to avoid joining keywords from different lines;
    # and remove trailing semicolons which are unneeded.
    if (!($line.endswith(";"))) {
      $statement += $line + " " }
    else {
      $statement += $line.trimend(";") }
  }
  # Return the minimally parsed statement.
  return $statement
}
 
# Set default type of SQL statement value to a query.
$stmt = "select"
 
# Set a variable to hold a SQL statement from a file.
$query = ""
 
# Set default values for SQL input and output files.
$outFile = "output.csv"
$sqlFile = "query.sql"
 
# Set default path to the %USERPROFILE%\AppData\Local\Temp folder, but use
# the tilde (~) in lieu of the %USERPROFILE% environment variable value.
$path = "~\AppData\Local\Temp"
 
# Set a verbose switch.
$verbose = $false
 
# Wrap the Parameter call to avoid a type casting warning.
try {
  param (
    [Parameter(Mandatory)][hashtable]$args
  )
}
catch {}
 
# Check for switches and parameters with arguments.
for ($i = 0; $i -lt $args.count; $i += 1) {
  if (($args[$i].startswith("-")) -and (($i + 1 -eq $args.count) -or ($args[$i + 1].startswith("-")))) {
    if ($args[$i] -eq "-v") {
      $verbose = $true }
      # Print to verbose console.
    if ($verbose) { Get-Message $args[$i] }}
  elseif ($args[$i].startswith("-")) {
    # Print to verbose console.
    if ($verbose) { Get-Message $args[$i] $args[$i + 1] }
 
    # Evaluate and take action on parameters and values.
    if ($args[$i] -eq "-o") {
      $outfile = $args[$i + 1] }
    elseif ($args[$i] -eq "-q") {
      $sqlFile = $args[$i + 1] }
    elseif ($args[$i] -eq "-p") {
      $path = $args[$i + 1] }
  }
}
 
# Set a PowerShell Virtual Drive.
New-PSDrive -Name folder -PSProvider FileSystem -Description 'Folder Location' `
            -Root $path | Out-Null
 
# Remove the file only when it exists.
if (Test-Path folder:$outFile) {
  Remove-Item -Path folder:$outFile }
 
# Read SQL file into minimally parsed string.
if (Test-Path folder:$sqlFile) {
  $query = Get-SQLStatement $sqlFile }
 
# Set an ODBC DSN connection string.
$ConnectionString = 'DSN=OracleGeneric'
 
# Set an Oracle Command Object for a query.
$Connection = New-Object System.Data.Odbc.OdbcConnection;
$Connection.ConnectionString = $ConnectionString
 
# Attempt connection.
try {
  $Connection.Open()
 
  # Create a SQL command.
  $Command = $Connection.CreateCommand();
  $Command.CommandText = $query;
 
  # Attempt to read SQL command.
  try {
    $row = $Command.ExecuteReader();
 
    # Read while records are found.
    while ($row.Read()) {
      # Initialize output for each row.
      $output = ""
 
      # Navigate across all columns (only two in this example).
      for ($column = 0; $column -lt $row.FieldCount; $column += 1) {
        # Mechanic for comma-delimit between last and first name.  
        if ($output.length -eq 0) { 
          $output += $row[$column] }
        else {
          $output += ", " + $row[$column] }
      }
      # Write the output from the database to a file.
      Add-Content -Value $output -Path folder:$outFile
    }
  } catch {
    Write-Error "Message: $($_.Exception.Message)"
    Write-Error "StackTrace: $($_.Exception.StackTrace)"
    Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
  } finally {
    # Close the reader.
    $row.Close() }
} catch {
  Write-Error "Message: $($_.Exception.Message)"
  Write-Error "StackTrace: $($_.Exception.StackTrace)"
  Write-Error "LoaderExceptions: $($_.Exception.LoaderExceptions)"
} finally {
  $Connection.Close() }
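
Because the script simply joins whatever columns a query returns with commas, any SELECT statement works. A minimal script.sql might look like this (the contact table and its columns are illustrative assumptions, not part of the script):

-- Sample two-column query; the script comma-delimits the columns per row.
SELECT   last_name
,        first_name
FROM     contact
ORDER BY last_name;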

You can use a command-line call like this:

powershell ./OracleContact.ps1 -v -o output.csv -q script.sql -p .

It produces the following verbose output to the console:

Evaluate switch    [ -v ]
Evaluate parameter [ -o ] and [ output.csv ]
Evaluate parameter [ -q ] and [ script.sql ]
Evaluate parameter [ -p ] and [ . ]

You can suppress printing to the console by eliminating the -v switch from the parameter list.

As always, I hope this helps those looking for a solution to less tedious interactions with the Oracle database.

Transaction Management

without comments

Transaction Management

Learning Outcomes

  • Learn how to use Multiversion Concurrency Control (MVCC).
  • Learn how to manage ACID-compliant transactions.
  • Learn how to use:

    • SAVEPOINT Statement
    • COMMIT Statement
    • ROLLBACK Statement

Lesson Material

Transaction Management involves two key components. One is Multiversion Concurrency Control (MVCC) so one user doesn’t interfere with another user. The other is data transactions. Data transactions package SQL statements in the scope of an imperative language that uses Transaction Control Language (TCL) to extend ACID-compliance from single SQL statements to groups of SQL statements.

Multiversion Concurrency Control (MVCC)

Multiversion Concurrency Control (MVCC) uses database snapshots to provide transactions with memory-persistent copies of the database. This means that users, via their SQL statements, interact with the in-memory copies of data rather than directly with physical data. MVCC systems isolate user transactions from each other and guarantee transaction integrity by preventing dirty transactions, writes to the data that shouldn’t happen and that make the data inconsistent. Oracle Database 12c prevents dirty writes by its MVCC and transaction model.

Transaction models depend on transactions, which are ACID-compliant blocks of code. Oracle Database 12c provides an MVCC architecture that guarantees that all changes to data are ACID-compliant, which ensures the integrity of concurrent operations on data—transactions.

ACID-compliant transactions meet four conditions:

Atomic
They complete or fail while undoing any partial changes.
Consistent
They change from one state to another the same way regardless of whether the change is made through parallel actions or serial actions.
Isolated
Partial changes are never seen by other users or processes in the concurrent system.
Durable
They are written to disk and made permanent when completed.

Oracle Database 12c manages ACID-compliant transactions by writing them to disk first, as redo log files only or as both redo log files and archive log files. Then it writes them to the database. This multiple-step process with logs ensures that Oracle database’s buffer cache (part of the instance memory) isn’t lost from any completed transaction. Log writes occur before the acknowledgement-of-transactions process occurs.

The smallest transaction in a database is a single SQL statement that inserts, updates, or deletes rows. SQL statements can also change values in one or more columns of a row in a table. Each SQL statement is by itself an ACID-compliant and MVCC-enabled transaction when managed by a transaction-capable database engine. The Oracle database is always a transaction-capable system. Transactions are typically a collection of SQL statements that work in close cooperation to accomplish a business objective. They’re often grouped into stored programs, which are functions, procedures, or triggers. Triggers are specialized programs that audit or protect data. They enforce business rules that prevent unauthorized changes to the data.

SQL statements and stored programs are foundational elements for development of business applications. They contain the interaction points between customers and the data and are collectively called the application programming interface (API) to the database. User forms (typically web forms today) access the API to interact with the data. In well-architected business application software, the API is the only interface that the form developer interacts with.

Database developers, such as you and I, create these code components to enforce business rules while providing options to form developers. In doing so, database developers must guard a few things at all cost. For example, some critical business logic and controls must prevent changes to the data in specific tables, even changes in API programs. That type of critical control is often written in database triggers. SQL statements are events that add, modify, or delete data. Triggers guarantee that API code cannot make certain additions, modifications, or deletions to critical resources, such as tables. Triggers can run before or after SQL statements. Their actions, like the SQL statements themselves, are temporary until the calling scope sends an instruction to commit the work performed.

A database trigger can intercept values before they’re placed in a column, and it can ensure that only certain values can be inserted into or updated in a column. A trigger overrides an INSERT or UPDATE statement value that violates a business rule and then it either raises an error and aborts the transaction or changes the value before it can be inserted or updated into the table. Chapter 12 offers examples of both types of triggers in Oracle Database 12c.

MVCC determines how to manage transactions. MVCC guarantees how multiple users’ SQL statements interact in an ACID compliant manner. The next two sections qualify how data transactions work and how MVCC locks and isolates partial results from data transactions.

Data Transaction

Data Manipulation Language (DML) commands are the SQL statements that transact against the data. They are principally the INSERT, UPDATE, and DELETE statements. The INSERT statement adds new rows in a table, the UPDATE statement modifies columns in existing rows, and the DELETE statement removes a row from a table.

The Oracle MERGE statement transacts against data by providing a conditional insert or update feature. The MERGE statement lets you add new rows when they don’t exist or change column values in rows that do exist.

Inserting data seldom encounters a conflict with other SQL statements because the values become a new row or rows in a table. Updates and deletes, on the other hand, can and do encounter conflicts with other UPDATE and DELETE statements. INSERT statements that encounter conflicts occur when columns in a new row match a preexisting row’s uniquely constrained columns. The insertion is disallowed because only one row can contain the unique column set.

These individual transactions have two phases in transactional databases such as Oracle. The first phase involves making a change that is visible only to the user in the current session. The user then has the option of committing the change, which makes it permanent, or rolling back the change, which undoes the transaction. Developers use Transaction Control Language (TCL) commands to confirm or cancel transactions. The COMMIT statement confirms or makes permanent any change, and the ROLLBACK statement cancels or undoes any change.

A generic transaction lifecycle for a two-table insert process implements a business rule that specifies that neither INSERT statement works unless they both work. Moreover, if the first INSERT statement fails, the second INSERT statement never runs; and if the second INSERT statement fails, the first INSERT statement is undone by a ROLLBACK statement to a SAVEPOINT.
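
A hedged sketch of that lifecycle in SQL looks like this (the member and contact tables are illustrative, and detecting the failure belongs to the calling program):

-- Set a savepoint before the related inserts.
SAVEPOINT before_inserts;

-- Neither insert should persist unless both succeed.
INSERT INTO member (member_id, member_name) VALUES (1001, 'Sam Jones');
INSERT INTO contact (member_id, full_name) VALUES (1001, 'Sam Jones');

-- When both statements succeed, make the changes permanent.
COMMIT;

-- When the second statement fails, undo both inserts instead:
-- ROLLBACK TO before_inserts;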

After a failed transaction is rolled back, good development practice requires that you write the failed event(s) to an error log table. The write succeeds because it occurs after the ROLLBACK statement but before the COMMIT statement.

A SQL statement followed by a COMMIT statement is called a transaction process, or a two-phase commit (2PC) protocol. ACID-compliant transactions use a 2PC protocol to manage one SQL statement or collections of SQL statements. In a 2PC protocol model, the INSERT, UPDATE, MERGE, or DELETE DML statement starts the process and submits changes. These DML statements can also act as events that fire database triggers assigned to the table being changed.

Transactions become more complex when they include database triggers because triggers can inject an entire layer of logic within the transaction scope of a DML statement. For example, database triggers can do the following:

  • Run code that verifies, changes, or repudiates submitted changes
  • Record additional information after validation in other tables (they can’t write to the table being changed—or, in database lexicon, “mutated”)
  • Throw exceptions to terminate a transaction when the values don’t meet business rules

As a general rule, triggers can’t contain a COMMIT or ROLLBACK statement because they run inside the transaction scope of a DML statement. Oracle databases give developers an alternative to this general rule because they support autonomous transactions. Autonomous transactions run outside the transaction scope of the triggering DML statement. They can contain a COMMIT statement and act independently of the calling scope statement. This means an autonomous trigger can commit a transaction when the calling transaction fails.
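
A skeletal autonomous trigger in Oracle PL/SQL looks like this (the movie and movie_log tables are illustrative):

CREATE OR REPLACE TRIGGER movie_audit_t
  AFTER INSERT ON movie
  FOR EACH ROW
DECLARE
  /* Run outside the transaction scope of the triggering DML statement. */
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  /* Log the event in its own transaction. */
  INSERT INTO movie_log ( event_text )
  VALUES ('Inserted: ' || :NEW.movie_title);

  /* Legal here only because the trigger is autonomous. */
  COMMIT;
END;
/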

While independent statements or collections of statements add, modify, and remove rows, one statement transacts against data only by locking rows: the SELECT statement. A SELECT statement typically doesn’t lock rows when it acts as a cursor in the scope of a stored program. A cursor is a data structure that contains rows of one-to-many columns in a stored program. This is also known as a list of record structures.

Cursors act like ordinary SQL queries, except they’re managed by procedural programs row by row. There are many examples of procedural programming languages. PL/SQL and SQL/PSM are procedural languages designed to run inside the database. C, C++, C#, Java, Perl, and PHP are procedural languages that interface with the database through well-defined interfaces, such as Java Database Connectivity (JDBC) and Open Database Connectivity (ODBC).

Cursors can query data two ways. One way locks the rows so that they can’t be changed until the cursor is closed; closing the cursor releases the lock. The other way doesn’t lock the rows, which allows them to be changed while the program is working with the data set from the cursor. The safest practice is to lock the rows when you open the cursor, and that should always be the case when you’re inserting, updating, or deleting rows that depend on the values in the cursor not changing until the transaction lifecycle of the program unit completes.

Loops use cursors to process data sets. That means the cursors are generally opened at or near the beginning of program units. Inside the loop the values from the cursor support one to many SQL statements for one to many tables.

Stored and external programs create their operational scope inside a database connection when they’re called by another program. External programs connect to a database and enjoy their own operational scope, known as a session scope. The session defines the programs’ operational scope. The operational scope of a stored program or external program defines the transaction scope. Inside the transaction scope, the programs interact with data in tables by inserting, updating, or deleting data until the operations complete successfully or encounter a critical failure. These stored program units commit changes when everything completes successfully, or they roll back changes when any critical instruction fails. Sometimes, the programs are written to roll back changes when any instruction fails.

In the Oracle Database, the most common clause to lock rows is the FOR UPDATE clause, which is appended to a SELECT statement. An Oracle database also supports a WAIT n seconds or NOWAIT option. The WAIT option is a blessing when you want to reply to an end user form’s request and can’t make the change quickly. Without this option, a change could hang around for a long time, which means virtually indefinitely to a user trying to run your application. The default behavior in an Oracle database is WAIT without a timeout, which means the session waits indefinitely.

You should avoid this default behavior when developing program units that interact with customers. The Oracle Database also supports a full table lock with the SQL LOCK TABLE command, but you would need to embed the command inside a stored or external program’s instruction set.
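
Here’s a hedged sketch of the locking options (the item table is illustrative):

-- Lock the returned rows, waiting at most 5 seconds for competing locks.
SELECT item_id FROM item WHERE item_id = 1001 FOR UPDATE WAIT 5;

-- Fail immediately when another session holds a lock on the rows.
SELECT item_id FROM item WHERE item_id = 1001 FOR UPDATE NOWAIT;

-- Lock the whole table from inside a stored or external program.
LOCK TABLE item IN EXCLUSIVE MODE NOWAIT;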

Written by maclochlainn

April 5th, 2022 at 2:20 pm

Oracle Unit Test

without comments

A unit test script may contain SQL or PL/SQL statements or it may call another script file that contains SQL or PL/SQL statements. Moreover, a script file is a way to bundle several activities into a single file because most unit test programs typically run two or more instructions as unit tests.

Unconditional Script File

You can write a simple unit test like the example program provided in the Lab 1 Help Section, which includes conditional logic. However, you can write a simpler script that is unconditional and raises exceptions when preconditions do not exist.

The following script file creates a one table and a one_s sequence. The DROP TABLE and DROP SEQUENCE statements have the same precondition, which is that the table or sequence must previously exist.

-- Drop table one.
DROP TABLE one;
 
-- Create table one.
CREATE TABLE one
( one_id    NUMBER
, one_text  VARCHAR2(10));
 
-- Drop sequence one_s.
DROP SEQUENCE one_s;
 
-- Create sequence one_s.
CREATE SEQUENCE one_s;

After writing the script file, you can save it in the lab2 subdirectory as the unconditional.sql file. After you log in to the SQL*Plus environment from the lab2 subdirectory, you call the unconditional.sql script file with the following syntax:

@unconditional.sql

It will display the following output, which raises an exception when the one table or one_s sequence does not already exist in the schema or database:

DROP TABLE one
           *
ERROR at line 1:
ORA-00942: table or view does not exist
 
Table created.
 
DROP SEQUENCE one_s
              *
ERROR at line 1:
ORA-02289: sequence does not exist
 
Sequence created.

An unconditional script raises exceptions when a precondition of a statement does not exist. The precondition is not limited to objects, like the table or sequence; the precondition may be specific data in one or several rows of one or several tables. You can avoid raising these errors by writing conditional scripts.

Conditional Script File

A conditional script file contains statements that check for a precondition before running a statement, which effectively promotes their embedded statements to a lambda function. The following logic recreates the logic of the unconditional.sql script file as a conditional script file:

-- Conditionally drop a table and sequence.
BEGIN
  FOR i IN (SELECT   object_name
            ,        object_type
            FROM     user_objects
            WHERE    object_name IN ('ONE','ONE_S')
            ORDER BY object_type ) LOOP
    IF i.object_type = 'TABLE' THEN
      EXECUTE IMMEDIATE 'DROP TABLE '||i.object_name||' CASCADE CONSTRAINTS';
    ELSE
      EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.object_name;
    END IF;
  END LOOP;
END;
/
 
-- Create table one.
CREATE TABLE one
( one_id    NUMBER
, one_text  VARCHAR2(10));
 
-- Create sequence one_s.
CREATE SEQUENCE one_s;

You can save this script in the lab2 subdirectory as conditional.sql and then unit test it in SQL*Plus. You must manually drop the one table and one_s sequence before running the conditional.sql script to test the preconditions.

You will see that the conditional.sql script does not raise an exception when the one table or one_s sequence is missing. It should generate output to the console, like this:

PL/SQL procedure successfully completed.
 
Table created.
 
Sequence created.

As a rule, you should always write conditional script files. Unconditional script files throw meaningless errors, which may cause your good code to fail a deployment test that requires error-free code.

Written by maclochlainn

April 5th, 2022 at 1:59 pm

Selective Aggregation

without comments

Selective Aggregation

Learning Outcomes

  • Learn how to combine CASE operators and aggregation functions.
  • Learn how to selectively aggregate values.
  • Learn how to use SQL to format report output.

Selective aggregation is the combination of the CASE operator and aggregation functions. An aggregation function counts, sums, or averages the numbers that it finds; and when you embed the results of a CASE operator inside an aggregation function, you get a selective result. The selectivity is determined by the WHEN clause of a CASE operator, which is more or less like an IF statement in an imperative programming language.

The prototype for selective aggregation is illustrated with a SUM function below:

SELECT   SUM(CASE
               WHEN left_operand = right_operand THEN result
               WHEN left_operand > right_operand THEN result
               WHEN left_operand IN (SET OF comma-delimited VALUES) THEN result
               WHEN left_operand IN (query OF results) THEN result
               ELSE alt_result
             END) AS selective_aggregate
FROM     some_table;

A small example lets you see how selective aggregation works. You create a PAYMENT table and PAYMENT_S sequence for this example, as follows:

-- Create a PAYMENT table.
CREATE TABLE payment
( payment_id     NUMBER
, payment_date   DATE	      CONSTRAINT nn_payment_1 NOT NULL
, payment_amount NUMBER(20,2) CONSTRAINT nn_payment_2 NOT NULL
, CONSTRAINT pk_payment PRIMARY KEY (payment_id));
 
-- Create a PAYMENT_S sequence.
CREATE SEQUENCE payment_s;

After you create the table and sequence, you should insert some data. You can match the values below or choose your own values; just insert a large number of rows (this example assumes 10,000 of them).

After inserting 10,000 rows, you can get an unformatted total with the following query:

-- Query total amount.
SELECT   SUM(payment_amount) AS payment_total
FROM     payment;

It outputs the following:

PAYMENT_TOTAL
-------------
   5011091.75

You can nest the result inside the TO_CHAR function to format the output, like

-- Query total formatted amount.
SELECT   TO_CHAR(SUM(payment_amount),'999,999,999.00') AS payment_total
FROM     payment;

It outputs the following:

PAYMENT_TOTAL
---------------
   5,011,091.75

Somebody may suggest that you use a PIVOT function to rotate the data into a summary by month, but the PIVOT function has limits. The pivoting key must be numeric, and the pivoted column headings will use only those numeric values.

-- Pivoted summaries by numeric monthly value.
SELECT   *
FROM    (SELECT EXTRACT(MONTH FROM payment_date) payment_month
         ,      payment_amount
         FROM   payment)
         PIVOT (SUM(payment_amount) FOR payment_month IN
                 (1,2,3,4,5,6,7,8,9,10,11,12));

It outputs the following:

         1          2          3          4          5          6          7          8          9         10         11         12
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
 245896.55  430552.36  443742.63  457860.27  470467.18  466370.71  415158.28  439898.72  458998.09  461378.56  474499.22  246269.18

You can use selective aggregation to get the results by a character label, like

SELECT   SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 1
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "JAN"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 2
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "FEB"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 3
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "MAR"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) IN (1,2,3)
             AND  EXTRACT(YEAR FROM payment_date) = 2019 THEN payment_amount
           END) AS "1FQ"
,        SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 4
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END) AS "APR"
FROM     payment;

It outputs the following:

       JAN        FEB        MAR        1FQ        APR
---------- ---------- ---------- ---------- ----------
 245896.55  430552.36  443742.63 1120191.54  457860.27

You can format the output with a combination of the TO_CHAR and LPAD functions. The TO_CHAR function allows you to add a formatting mask, complete with commas and two mandatory digits to the right of the decimal point. The reformatted query looks like

COL JAN FORMAT A13 HEADING "Jan"
COL FEB FORMAT A13 HEADING "Feb"
COL MAR FORMAT A13 HEADING "Mar"
COL 1FQ FORMAT A13 HEADING "1FQ"
COL APR FORMAT A13 HEADING "Apr"
SELECT   LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 1
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "JAN"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 2
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "FEB"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 3
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "MAR"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) IN (1,2,3)
             AND  EXTRACT(YEAR FROM payment_date) = 2019 THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "1FQ"
,        LPAD(TO_CHAR(SUM(
           CASE
             WHEN EXTRACT(MONTH FROM payment_date) = 4
             AND  EXTRACT(YEAR FROM payment_date) = 2019  THEN payment_amount
           END),'9,999,999.00'),13,' ') AS "APR"
FROM     payment;

It displays the formatted output:

Jan           Feb           Mar           1FQ           Apr
------------- ------------- ------------- ------------- -------------
   245,896.55    430,552.36    443,742.63  1,120,191.54    457,860.27

INSERT Statement

without comments

INSERT Statement

Learning Outcomes

  • Learn how to use positional- and named-notation in INSERT statements.
  • Learn how to use the VALUES clause in INSERT statements.
  • Learn how to use subqueries in INSERT statements.

The INSERT statement lets you enter data into tables and views in two ways: via an INSERT statement with a VALUES clause and via an INSERT statement with a query. The VALUES clause takes a list of literal values (strings, numbers, and dates represented as strings), expression values (return values from functions), or variable values.

Query values are results from SELECT statements that are subqueries (covered earlier in this appendix). INSERT statements work with scalar, single-row, and multiple-row subqueries. The list of columns in the VALUES clause or SELECT clause of a query (a SELECT list) must map to the positional list of columns that defines the table. That list is found in the data dictionary or catalog. As an alternative to the list of columns from the data catalog, you can provide a named list of those columns. The named list overrides the positional (or default) order from the data catalog and must provide at least all mandatory columns in the table definition. Mandatory columns are those that are not null constrained.

Oracle databases differ from other databases in how they implement the INSERT statement. Oracle doesn’t support multiple-row inserts with a VALUES clause. Oracle does support default and override signatures as qualified in the ANSI SQL standards. Oracle also provides a multiple-table INSERT statement. This section covers how you enter data with an INSERT statement that is based on a VALUES clause or a subquery result statement. It also covers multiple-table INSERT statements.

The INSERT statement has one significant limitation: its default signature. The default signature is the list of columns that defines the table in the data catalog. The list is defined by the position and data type of columns. The CREATE statement defines the initial default signature, and the ALTER statement can change the number, data types, or ordering of columns in the default signature.

The default prototype for an INSERT statement allows for an optional column list that overrides the default list of columns. When you provide the column list, you choose to implement named-notation, which is the right way to do it. Relying on the insertion order of the columns is a bad idea. An INSERT statement without a list of column names is a positional-notation statement. Positional-notation is bad because somebody can alter that order, and previously written INSERT statements will break or put data in the wrong columns.

Like methods in OOPLs, an INSERT statement without the optional column list constructs an instance (or row) of the table using the default constructor. The override constructor for a row is defined by any INSERT statement when you provide an optional column list. That’s because it overrides the default constructor.

The generic prototype for an INSERT statement is confusing when it tries to capture both the VALUES clause and the result set from a query. Therefore, I’ve opted to provide two generic prototypes.

Insert by value

The first uses the VALUES clause:

INSERT
INTO table_name
[( column1, column2, column3, ...)] VALUES
( value1, value2, value3, ...);
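
For example, here’s an override-signature insert that borrows the payment table and payment_s sequence from the Selective Aggregation section above:

INSERT INTO payment
( payment_id, payment_date, payment_amount )
VALUES
( payment_s.NEXTVAL, TRUNC(SYSDATE), 49.99 );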

Notice that the prototype for an INSERT statement with the result set from a query doesn’t use the VALUES clause at all. A parsing error occurs when the VALUES clause and query both occur in an INSERT statement.

The second prototype uses a query and excludes the VALUES clause. The subquery may return one to many rows of data. The operative rule is that all columns in the query return the same number of rows of data, because query results should be rectangles—rectangles made up of one to many rows of columns.

Insert by subquery

Here’s the prototype for an INSERT statement that uses a subquery:

INSERT
INTO table_name
[( column1, column2, column3, ...)]
( SELECT value1, value2, value3, ... FROM table_name WHERE ...);
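
For example, here’s a hedged sketch that seeds the same payment table from a hypothetical payment_staging table:

INSERT INTO payment
( payment_id, payment_date, payment_amount )
( SELECT payment_s.NEXTVAL
  ,      s.payment_date
  ,      s.payment_amount
  FROM   payment_staging s );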

A query, or SELECT statement, returns a SELECT list. The SELECT list is the list of columns, and it’s evaluated by position and data type. The SELECT list must match the definition of the table or the override signature provided.

Default signatures present a risk of data corruption through insertion anomalies, which occur when you enter bad data in tables. Mistakes transposing or misplacing values can occur more frequently with a default signature, because the underlying table structure can change. As a best practice, always use named notation by providing the optional list of values; this should help you avoid putting the right data in the wrong place.

The following subsections provide examples that use the default and override syntax for INSERT statements in Oracle databases. The subsections also cover multiple-table INSERT statements and a RETURNING INTO clause, which is an extension of the ANSI SQL standard. Oracle uses the RETURNING INTO clause to manage large objects, to return autogenerated identity column values, and to support some of the features of Oracle’s dynamic SQL. Note that Oracle also supports a bulk INSERT statement, which requires knowledge of PL/SQL.
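
As a taste of the RETURNING INTO clause, here’s a hedged PL/SQL sketch that captures an autogenerated key value (the movie table and its movie_id column are illustrative):

DECLARE
  lv_movie_id  NUMBER;
BEGIN
  INSERT INTO movie
  ( movie_title )
  VALUES
  ('Star Wars')
  RETURNING movie_id INTO lv_movie_id;
END;
/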

Written by maclochlainn

April 5th, 2022 at 1:23 pm

Dynamic Drop Table

without comments

I always get interesting feedback on some posts. On my test case for discovering the STR_TO_DATE function’s behavior, the comment was tragically valid. I failed to clean up after my test case. That was correct, and I should have dropped the param table and the two procedures.

While appending the drop statements is easiest, I thought it was an opportunity to have a bit of fun and write another procedure that cleans up test case tables from within the test_month_name procedure. Here’s a sample dynamic drop_table procedure that you can use in other MySQL stored procedures:

CREATE PROCEDURE drop_table
( table_name  VARCHAR(64))
BEGIN
 
  /* Set a session variable with the dynamic DROP TABLE statement. */
  SET @SQL := CONCAT('DROP TABLE ',table_name);
 
  /* Check if the table exists. */
  IF EXISTS (SELECT NULL
             FROM   information_schema.tables t
             WHERE  t.table_schema = database()
             AND    t.table_name = table_name)
  THEN
 
    /* Dynamically allocate, run, and deallocate the statement. */
    PREPARE stmt FROM @SQL;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
  END IF;
 
END;
$$

You can now put a call to the drop_table procedure in the test_month_name procedure from the earlier post. For convenience, here’s the modified test_month_name procedure with the call to drop_table placed right before you leave the loop and procedure:

CREATE PROCEDURE test_month_name()
BEGIN
 
  /* Declare a handler variable. */
  DECLARE month_name  VARCHAR(9);
 
  /* Declare a handler variable. */
  DECLARE fetched  INT DEFAULT 0;
 
  /* Cursors must come after variables and before event handlers. */
  DECLARE month_cursor CURSOR FOR
    SELECT m.month_name
    FROM   month m;
 
  /* Declare a not found record handler to close a cursor loop. */
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fetched = 1;
 
  /* Open cursor and start simple loop. */
  OPEN month_cursor;
  cursor_loop:LOOP
 
    /* Fetch a record from the cursor. */
    FETCH month_cursor
    INTO  month_name;
 
    /* Place the catch handler for no more rows found
       immediately after the fetch operations. */
    IF fetched = 1 THEN 
      /* Fetch the partial strings that fail to find a month. */
      SELECT * FROM param;
 
      /* Conditionally drop the param table. */
      CALL drop_table('param');
 
      /* Leave the loop. */
      LEAVE cursor_loop;
    END IF;
 
    /* Call the subfunction because stored procedures do not
       support nested loops. */
    CALL read_string(month_name);
  END LOOP;
END;
$$

As always, I hope sample code examples help others solve problems.

Written by maclochlainn

February 12th, 2022 at 12:33 pm

Posted in MySQL,MySQL 8,sql

str_to_date Function

with 3 comments

As many know, I’ve adopted Learning SQL by Alan Beaulieu as a core reference for my database class. Chapter 7 in the book focuses on data generation, manipulation, and conversion.

The last exercise question in my check of whether students read the chapter and played with some of the discussed functions is:

  1. Use one or more temporal functions to write a query that converts the '29-FEB-2024' string value into a default MySQL date format. The result should display:

    +--------------------+
    | mysql_default_date |
    +--------------------+
    | 2024-02-29         |
    +--------------------+
    1 row in set, 1 warning (0.00 sec)

If you’re not familiar with the behavior of MySQL functions, this could look like a difficult problem to solve. If you’re risk inclined, you would probably try the STR_TO_DATE function; but if you’re not, the description of the %m specifier might suggest there’s no SQL built-in to solve the problem.

I use the problem to teach the students how to solve problems in SQL queries. The first step requires putting the base '29-FEB-2024' string value into the mystring column of a strings table, like:

DROP TABLE IF EXISTS strings;
CREATE TABLE strings
(mystring  VARCHAR(11));
 
SELECT 'Insert' AS statement;
INSERT INTO strings
(mystring)
VALUES
('29-FEB-2024');

The next step requires creating a query with:

  • A list of parameters in a Common Table Expression (CTE)
  • A CASE statement to filter results in the SELECT-list
  • A CROSS JOIN between the strings table and params CTE

The following query resolves the comparison in the CASE statement through a case insensitive comparison:

SELECT 'Query' AS statement;
WITH params AS
(SELECT 'January' AS full_month
 UNION ALL
 SELECT 'February' AS full_month)
SELECT s.mystring
,      p.full_month
,      CASE
         WHEN SUBSTR(s.mystring,4,3) = SUBSTR(p.full_month,1,3) THEN
           STR_TO_DATE(REPLACE(s.mystring,SUBSTR(s.mystring,4,3),p.full_month),'%d-%M-%Y') 
       END AS converted_date
FROM   strings s CROSS JOIN params p;

and return:

+-------------+------------+----------------+
| mystring    | full_month | converted_date |
+-------------+------------+----------------+
| 29-FEB-2024 | January    | NULL           |
| 29-FEB-2024 | February   | 2024-02-29     |
+-------------+------------+----------------+
2 rows in set (0.00 sec)

The problem with the result set, or derived table, is the CROSS JOIN. A CROSS JOIN matches every row in one table with every row in another table or derived table from prior joins. That means you need to add a filter in the WHERE clause to ensure you only get matches between the strings and parameters, like the modified query:

WITH params AS 
(SELECT 'January' AS full_month 
 UNION ALL
 SELECT 'February' AS full_month)
SELECT s.mystring
,      p.full_month
,      CASE
         WHEN SUBSTR(s.mystring,4,3) = SUBSTR(p.full_month,1,3) THEN
           STR_TO_DATE(REPLACE(s.mystring,SUBSTR(s.mystring,4,3),p.full_month),'%d-%M-%Y') 
       END AS converted_date
FROM   strings s CROSS JOIN params p
WHERE  SUBSTR(s.mystring,4,3) = SUBSTR(p.full_month,1,3);

It returns a single row, like:

+-------------+------------+----------------+
| mystring    | full_month | converted_date |
+-------------+------------+----------------+
| 29-FEB-2024 | February   | 2024-02-29     |
+-------------+------------+----------------+
1 row in set (0.00 sec)

However, none of this is necessary because the query can be written like this:

SELECT STR_TO_DATE('29-FEB-2024','%d-%M-%Y') AS mysql_date;

It returns:

+------------+
| mysql_date |
+------------+
| 2024-02-29 |
+------------+
1 row in set (0.00 sec)

That’s because the STR_TO_DATE() function with the %M specifier resolves all months with three or more characters. Three characters are required because Mar and May, and Jun and Jul, can only be distinguished by their third character. If you provide fewer than three characters of the month, the function returns a null value.

Here’s a complete test case that lets you discover all the null values that may occur with too few characters:

/* Conditionally drop the table. */
DROP TABLE IF EXISTS month, param;
 
/* Create a table. */
CREATE TABLE month
( month_name  VARCHAR(9));
 
/* Insert into the month table. */
INSERT INTO month
( month_name )
VALUES
 ('January')
,('February')
,('March')
,('April')
,('May')
,('June')
,('July')
,('August')
,('September')
,('October')
,('November')
,('December');
 
/* Create a table. */
CREATE TABLE param
( month   VARCHAR(9)
, needle  VARCHAR(9));
 
/* Conditionally drop the procedure. */
DROP PROCEDURE IF EXISTS read_string;
DROP PROCEDURE IF EXISTS test_month_name;
 
/* Reset the execution delimiter to create a stored program. */
DELIMITER $$
 
/* Create a procedure. */
CREATE PROCEDURE read_string(month_name  VARCHAR(9))
BEGIN
 
  /* Declare a handler variable. */
  DECLARE display     VARCHAR(17);
  DECLARE evaluate    VARCHAR(17);
  DECLARE iterator    INT DEFAULT 1;
  DECLARE partial     VARCHAR(9);
 
  /* Read the list of characters. */
  character_loop:LOOP
 
    /* Print the character list. */
    IF iterator > LENGTH(month_name) THEN
      LEAVE character_loop;
    END IF;
 
    /* Assign substring of month name. */
    SELECT SUBSTR(month_name,1,iterator) INTO partial;
    SELECT CONCAT('01-',partial,'-2024') INTO evaluate;
 
    /* Print only the strings too short to identify as the month. */
    IF STR_TO_DATE(evaluate,'%d-%M-%Y') IS NULL THEN
      INSERT INTO param
      ( month, needle )
      VALUES
      ( month_name, partial );
    END IF;
 
    /* Increment the counter. */
    SET iterator = iterator + 1;
 
  END LOOP;
END;
$$
 
/* Create a procedure. */
CREATE PROCEDURE test_month_name()
BEGIN
 
  /* Declare a handler variable. */
  DECLARE month_name  VARCHAR(9);
 
  /* Declare a handler variable. */
  DECLARE fetched  INT DEFAULT 0;
 
  /* Cursors must come after variables and before event handlers. */
  DECLARE month_cursor CURSOR FOR
    SELECT m.month_name
    FROM   month m;
 
  /* Declare a not found record handler to close a cursor loop. */
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET fetched = 1;
 
  /* Open cursor and start simple loop. */
  OPEN month_cursor;
  cursor_loop:LOOP
 
    /* Fetch a record from the cursor. */
    FETCH month_cursor
    INTO  month_name;
 
    /* Place the catch handler for no more rows found
       immediately after the fetch operations. */
    IF fetched = 1 THEN 
      /* Fetch the partial strings that fail to find a month. */
      SELECT * FROM param;
 
      /* Leave the loop. */
      LEAVE cursor_loop;
    END IF;
 
    /* Call the subfunction because stored procedures do not
       support nested loops. */
    CALL read_string(month_name);
  END LOOP;
END;
$$
 
/* Reset the delimiter. */
DELIMITER ;
 
CALL test_month_name();

It returns the list of character fragments that fail to resolve English months:

+---------+--------+
| month   | needle |
+---------+--------+
| January | J      |
| March   | M      |
| March   | Ma     |
| April   | A      |
| May     | M      |
| May     | Ma     |
| June    | J      |
| June    | Ju     |
| July    | J      |
| July    | Ju     |
| August  | A      |
+---------+--------+
11 rows in set (0.02 sec)

There are two procedures because MySQL doesn’t support nested loops and uses a single-pass parser. So, the first read_string procedure is the inner loop and the second test_month_name procedure is the outer loop.

I wrote a follow-up to this post because of a reader’s question about not cleaning up the test case. In the other post, you will find a drop_table procedure that lets you dynamically drop the param table created to store the inner loop procedure’s results.

As always, I hope this helps those looking to open the hood and check the engine.

Written by maclochlainn

February 11th, 2022 at 1:13 am

Posted in MySQL,MySQL 8,sql

Case Sensitive Comparison

without comments

Sometimes you hear from some new developers that MySQL only makes case insensitive string comparisons. One of my students showed me their test case that they felt proved it:

SELECT STRCMP('a','A') WHERE 'a' = 'A';

Naturally, it returns 0, which means:

  • The STRCMP() function makes a case insensitive comparison of the values, and
  • The WHERE clause also compares strings case insensitively.

As a teacher, you’re gratified that the student took the time to build their own use cases. However, in this case I had to explain that while he was right about the STRCMP() function, the case insensitive comparison in the WHERE clause was a choice, not the only option. The student was wrong to conclude that MySQL couldn’t make case sensitive string comparisons.

I modified his sample by adding the required BINARY keyword for a case sensitive comparison in the WHERE clause:

SELECT STRCMP('a','A') WHERE BINARY 'a' = 'A';

It returns an empty set, which means the binary comparison in the WHERE clause is a case sensitive comparison. Then I explained that while the STRCMP() function performs a case insensitive match, the REPLACE() function performs a case sensitive one, and I gave him the following expanded use case for the two functions:

SELECT STRCMP('a','A')      AS test1
,      REPLACE('a','A','b') AS test2
,      REPLACE('a','a','b') AS test3;

It returns:

+-------+-------+-------+
| test1 | test2 | test3 |
+-------+-------+-------+
|     0 | a     | b     |
+-------+-------+-------+
1 row in set (0.00 sec)

One function may compare strings differently than another, and it’s the developer’s responsibility to understand a function’s behavior thoroughly before using it. The binary comparison was a win for the student since they were building a website that needed that behavior from MySQL.

As always, I hope tidbits like this save folks time using MySQL.

Written by maclochlainn

February 10th, 2022 at 3:05 pm

Posted in MySQL,MySQL 8,sql

Read CSV with Python

without comments

In 2009, I showed an example of how to use the MySQL LOAD DATA INFILE command. Last year, I updated the details to reset the secure_file_priv variable required by the LOAD DATA INFILE command, but you can avoid that approach with a simple Python 3 program like the one in this example. You also can use MySQL Shell’s new parallel table import feature, introduced in 8.0.17, as noted in a comment on this blog post.

The example requires creating an avenger table, creating an avenger.csv file, writing a readWriteFile.py Python script, running that script, and running a query that validates the insertion of the avenger.csv file’s data into the avenger table. Here is the complete code in five steps, using the sakila demonstration database:

  • Creating the avenger table with the create_avenger.sql script:

    -- Conditionally drop the avenger table.
    DROP TABLE IF EXISTS avenger;
     
    -- Create the avenger table.
    CREATE TABLE avenger
    ( avenger_id    int unsigned PRIMARY KEY AUTO_INCREMENT
    , first_name    varchar(20)
    , last_name     varchar(20)
    , avenger_name  varchar(20))
      ENGINE=InnoDB
      AUTO_INCREMENT=1001
      DEFAULT CHARSET=utf8mb4
      COLLATE=utf8mb4_0900_ai_ci;
  • Create the avenger.csv file with the following data:

    Anthony,Stark,Iron Man
    Thor,Odinson,God of Thunder
    Steven,Rogers,Captain America
    Bruce,Banner,Hulk
    Clinton,Barton,Hawkeye
    Natasha,Romanoff,Black Widow
    Peter,Parker,Spiderman
    Steven,Strange,Dr. Strange
    Scott,Lange,Ant-man
    Hope,van Dyne,Wasp
  • Create the readWriteFile.py Python 3 script:

    # Import libraries.
    import csv
    import mysql.connector
    from mysql.connector import errorcode
    from csv import reader
     
    #  Attempt the statement.
    # ============================================================
    #  Use a try-catch block to manage the connection.
    # ============================================================
    try:
      # Open connection.
      cnx = mysql.connector.connect( user='student'
                                   , password='student'
                                   , host='127.0.0.1'
                                   , database='sakila')
      # Create cursor.
      cursor = cnx.cursor()
     
      # Open file in read mode and pass the file object to reader.
      with open('avenger.csv', 'r') as read_obj:
        csv_reader = reader(read_obj)
     
        # Declare the dynamic statement.
        stmt = ("INSERT INTO avenger "
                "(first_name, last_name, avenger_name) "
                "VALUES "
                "(%s, %s, %s)")
     
        # Iterate over each row in the csv using reader object
        for row in csv_reader:
          cursor.execute(stmt, row)
     
        # Commit the writes.
        cnx.commit()
     
        # Close the cursor.
        cursor.close()
     
    # Handle exception and close connection.
    except mysql.connector.Error as e:
      if e.errno == errorcode.ER_ACCESS_DENIED_ERROR:
        print("Something is wrong with your user name or password")
      elif e.errno == errorcode.ER_BAD_DB_ERROR:
        print("Database does not exist")
      else:
        print("Error code:", e.errno)        # error number
        print("SQLSTATE value:", e.sqlstate) # SQLSTATE value
        print("Error message:", e.msg)       # error message
     
    # Close the connection when the try block completes.
    else:
      cnx.close()
  • Run the readWriteFile.py file:

    python3 readWriteFile.py
  • Query the avenger table:

    SELECT * FROM avenger;

    It returns:

    +------------+------------+-----------+-----------------+
    | avenger_id | first_name | last_name | avenger_name    |
    +------------+------------+-----------+-----------------+
    |       1001 | Anthony    | Stark     | Iron Man        |
    |       1002 | Thor       | Odinson   | God of Thunder  |
    |       1003 | Steven     | Rogers    | Captain America |
    |       1004 | Bruce      | Banner    | Hulk            |
    |       1005 | Clinton    | Barton    | Hawkeye         |
    |       1006 | Natasha    | Romanoff  | Black Widow     |
    |       1007 | Peter      | Parker    | Spiderman       |
    |       1008 | Steven     | Strange   | Dr. Strange     |
    |       1009 | Scott      | Lange     | Ant-man         |
    |       1010 | Hope       | van Dyne  | Wasp            |
    +------------+------------+-----------+-----------------+
    10 rows in set (0.00 sec)

Written by maclochlainn

December 12th, 2021 at 12:17 am