MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Is SQL Programming

with 4 comments


Is SQL, or Structured Query Language, a programming language? That’s a great question! A question that many answer with emphasis: “No, SQL is not a programming language!” There are some who answer yes; and they usually qualify that answer with something like: “SQL is a programming language designed to communicate with relational databases.”

It strikes me that those saying “yes” are saying that SQL is only a collection of interface methods to read from and write to a database engine. Those saying SQL is not a programming language often qualify that a programming language must have conditional logic and iterative structures, which don’t exist in SQL.

There’s a third group that are fence sitters. They decline to say whether SQL is a programming language, but they also say individuals who only write SQL aren’t programmers. That’s a bit harsh from my perspective.

Before determining whether SQL is a programming language let’s define a programming language. Let’s define a programming language as a collection of lexical units, or building blocks, that build program units. Lexical units are typically organized as delimiters, identifiers, literals, and comments:

  • Delimiters include single or double quotes to identify strings and operators that let you assign and compare values.
  • Identifiers are reserved words, keywords, predefined identifiers (like data type names), user-defined variables, subroutines, or types.
  • Literals are typically numbers and strings, where some strings qualify as dates because they implement a default format mask to convert strings to dates or date-times.
  • Comments are simply delimited text that the program ignores but the programmer uses.

That means a programming language must let you define a variable, assign a value to a variable, iterate across a set of values, and make conditional statements. SQL meets these four conditions, but it does, as a set-programming language, qualify all variables as lists of tuples. Though it is possible to have variables with zero to many elements and one to many members in any given tuple. That means you can assign a literal value to to a one-element list with a single-member tuple, like you would a string or integer to a variable of that type.

As Kris Köhntopp commented, computer science defines a programming language as Turing Complete. As his comment qualifies and the Wikipedia page explains: “Turing completeness in declarative SQL is implemented through recursive common table expressions. Unsurprisingly, procedural extensions to SQL (PLSQL, etc.) are also Turing-complete.” While PostgreSQL introduces recursive query syntax through CTEs, it recently added the search and cycle feature in PostgreSQL 14. The recursive query feature has existed in the Oracle database since Oracle 8, but their documentation calls them hierarchical queries. I wrote a quick a tutorial on hierarchical queries in 2008.

For clarity, define and declare are two words that give grief to some newbies. Let’s qualify what they mean. Declare means to give a variable a name and data type. Define means to declare a variable and assign it a value. Another word for assigning a variable is initializing it. Unassigned variables are automatically assigned a default value or a null dependent on the programming language.

Let’s first declare a local variable, assign it to variable, and display the variable. The following example uses Node.js to define the input variable, assign the input variable to the display variable, and then print the display variable to console. Node.js requires that you assign an empty string to the display variable to define it as a string otherwise its type would be undefined, which is common behavior in dynamically typed languages.

/* Declare the display variable as a string. */
var display = ""
 
/* Define the input variable. */
var input = "Hello World!"
 
/* Assign the input variable contents to the display variable. */
display = input
 
/* Print the display variable contents to console. */
console.log(display)

It prints:

Hello World!

Let’s write the same type of program in MySQL. Like the Node.js, there are implementation differences. The biggest difference in MySQL or other relational databases occurs because SQL is a declarative set-based language. That means every variable is a collection of a record structure . You can only mimic a scalar or primitive data type variable by creating a record structure with a single member.

In the case below, there are four processing steps:

  • The ‘Hello World!’ literal value is assigned to an input variable.
  • The SELECT-list (or comma-delimited set of values in the SELECT clause) is assigned like a tuple to the struct collection variable by treating the query of the literal value as an expression.
  • The FROM clause returns the struct collection as the data set or as a derived table.
  • The topmost SELECT clause evaluates the struct collection row-by-row, like a loop, and assigns the input member to a display variable.

The query is:

SELECT  struct.input AS display
FROM   (SELECT 'Hello World!' AS input) struct;

Since the struct collection contains only one element, it displays the original literal value one time, like

+--------------+
| display      |
+--------------+
| Hello World! |
+--------------+
1 row in set (0.00 sec)

Let’s update the SQL syntax to the more readable, ANSI 1999 and forward, syntax with a Common Table Expression (CTE). CTEs are implemented by the WITH clause.

WITH struct AS
 (SELECT 'Hello World!' AS input)
SELECT struct.input AS display
FROM   struct;

The best thing about CTE values they run one-time and are subsequently available anywhere in your query, subqueries, or correlated subqueries. In short, there’s never an excuse to write a subquery twice in the same query.

Let’s look at loops and if-statements. Having established that we can assign a literal to a variable, re-assign the value from one variable to another, and then display the new variable, let’s assign a set of literal values to an array variable. As before, let’s use Node.js to structure the initial problem.

The program now assigns an array of strings to the input variable, uses a for-loop to read the values from the input array, and uses an if-statement with a regular expression evaluation. The if-statement determines which of the array value meets the condition by using a negating logical expression. That’s because the search() function returns a 0 or greater value when the needle value is found in the string and a -1 when not found. After validating that the needle variable value is found in an input string, the input value is assigned to the display variable.

/* Declare the display variable as a string. */
var display = ""
 
/* Declare a lookup variable. */
var needle = "Goodbye"
 
/* Define the input variable as an array of strings. */
var input = ["Hello World!"
            ,"Goodbye, Cruel World!"
            ,"Good morning, too early ..."]
 
/* Read through an array and assign the value that meets
 * the condition to the display variable. */
for (i = 0; i < input.length; i++)
  if (!(input[i].search(needle) < 0))
    display = input[i]
 
/* Print the display variable contents to console. */
console.log(display)

Then, it prints the display value:

Goodbye, Cruel World!

To replicate the coding approach in a query, there must be two CTEs. The needle CTE assigns a literal value of ‘goodbye’ to a one-element collection of a single-member tuple variable. The struct CTE creates a collection of strings by using the UNION ALL operator to append three unique tuples instead of one tuple as found in the early example.

The needle CTE returns a one-element collection of a single-member tuple variable. The struct CTE returns a three-element collection of a single-member tuple, which mimics an array of strings. The needle and struct CTEs return distinct variables with different data types. A cross join operation between the two CTEs puts their results together into the same context. It returns a Cartesian product that:

  • Adds a single-row tuples to each row of the query’s result set or derived table.
  • Adds a multiple-tuples to each row of the query’s result set or derived table by creating copies of each row (following the Cartesian set theory which multiplies rows and adds columns).

In this case, the Cartesian join adds a one-element needle CTE value to each element, or row, returned by the multiple-element struct CTE and produces the following derived table:

+-----------------------------+---------+
| display                     | lookup  |
+-----------------------------+---------+
| Hello World!                | goodbye |
| Goodbye, cruel world!       | goodbye |
| Good morning, too early ... | goodbye |
+-----------------------------+---------+
3 rows in set (0.00 sec)

The following query reads through the CTE collection like a loop and filters out any invalid input values. It uses the MySQL regular expression like function in the WHERE clause, which acts as a conditional or if-statement.

WITH needle AS
 (SELECT 'goodbye' AS lookup)
, struct AS
 (SELECT 'Hello World!' AS input
  UNION ALL
  SELECT 'Goodbye, cruel world!' AS input
  UNION ALL
  SELECT 'Good morning, too early ...' AS input)
 SELECT struct.input AS display
 FROM   struct CROSS JOIN needle
 WHERE  REGEXP_LIKE(struct.input, CONCAT('^.*',needle.lookup,'.*$'),'i');

It returns the one display value that meets the criteria:

+-----------------------+
| display               |
+-----------------------+
| Goodbye, cruel world! |
+-----------------------+
1 row in set (0.00 sec)

The comparisons of the imperative programming approach in Node.js and declarative programming approach should have established that SQL has all the elements of a programming language. That is, SQL has variable declaration and assignment and both iterative and conditional statements. SQL also has different styles for implementing variable declaration and the examples covered subqueries and CTEs with cross joins placing variables in common scope.

Comparative Approaches:

Next, let’s examine a problem that a programmer might encounter when they think SQL only queries or inserts, updates, or deletes single rows. With that perspective of SQL there’s often a limited perspective on how to write queries. Developers with this skill set level typically write only basic queries, which may include inner and outer joins and some aggregation statements.

Let’s assume the following for this programming assignment:

  • A sale table as your data source, and
  • A requirement to display the type, number, pre-tax sale amount, and percentage by type.

The sale table definition:

+------------+--------------+------+-----+---------+----------------+
| Field      | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| sale_id    | int unsigned | NO   | PRI | NULL    | auto_increment |
| item_desc  | varchar(20)  | YES  |     | NULL    |                |
| unit_price | decimal(8,2) | YES  |     | NULL    |                |
| serial_no  | varchar(10)  | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+

A basic Node.js program may contain a SQL query that returns the item_desc and unit_price columns while counting the number of serial_no rows and summing the unit_price amounts (that assumes no discount sales, after all its Apple). That type of query leaves calculating the total amount of sales and percentage by type to the Node.js program.

const mysql = require('mysql') 
const connection = mysql.createConnection({ 
   host: 'localhost', 
   user: 'student', 
   password: 'student', 
   database: 'studentdb' 
}) 
 
connection.connect((err) => { 
 if (err) 
   throw err 
 else { 
   console.log('Connected to MySQL Server!\n') 
   connection.query("SELECT   s.item_desc " +
                    ",        s.unit_price " +
                    ",        COUNT(s.serial_no) AS quantity_sold " +
                    ",        SUM(s.unit_price) AS sales " +
                    "FROM     sale s " +
                    "GROUP BY s.item_desc " +
                    ",        s.unit_price", function (err, result) { 
     if (err) 
       throw err 
     else { 
       // Prints the index value in the RowDataPacket. 
       console.log(result)
       connection.end()
   }})} 
})

This program would return a JSON structure, like:

[ RowDataPacket {
    item_desc: 'MacBook Pro 16',
    unit_price: 2499,
    quantity_sold: 16,
    sales: 39984 },
  ...
  RowDataPacket {
    item_desc: 'MacBook Air M1',
    unit_price: 999,
    quantity_sold: 22,
    sales: 21978 } ]

While the remaining JavaScript code isn’t difficult to write, it’s unnecessary effort if the developer knew SQL well enough to program in it. The developer could simply re-write the query like the following and return the percentage by type value in the base JSON structure.

WITH sales AS
 (SELECT SUM(unit_price) AS total
  FROM   sale)
SELECT   s.item_desc
,        s.unit_price
,        COUNT(s.serial_no) AS quantity_sold
,        SUM(s.unit_price) AS sales
,        CONCAT(FORMAT((s.unit_price * COUNT(s.serial_no))/sales.total * 100,2),'%') AS percentage
FROM     sale s CROSS JOIN sales
GROUP BY s.item_desc
,        s.unit_price
,        sales.total;

The query uses the sales CTE to calculate and define a tuple with the total sales and adds a derived column calculating the percentage by type of device. It’s probably important to note that aggregation rules require you add the sales.total CTE tuple to the group by clause.

The new query returns this JSON list:

[ RowDataPacket {
    item_desc: 'MacBook Pro 16',
    unit_price: 2499,
    quantity_sold: 16,
    sales: 39984,
    percentage: '17.70%' },
  ...
  RowDataPacket {
    item_desc: 'MacBook Air M1',
    unit_price: 999,
    quantity_sold: 22,
    sales: 21978,
    percentage: '9.73%' } ]

The developer would get a complete JSON list when the new query replaces the old. It also would eliminate the need to write additional JavaScript to calculate the percentage by type of device.

Conclusions:

Leveraging the programming power of SQL is frequently possible in many frontend and backend programming solutions. However, the programming power of SQL is infrequently found in programming solutions. That leaves me to ask: “Is it possible that the almost systemic failure to leverage the programming capabilities of SQL is a result of biases by instructors and mentors to their own limited skill sets?” That likely might be true if their instructors and mentors held the belief that: “No, SQL is not a programming language!”

Candidly, folks that write SQL at the programming level almost always have concurrent mastery in two or more imperative programming languages. They’re probably the ones who say, “SQL is a programming language designed to communicate with relational databases.”

Who are those pesky fence sitters? You remember those, don’t you. They’re the ones who declined to take a position on whether SQL is a programming language. Are they the developers who are still learning, and those without an entrenched, preconceived, or learned bias? Or, do they wonder if SQL is Turing complete?

Written by maclochlainn

June 12th, 2022 at 7:36 pm