MacLochlainns Weblog

Michael McLaughlin's Technical Blog


Archive for the ‘T-SQL’ Category

A T-SQL Table Function

with one comment

I had an interesting conversation about table functions in Oracle’s PL/SQL, and the fact that they’re not available in MySQL. When I explained they’re available in Microsoft T-SQL User-Defined Functions (UDFs), my students wanted a small example. One of them said they’d tried it but couldn’t get it to work because they found the Microsoft web pages difficult to read and use. Specifically, they didn’t like the sparseness of this one on how to create a function.

Here’s a quick definition of a UDF table function that runs in the studentdb schema (created in this post for migrating SQL Server into a MySQL database). The following getConquistador function takes a single string parameter, which filters the result set of the query that serves as the function’s return value. You should note that this is an inline table-valued function written entirely in T-SQL; no Common Language Runtime (CLR) assembly is involved.

CREATE FUNCTION studentdb.getConquistador
(@nationality AS VARCHAR(30))
RETURNS TABLE
AS
RETURN
(SELECT * FROM studentdb.conquistador WHERE nationality = @nationality);

Unlike Oracle SQL, where you need the TABLE operator to read the contents of a table result from a function, in T-SQL you don’t need anything other than the function call in the FROM clause of the query. Here’s an example of calling the table function:

SELECT * FROM studentdb.getConquistador('German');

The query produces the following results when run from the sqlcmd command-line utility:

conquistador_id conquistador          actual_name          nationality
--------------- --------------------- -------------------- ------------
             11 Nicolas de Federman   Nikolaus Federmann   German
             13 Jorge de la Espira    George von Speyer    German
 
(2 rows affected)

However, you can also query just the columns of interest without any specialized syntax, like this:

1> USE studentdb;
2> SELECT conquistador AS "Conquistador"
3> ,      actual_name AS "Name"
4> FROM   studentdb.getConquistador('German');
5> GO

This produces the following two-column result set:

Conquistador          Name
--------------------- --------------------
Nicolas de Federman   Nikolaus Federmann
Jorge de la Espira    George von Speyer
 
(2 rows affected)
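
For contrast, the Oracle approach mentioned earlier needs the TABLE operator in the FROM clause. Here’s a minimal sketch, assuming a comparable getConquistador function that returns a collection or pipelined table type:

SELECT * FROM TABLE(getConquistador('German'));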

Hope this helps those interested in T-SQL UDFs.

Written by maclochlainn

February 12th, 2013 at 1:16 am

Why Stored Programs?

with 2 comments

Why should you use stored programs? Great question. Here’s my little insight into a situation I heard about in a large organization.

A very large organization is having a technology argument. In some ways, like politics, half-truths drive this type of discussion. This company has hundreds of databases, split roughly evenly between SQL Server and Oracle. The argument (the half-truth) is that using T-SQL or PL/SQL yields “spaghetti” code!

It seems like an old argument from my perspective. After all, I’ve been working with T-SQL and PL/SQL for a long time. Spaghetti code exists in every language when unskilled programmers solve problems, but the point here is one of software architecture and an attempt to malign stored programs in general. Let’s examine the merit of the argument against stored programs.

First of all, the argument against stored programs is simply not true. SQL DML statements, like the INSERT, UPDATE, and DELETE statements, maintain ACID-compliant interactions with a single table in a database. Unfortunately, the same statements create anomalies (errors) in a poorly designed database.

Stored programs provide the ability to perform ACID-compliant interactions across a series of tables in a database. They may also hide database design errors and protect the data from corruption. The same can’t be said for Java or C# code: Java and C# developers frequently fail to see database design errors, or they dismiss them as inconsequential. This type of behavior results in corrupt data.

Migrating key logic outside the database typically raises costs, error rates, and overall application complexity. If you’re asking why, that’s great. Here are my thoughts on why:

  1. Making a Java or C# programmer responsible for managing the transaction scope across multiple tables in a database is not trivial. It requires a programmer who has truly mastered SQL. As a rule, it means a programmer writes many more lines of logic in their code because they don’t understand how to use SQL. It often prevents joins from being performed in the database, where they would considerably outperform external language operations.
  2. Identifying bottlenecks and poor usage of data becomes much more complex for DBAs because small queries that avoid joins don’t appear problematic inside the database. DBAs don’t see the execution or scope of transactions running outside of the database, and you’re generally left with anecdotal customer complaints about the inefficiency of the application. Therefore, you have diminished accountability.
  3. Developing a library of stored procedures (and functions) ensures the integrity of transaction management. It also provides a series of published interfaces to developers writing the application logic (see the sketch after this list). The published interface is modular and lets developers focus on delivering quality applications without worrying about the database design. It lowers costs and increases quality by focusing developers on their strengths rather than trying to make them generalists. That said, it should never mask a poorly designed database!
  4. Service level agreements are critical metrics in any organization because they compel efficiency. If you mix database logic and application-layer logic together, you can’t hold the development team responsible for the interface or batch processing metrics because they’ll always “blame” the database. Likewise, you can’t hold the database team responsible for performance when their metrics only show trivial DML statement processing. Moreover, the DBA team will always show you that it’s not their fault because they’ve got metrics!
  5. Removing transaction controls from the database server generally increases analysis and design costs, because few developers have a deep understanding of both a non-database programming language and the database. Likewise, input from DBAs is marginalized because the solution that makes sense is disallowed by design fiat. Systems designed in this disparate way often evolve into extremely awkward application models.
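
To make item 3 concrete, here’s a minimal sketch of a published interface in T-SQL. The procedure, table, and column names are hypothetical; the point is that the stored procedure owns the transaction scope across both tables, and the application simply calls it.

-- A hypothetical stored procedure that owns the transaction across two tables.
CREATE PROCEDURE dbo.add_order_with_line
( @customer_id  INT
, @product_id   INT
, @quantity     INT )
AS
BEGIN
  SET NOCOUNT ON;
  BEGIN TRY
    BEGIN TRAN;
 
    -- Insert the parent row and capture its surrogate key.
    INSERT INTO dbo.orders (customer_id, created_on)
    VALUES (@customer_id, GETDATE());
 
    DECLARE @order_id INT = SCOPE_IDENTITY();
 
    -- Insert the dependent row with the captured foreign key.
    INSERT INTO dbo.order_lines (order_id, product_id, quantity)
    VALUES (@order_id, @product_id, @quantity);
 
    COMMIT TRAN;
  END TRY
  BEGIN CATCH
    -- Undo both inserts if either fails, then rethrow to the caller
    -- (THROW requires SQL Server 2012 or later; use RAISERROR on older releases).
    IF @@TRANCOUNT > 0 ROLLBACK TRAN;
    THROW;
  END CATCH
END;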

Interestingly, the effective use of T-SQL or PL/SQL often identifies, isolates, and manages issues in poorly designed database models. That’s because stored programs focus on the integrity of transactions across tables and leverage native database features. They also act like CSS files: just as CSS lets you avoid inline styles, stored programs let you avoid embedding SQL and transaction control statements throughout the application code.

Let’s face this fact: anyone who makes a claim like the “spaghetti” code argument in the original context is poorly informed. They’re typically trying to sidestep blame for an existing bad application design or to drive a change of platform without cost justification.

My take on this argument is twofold. Technologists in the organization may want to dump what they have and play with something else, or business and IT management may want to sidestep the wrath of angry users by blaming their failure on the technology instead of on how they failed to design, manage, or deliver it.

Oh, wait … isn’t that last paragraph the reason for the existence of pre-packaged software? ;-) Don’t hesitate to chime in; after all, it’s just my off-the-cuff opinion.

Written by maclochlainn

October 6th, 2012 at 3:48 pm

SQL Automated Numbers

with 4 comments

I’ve begun putting together an online database tutorial, so I’ve expanded this entry and added horizontal scrolling to it. You can find the improved version of this blog post as a blog page here.

Surrogate keys are interesting structures in databases. They’re essential if you want to make sure you optimize your design. They’re also very useful when you want to capture the automatic numbering value from a prior INSERT statement and reuse it as the foreign key value in a subsequent statement. It was interesting to see how differently they’re implemented across Oracle, MySQL, and SQL Server while providing the same utility.

Below is a synopsis of how you implement these in Oracle, MySQL, and SQL Server.

Oracle

The first thing to qualify is that Oracle always runs in a transactional mode, which means you don’t need to do anything special to set up this example.

Oracle doesn’t support automated numbering in tables prior to Oracle 12c. Oracle 12c introduces identity columns, which change the mechanics. However, you can use sequences to mimic automated numbering prior to Oracle 12c, and without identity columns in Oracle 12c. A sequence is a structure in the database that holds a current value and increments by a fixed value, typically 1. Sequences are available in SQL and PL/SQL scopes through two pseudo columns: .nextval and .currval (note the two r’s because it’s not a stray dog).

The sequence_name.nextval call in any session places the next number from the sequence into your Personal Global Area (PGA), which is a session-specific memory context. After you’ve called the sequence into memory, you can reread the value with sequence_name.currval. The value only changes when you call the .nextval pseudo column again.
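
For example, in a SQL*Plus session (assuming the one_s1 sequence created in the script below):

SELECT one_s1.NEXTVAL FROM dual;  -- Advances the sequence and caches the value in your session.
SELECT one_s1.CURRVAL FROM dual;  -- Rereads the cached value without advancing the sequence.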

-- Conditionally drop data structures - tables and sequences.
BEGIN
  FOR i IN (SELECT table_name
            FROM   user_tables
            WHERE  table_name IN ('ONE','TWO')) LOOP
    EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINTS';
  END LOOP;
  FOR i IN (SELECT sequence_name
            FROM   user_sequences
            WHERE  sequence_name IN ('ONE_S1','TWO_S1')) LOOP
    EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name;
  END LOOP;
END;
/
 
-- Create base table and sequence.
CREATE TABLE one
( one_id   INT         NOT NULL CONSTRAINT pk_one PRIMARY KEY
, one_text VARCHAR(10) NOT NULL );
 
CREATE SEQUENCE one_s1;
 
-- Create dependent table and sequence.
CREATE TABLE two
( two_id   INT         NOT NULL CONSTRAINT pk_two PRIMARY KEY
, one_id   INT         NOT NULL
, two_text VARCHAR(10) NOT NULL );
 
CREATE SEQUENCE two_s1;
 
-- Insert rows into the tables with sequence values.
INSERT INTO one VALUES (one_s1.NEXTVAL,'One!');
INSERT INTO one VALUES (one_s1.NEXTVAL,'Two!');
INSERT INTO two VALUES (two_s1.NEXTVAL, one_s1.CURRVAL,'Other Two!');
 
-- Display the values inserted with sequences.
SELECT   o.one_id
,        o.one_text
,        t.two_id
,        t.two_text
FROM     one o JOIN two t ON o.one_id = t.one_id;

If you mimic automatic numbering with database triggers, you may not have access to the .currval value for the second INSERT statement. This occurs when you provide a NULL value and expect the trigger to manage the .NEXTVAL call for you.

Transactions require that you keep the primary key value from the first table in a locally scoped variable for reuse, so you can pass it to the next INSERT statement. You do that with the .CURRVAL value.
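
Here’s a minimal sketch of that pattern in an anonymous PL/SQL block, reusing the one and two tables and sequences from the script above:

DECLARE
  lv_one_id  one.one_id%TYPE;  -- Locally scoped variable for the parent key.
BEGIN
  INSERT INTO one VALUES (one_s1.NEXTVAL,'Three!');
  -- Capture the key for reuse; 11g and later allow direct assignment of .CURRVAL.
  lv_one_id := one_s1.CURRVAL;
  INSERT INTO two VALUES (two_s1.NEXTVAL, lv_one_id,'Three!');
  COMMIT;
END;
/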

You can make the potentially erroneous assumption that you’re the only user updating the table. Operating under that assumption, you could query the highest key value from the table before an insert, add one to it, and then attempt the INSERT statement. In a multi-user system, it’s possible that somebody beats you to the finish line with their INSERT statement. Your insert would then have a duplicate surrogate key value for the one_id column and fail with an ORA-00001 error for a uniqueness violation on the primary key column.
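
For illustration, that fragile approach looks something like this, and it only works until a concurrent session wins the race:

-- Anti-pattern: derive the next key from the current maximum value (race-prone).
INSERT INTO one
SELECT NVL(MAX(one_id),0) + 1, 'Four!' FROM one;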

A database trigger can help you avoid that race condition. The trigger ensures sequence values are unique, but it may also introduce problems. A common Oracle trigger implementing this pseudo automatic numbering paradigm follows (you’ll find the same pattern in APEX-generated code).

CREATE OR REPLACE TRIGGER one_t1 
BEFORE INSERT ON one
FOR EACH ROW
BEGIN
  :NEW.one_id := one_s1.NEXTVAL;
END;
/

Caution is required with this type of automated sequence trigger. There are two problems with it.

One scenario is where you include a call to sequence_name.NEXTVAL in your INSERT statement. The statement increments the sequence and attempts to insert the value, whereupon the trigger fires and repeats the behavior. Effectively, this type of logic creates a sequence that increments by one when you submit a null value in the VALUES clause and by two when you submit a sequence_name.NEXTVAL value.

Another scenario occurs when you attempt a bulk INSERT operation on the table: the sequence call and substitution occur on each row of the bulk operation.

You face another problem when you rewrite the trigger to fire only when a surrogate primary key isn’t provided, like this:

CREATE OR REPLACE TRIGGER one_t1 
BEFORE INSERT ON one
FOR EACH ROW
WHEN (NEW.one_id IS NULL)  -- Skipped when a bulk operation supplies the surrogate key values, leaving the sequence out of sync.
BEGIN
  :NEW.one_id := one_s1.NEXTVAL;
END;
/

This trigger design causes a problem only with bulk INSERT statements. Effectively, the sequence remains unaltered when you provide surrogate key values as part of inserting an array of values. The next non-bulk INSERT statement then grabs the .NEXTVAL value, attempts to use it, and raises a unique constraint violation because the bulk operation probably already used that value.

The fix for bulk operations requires that you lock the table, disable a trigger like this, and get the .NEXTVAL value. Then you assign the .NEXTVAL value to two local variables: one remains unchanged while the other increments as you populate the array for the bulk insert operation. After assigning the result from .NEXTVAL, you drop the sequence, find the highest key value for the bulk insertion operation, add one to it, and store it in another locally scoped variable. You perform the bulk insert operation and then recreate the sequence with a value one greater than the highest value in the table, which should already be in a locally scoped variable. Don’t forget that you locked the table, so unlock it now.
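
Here’s a minimal sketch of one way to implement that workaround, reusing the one table, one_t1 trigger, and one_s1 sequence from above. It simplifies the description slightly by seeding the keys from MAX(one_id) rather than from .NEXTVAL, and the element values are hypothetical:

DECLARE
  TYPE one_text_tab IS TABLE OF one.one_text%TYPE;
  TYPE one_id_tab   IS TABLE OF one.one_id%TYPE;
  lv_texts    one_text_tab := one_text_tab('Four!','Five!','Six!');
  lv_ids      one_id_tab   := one_id_tab();
  lv_start_id one.one_id%TYPE;
BEGIN
  -- Disable the trigger, then serialize access to the table.
  EXECUTE IMMEDIATE 'ALTER TRIGGER one_t1 DISABLE';
  LOCK TABLE one IN EXCLUSIVE MODE;
 
  -- Find the next available key and pre-assign keys for the array.
  SELECT NVL(MAX(one_id),0) + 1 INTO lv_start_id FROM one;
  lv_ids.EXTEND(lv_texts.COUNT);
  FOR i IN 1..lv_texts.COUNT LOOP
    lv_ids(i) := lv_start_id + i - 1;
  END LOOP;
 
  -- Bulk insert with the pre-assigned surrogate keys.
  FORALL i IN 1..lv_texts.COUNT
    INSERT INTO one VALUES (lv_ids(i), lv_texts(i));
 
  -- Recreate the sequence one above the highest inserted key; the DDL commits
  -- the inserts and releases the table lock, then re-enable the trigger.
  EXECUTE IMMEDIATE 'DROP SEQUENCE one_s1';
  EXECUTE IMMEDIATE 'CREATE SEQUENCE one_s1 START WITH '||
                    TO_CHAR(lv_start_id + lv_texts.COUNT);
  EXECUTE IMMEDIATE 'ALTER TRIGGER one_t1 ENABLE';
END;
/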

You should note that database triggers run in their own scope, with access only to the session context that fired them. Therefore, you can’t set a bind variable in a SQL*Plus session and then reference it inside the trigger body, because the trigger doesn’t have access to that variable.

MySQL

MySQL supports automatic numbering, but not a default transactional mode like Oracle. You need to disable autocommit or start an explicit transaction. You also need to assign the last automatic numbering value to a variable before using it in a subsequent INSERT statement, and you must provide an overriding list of mandatory columns when you opt to exclude the automated numbering column value. The one thing we should all appreciate about MySQL is its desire to stay close to, and comply with, the ANSI standard.

-- Conditionally drop the tables.
DROP TABLE IF EXISTS one;
DROP TABLE IF EXISTS two;
 
-- Create the tables with a surrogate key that automatically increments.
CREATE TABLE one ( one_id   INT PRIMARY KEY AUTO_INCREMENT
                 , one_text VARCHAR(20));
 
CREATE TABLE two ( two_id INT PRIMARY KEY AUTO_INCREMENT
                 , one_id INT
                 , two_text VARCHAR(20));
 
-- Start transaction cycle.
START TRANSACTION;
 
-- Insert first row, transfer auto increment to memory.
INSERT INTO one (one_text) VALUES ('One');
 
-- Assign last auto increment to local scope variable, the = works too.
SET @one_fk := last_insert_id();
 
-- Insert second row with auto increment and local scope variable.
INSERT INTO two (one_id, two_text) VALUES (@one_fk,'Two');
 
COMMIT;
 
-- Display the values inserted with auto incremented values.
SELECT   o.one_id
,        o.one_text
,        t.two_id
,        t.two_text
FROM     one o JOIN two t ON o.one_id = t.one_id;

SQL Server

SQL Server supports automatic numbering, but they call it the identity value. There are two ways to use it, and the one I’m showing is for SQL Server 2005 or newer. You can swap the older @@IDENTITY global variable in for the SCOPE_IDENTITY() function call, but Microsoft has already dropped mainstream support for SQL Server 2000, and while they haven’t said @@IDENTITY is deprecated, that certainly seems possible in a future release.

USE student;
 
BEGIN TRAN;
 
-- Conditionally drop tables when they exist.
IF OBJECT_ID('dbo.one','U') IS NOT NULL DROP TABLE dbo.one;
IF OBJECT_ID('dbo.two','U') IS NOT NULL DROP TABLE dbo.two;
 
-- Create auto incrementing tables.
CREATE TABLE one
( one_id   INT         NOT NULL IDENTITY(1,1) CONSTRAINT pk_one PRIMARY KEY
, one_text VARCHAR(10) NOT NULL );
 
CREATE TABLE two
( two_id   INT         NOT NULL IDENTITY(1,1) CONSTRAINT pk_two PRIMARY KEY
, one_id   INT         NOT NULL
, two_text VARCHAR(10) NOT NULL );
 
-- Insert the values, and magically no override signature required.
INSERT INTO one VALUES ('One!');
INSERT INTO one VALUES ('Two!');
INSERT INTO two VALUES (SCOPE_IDENTITY(),'Other Two!');
 
-- Query the results.
SELECT   o.one_id
,        o.one_text
,        t.two_id
,        t.two_text
FROM     one o JOIN two t ON o.one_id = t.one_id;
 
COMMIT TRAN;

You should note that T-SQL doesn’t require an overriding column list (an override signature) when you insert into a table with an automatic numbering column. This is different, isn’t it?

While the prior example works with two tables, it doesn’t scale to a series of tables. You should consider the following assignment pattern when you have multiple last identity values in a single transaction scope.

DECLARE @one_pk AS INT;
SET @one_pk = SCOPE_IDENTITY();

As mentioned, this style is important when you’ve got a series of primary and foreign keys to map in the scope of a single transaction. I’d also suggest that you put all the declarations at the beginning of the transaction’s scope.
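
Here’s a minimal sketch of that pattern, reusing the one and two tables from the example above:

BEGIN TRAN;
 
-- Declare the key variables up front, at the top of the transaction scope.
DECLARE @one_pk AS INT;
DECLARE @two_pk AS INT;
 
-- Capture each identity value immediately after its INSERT statement.
INSERT INTO one VALUES ('Three!');
SET @one_pk = SCOPE_IDENTITY();
 
INSERT INTO two VALUES (@one_pk,'Three!');
SET @two_pk = SCOPE_IDENTITY();
 
COMMIT TRAN;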

As always, I hope this helps some folks.

Written by maclochlainn

May 16th, 2009 at 11:04 pm

T-SQL Hierarchical Query

with 5 comments

Playing around with Microsoft SQL Server 2008 Express edition, I’ve sorted through a bunch of tidbits. One that I thought was interesting is how to perform a recursive or hierarchical query. This post describes how you can perform that magic.

The official name of the WITH clause in Oracle’s lexicon (otherwise known as Oraclese) is the subquery factoring clause. You can find more on that in this earlier blog post. Microsoft has a different name for the WITH clause: they call it a Common Table Expression, or CTE.

You perform recursive queries in Microsoft SQL Server 2008 by leveraging CTEs. I’ve modified the setup code from that earlier blog post to run in SQL Server 2008. You’ll find it at the bottom of this blog post.

Unless you want to write your own C# (.NET is the politically correct lingo) equivalent to Oracle’s SQL*Plus, you’ll need to run this script in SQL Server Management Studio. Actually, you can use Microsoft SQL Server 2008’s command-line utility, sqlcmd.exe, but it is much less robust than SQL*Plus. In Management Studio, you click File, then Open, then File… to load the script for execution, and then click the Execute button. Be careful not to click the Debug button, which is the green arrow to the right of the Execute button.

[Image: the Execute button in SQL Server Management Studio 2008]

This is the magic query, shown in the illustration below; you can also find it in the setup script at the bottom of the post. At the end of the day, I’m hard-pressed to understand why they’d use a UNION ALL to support recursion.

[Image: the recursive T-SQL query]

The top-most CTE, or subquery factoring clause, simply joins the ORGANIZATION_NAME column to the ORG_PARENT_ID and ORG_CHILD_ID columns to provide a single working source. The second CTE performs the recursion: the top query (the anchor member) sets the starting row, and the second query (the recursive member) navigates the tree. After all children are found, the first query moves to the next element in the table and recursively searches for its children.

You should note that the CTE references itself from inside the second query. Then the external query (the non-CTE query) returns the results by querying that same CTE.

This logic behaves more like a nested loop and doesn’t walk down the branches of the tree the way a recursive program would; otherwise, line 19 would be line 14 in the output. You could write another CTE to fix this shortfall, thereby mirroring true recursive behavior, or you could write a stored procedure.
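
A minimal sketch of that second-CTE approach carries a zero-padded path column through the recursion and orders by it to get a depth-first walk. It reuses the org_structure table created in the setup script below:

WITH jn AS
 (SELECT   os.org_parent_id, os.org_child_id
  ,        CAST(RIGHT('00' + CAST(os.org_child_id AS VARCHAR(3)), 3) AS VARCHAR(120)) AS sort_path
  FROM     dbo.org_structure os
  WHERE    os.org_parent_id = 1
  UNION ALL
  SELECT   c.org_parent_id, c.org_child_id
  ,        CAST(p.sort_path + '.' + RIGHT('00' + CAST(c.org_child_id AS VARCHAR(3)), 3) AS VARCHAR(120))
  FROM     jn AS p JOIN dbo.org_structure AS c
  ON       c.org_parent_id = p.org_child_id)
SELECT   jn.org_parent_id, jn.org_child_id, jn.sort_path
FROM     jn
ORDER BY jn.sort_path;  -- Depth-first: children sort directly under their parents.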

The illustrated query produces the following result set, which navigates down the hierarchical tree:

[Image: result set navigating down the tree]

You can also go up any branch of the tree by changing some of the logic. You’ll find the query that navigates up the tree as the second query in the setup script at the end of the blog post. It renders the following output:

[Image: result set navigating up the tree from the leaf node]

The blog will be updated if I discover a direct equivalent of the LEVEL pseudocolumn in Oracle’s self-referencing (CONNECT BY) syntax. If you know it, please share it with everybody.
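
In the meantime, a depth counter carried through the recursive member gets you close to LEVEL. Here’s a minimal sketch against the org_structure table from the setup script below:

WITH jn AS
 (SELECT   os.org_parent_id, os.org_child_id
  ,        1 AS lvl                 -- Anchor rows start at depth 1, like Oracle's LEVEL.
  FROM     dbo.org_structure os
  WHERE    os.org_parent_id = 1
  UNION ALL
  SELECT   c.org_parent_id, c.org_child_id
  ,        p.lvl + 1                -- Each level of recursion adds one to the depth.
  FROM     jn AS p JOIN dbo.org_structure AS c
  ON       c.org_parent_id = p.org_child_id)
SELECT   jn.org_parent_id, jn.org_child_id, jn.lvl
FROM     jn
ORDER BY jn.lvl, jn.org_parent_id;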

Setup Script

Microsoft SQL Server 2008 Join Script

USE student;
 
BEGIN TRAN;
 
-- Conditionally drop tables when they exist.
IF OBJECT_ID('dbo.ORGANIZATION','U') IS NOT NULL DROP TABLE dbo.ORGANIZATION;
IF OBJECT_ID('dbo.ORG_STRUCTURE','U') IS NOT NULL DROP TABLE dbo.ORG_STRUCTURE;
 
-- Create the organization table. 
CREATE TABLE ORGANIZATION
( organization_id INT
, organization_name VARCHAR(10));
 
-- Seed the organizations. 
INSERT INTO dbo.ORGANIZATION VALUES
 (1,'One'), (2,'Two'), (3,'Three'), (4,'Four'), (5,'Five')
,(6,'Six'), (7,'Seven'), (8,'Eight'), (9,'Nine'), (10,'Ten')
,(11,'Eleven'), (12,'Twelve'), (13,'Thirteen'), (14,'Fourteen'), (15,'Fifteen')
,(16,'Sixteen'), (17,'Seventeen'), (18,'Eighteen'), (19,'Nineteen'), (20,'Twenty');
 
-- Create the organization structure table that holds the recursive key. 
CREATE TABLE org_structure
( org_structure_id INT
, org_parent_id INT
, org_child_id INT );
 
-- Seed the organization structures. 
INSERT INTO org_structure VALUES
 ( 1, 0, 1),( 1, 1, 2),( 1, 1, 3),( 1, 1, 4),( 1, 2, 5)
,( 1, 2, 6),( 1, 3, 7),( 1, 3, 8),( 1, 4, 9),( 1, 4,10)
,( 1, 5,11),( 1, 5,12),( 1, 6,13),( 1, 6,14),( 1, 7,15)
,( 1, 8,16),( 1, 8,17),( 1, 9,18),( 1, 9,19),( 1,14,20);
 
COMMIT TRAN;
 
-- Navigating down the tree from the root node.
WITH org_name AS
 (SELECT   os.org_parent_id AS org_parent_id
  ,        o1.organization_name AS org_parent_name
  ,        os.org_child_id AS org_child_id
  ,        o2.organization_name AS org_child_name
  FROM     dbo.organization o1 RIGHT JOIN dbo.org_structure os 
  ON       o1.organization_id = os.org_parent_id RIGHT JOIN dbo.organization o2
  ON       o2.organization_id = os.org_child_id)
, jn AS
 (SELECT   org_parent_id, org_parent_name
  ,        org_child_id, org_child_name
  FROM     org_name
  WHERE    org_parent_id = 1  
  UNION ALL
  SELECT   c.org_parent_id, c.org_parent_name
  ,        c.org_child_id, c.org_child_name
  FROM     jn AS p JOIN org_name AS c
  ON       c.org_parent_id = p.org_child_id)
SELECT   jn.org_parent_id, jn.org_parent_name
,        jn.org_child_id, jn.org_child_name
FROM     jn
ORDER BY 1;
 
-- Navigating up the tree from the 20th leaf-node child.
WITH org_name AS
 (SELECT   os.org_parent_id AS org_parent_id
  ,        o1.organization_name AS org_parent_name
  ,        os.org_child_id AS org_child_id
  ,        o2.organization_name AS org_child_name
  FROM     dbo.organization o1 RIGHT JOIN dbo.org_structure os 
  ON       o1.organization_id = os.org_parent_id RIGHT JOIN dbo.organization o2
  ON       o2.organization_id = os.org_child_id)
, jn AS
 (SELECT   org_parent_id, org_parent_name
  ,        org_child_id, org_child_name
  FROM     org_name
  WHERE    org_child_id = 20  
  UNION ALL
  SELECT   c.org_parent_id, c.org_parent_name
  ,        c.org_child_id, c.org_child_name
  FROM     jn AS p JOIN org_name AS c
  ON       c.org_child_id = p.org_parent_id)
SELECT   jn.org_parent_id, jn.org_parent_name
,        jn.org_child_id, jn.org_child_name
FROM     jn
ORDER BY 1 DESC;

Written by maclochlainn

April 3rd, 2009 at 8:28 pm