Archive for the ‘SQL Server 2008’ Category
SQL Aggregation Tutorial
I’ve been working on a Basic Aggregation tutorial for my students. I think this might be close to what may benefit them. However, I thought it would be great to put it out there and solicit ideas. If you have some on improving this post, please let me know.
My first take at the post …
This is a lesson on basic aggregation in SQL. Aggregation in SQL means counting, adding, and grouping by results of counts or sums. Aggregation is a critical part of using the SQL language. At a basic level, aggregation includes the COUNT, SUM, AVERAGE, MAX, and MIN aggregation functions; and the ORDER BY, GROUP BY, and HAVING clauses.
You’ll find the setup scripts for these examples at the bottom of this blog page. The best way to use this page is to copy the setup code, run it in your database, and then test the examples as you work though them.
Data set ↓
This section discusses the data set. You unfold it by clicking on the bold text above.
The illustration of aggregation is a challenge because a trivial set of numbers doesn’t present much challenge or sometimes relevance but it shows the basic concept. A more applied sample set makes mental calculation and comprehension of functions more difficult. As a result of that trade off, this uses two copies of the ordinal numbers (ordinal numbers are the single digit integers from zero to nine).
After you run the setup script, you can display the data set with the following query.
-- Oracle only SQL*Plus formatting command to smooth output display. SET PAGESIZE 99 -- Query ordered by the sets and then the integers in ascending order. SELECT * FROM ordinal ORDER BY 2, 4;
You could have sorted them more easily by using the ORDINAL_ID surrogate key column but it wouldn’t demonstrate the ORDER BY clause. In some database, like Oracle and MySQL, the ORDER BY lets you use column names, column aliases, or the position order of the column in the SELECT clause. Unfortunately, this isn’t true across all databases.
The ORDER BY clause does let you see that you can set nested ordering, like the numbers inside the data sets that are qualified by the LIST_SET column. This is demonstrated in the output below.
ORDINAL_ID LIST_SET LIST_NAME LIST_VALUE
---------- -------------------- ---------- ----------
1 Value Set A Zero 0
2 Value Set A One 1
3 Value Set A Two 2
4 Value Set A Three 3
5 Value Set A Four 4
6 Value Set A Five 5
7 Value Set A Six 6
8 Value Set A Seven 7
9 Value Set A Eight 8
10 Value Set A Nine 9
11 Value Set A
12 Value Set B Zero 0
13 Value Set B One 1
14 Value Set B Two 2
15 Value Set B Three 3
16 Value Set B Four 4
17 Value Set B Five 5
18 Value Set B Six 6
19 Value Set B Seven 7
20 Value Set B Eight 8
21 Value Set B Nine 9
22 Value Set BSingle row aggregation function use ↓
This section discusses and demonstrates the five aggregation functions in the post. They return a single row of data. Like the other sections, you unfold this by clicking on the bold text above.
COUNT function
The COUNT function has two behaviors that may apply. One is when you count rows and the other is when you count values. Both return only a single row.
Counting by Reference: The following example demonstrates counting rows. In this scenario, the COUNT function takes a single call parameter, which is an asterisk (*). The asterisk actually counts the references to the rows in the table, and in Oracle is equivalent to COUNT(ROWID). The pseudo column ROWID points to the physical block address where the row is stored in the database. As such, the asterisk acts very much like a pointer in the C or C++ language (that is if you substitute a block address for a memory address).
SELECT COUNT(*) FROM ordinal;
It returns
COUNT(*) ---------- 22
When you count rows, you count them whether they contain values or not. It is possible that you could insert a meaningless row with null values for all columns in a table that’s unconstrained. Moreover, the COUNT function counts all rows no matter whether contain any values.
Counting by Value: The following examples demonstrate approaches to counting values. It’s important to note that when you count values, you ignore null values. You can also count distinct things or all things when you count by value.
The LIST_SET column doesn’t contain any null values (as you can see by inspecting the data set early in this blog page). The following counts the number of values in the LIST_SET column. You could also substitute COUNT(ALL list_set) and it would return the same thing because ALL is the default and always provided unless you override it with a DISTINCT keyword.
SELECT COUNT(list_set) FROM ordinal;
It returns:
COUNT(*) ---------- 22
Since the LIST_SET column only contains one of two values, you can count that to make sure. You do it with the following syntax:
SELECT COUNT(DISTINCT list_set) FROM ordinal;
It returns:
COUNT(*) ---------- 2
The LIST_NAME and LIST_VALUE both contain two null values. Let’s see what they return when we count all or distinct values. This example simplifies it a step more by performing both counts side-by-side. This is possible because both columns return a single row.
SELECT COUNT(ALL list_name) AS name_number , COUNT(DISTINCT list_value) AS value_number FROM ordinal;
It returns the twenty total string values found in the first column and the ten unique numeric values found in the second column. The COUNT function with the DISTINCT filter acts like a COUNT function with a GROUP BY clause on the LIST_VALUE column, as you’ll see later in the discussion.
NAME_NUMBER VALUE_NUMBER ----------- ------------ 20 10
SUM, AVERAGE, MAX, and MIN functions
Math Operations by Value: These math aggregations are done with the SUM, AVERAGE, MAX, and MIN aggregation functions. They’re restricted to columns that contain numeric values. Each of them support the default ALL or DISTINCT keywords and they behave much as the COUNT function discussed earlier.
There is a significant difference between their functionality. You can count unique strings but you can’t sum a column of numbers based on that other column without a GROUP BY clause. You see examples of the GROUP BY in the next part of this post.
Since these behave more or less the same, their examples are grouped together in sections by summing and averaging together, and taking the maximum and minimum values together. For reference, the ordinal numbers sum to 45, two sets of them are naturally 90; and the average of evenly dispersed numbers is half, or 45, and their unique values 4.5.
SELECT SUM(ALL list_value) AS sum_all , SUM(DISTINCT list_value) AS sum_distinct , AVG(ALL list_value) AS avg_all , AVG(DISTINCT list_value) AS avg_distinct FROM ordinal;
It returns:
SUM_ALL SUM_DISTINCT AVG_ALL AVG_DISTINCT ---------- ------------ ---------- ------------ 90 45 4.5 4.5
The next section makes the same comparison with the MAX, and MIN functions. The maximum or minimums are the same because they the uniqueness doesn’t change the minimum or maximum of the sets.
SELECT MIN(ALL list_value) AS min_all , MIN(DISTINCT list_value) AS min_distinct , MAX(ALL list_value) AS max_all , MAX(DISTINCT list_value) AS max_distinct FROM ordinal;
It returns:
MIN_ALL MIN_DISTINCT MAX_ALL MAX_DISTINCT ---------- ------------ ---------- ------------ 0 0 9 9
Multiple row aggregation function use ↓
This section discusses and demonstrates the five aggregation functions where they return more than a single row of data, which involves grouping by non-aggregated columns. This section covers the GROUP BY and HAVING clauses. Like the other sections, you unfold this by clicking on the bold text above.
COUNT function
Counting by Groups: The following example demonstrates how we can count rows or values by groups. The example counts rows by reference and value, but it does so by grouping the return values on the unique string values in the LIST_SET column.
1 2 3 4 5 | SELECT list_set AS grouping_by_column , COUNT(*) , COUNT(list_value) FROM ordinal GROUP BY list_set; |
It returns the following values:
GROUPING_BY_COLUMN COUNT(*) COUNT(LIST_VALUE) -------------------- ---------- ----------------- Value SET A 11 10 Value SET B 11 10
The results show that there are eleven rows for each set, and that ten of the rows contain values while one holds a null value.
You can also group on columns that contain null values or sets of columns. The following example groups by two columns, where one contains a null value.
1 2 3 4 5 6 7 8 | SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , COUNT(*) , COUNT(list_value) FROM ordinal WHERE list_set = 'Value Set A' GROUP BY list_set , list_name; |
It returns the following values:
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO COUNT(*) COUNT(LIST_VALUE) -------------------- ----------------- ---------- ----------------- Value SET A Zero 1 1 Value SET A Five 1 1 Value SET A Three 1 1 Value SET A Four 1 1 Value SET A One 1 1 Value SET A Two 1 1 Value SET A Eight 1 1 Value SET A Nine 1 1 Value SET A Seven 1 1 Value SET A Six 1 1 Value SET A 1 0
The last row returned is the one where the grouping by column value and counted column value are null. You should note that the GROUP BY applies to a string column and the return set includes a row grouped by its null value. The count of null column values is always zero.
Another thing that you may have not noticed is that the rows are no longer in ascending numeric order by LIST_NAME or LIST_VALUE columns. There’s a multiple edge sword when you examine why this occurs. Basically with regard to the LIST_NAME column, Oracle and many other databases use advanced sorting algorithms which may leave non-ordered sets as outcomes. The LIST_VALUE column is not in the GROUP BY clause and therefore can’t be used as an ORDER BY key.
When you attempt to use a column outside of the GROUP BY clause to order the return set, like this:
1 2 3 4 5 6 7 8 9 | SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , COUNT(*) , COUNT(list_value) FROM ordinal WHERE list_set = 'Value Set A' GROUP BY list_set , list_name ORDER BY list_value; |
You would see the following error in an Oracle Database 10g instance:
ORDER BY list_value * ERROR at line 9: ORA-00979: NOT a GROUP BY expression
You can filter this particular set because the conditions are limited, but this wouldn’t be a good idea in many other solution sets. However, it certainly highlights some of the potential for extraordinary sorting of result sets.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , COUNT(*) , COUNT(list_value) FROM ordinal WHERE list_set = 'Value Set A' GROUP BY list_set , list_name ORDER BY CASE WHEN list_name = 'Zero' THEN 0 WHEN list_name = 'One' THEN 1 WHEN list_name = 'Two' THEN 2 WHEN list_name = 'Three' THEN 3 WHEN list_name = 'Four' THEN 4 WHEN list_name = 'Five' THEN 5 WHEN list_name = 'Six' THEN 6 WHEN list_name = 'Seven' THEN 7 WHEN list_name = 'Eight' THEN 8 WHEN list_name = 'Nine' THEN 9 END; |
It returns the following values:
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO COUNT(*) COUNT(LIST_VALUE) -------------------- ----------------- ---------- ----------------- Value SET A Zero 1 1 Value SET A One 1 1 Value SET A Two 1 1 Value SET A Three 1 1 Value SET A Four 1 1 Value SET A Five 1 1 Value SET A Six 1 1 Value SET A Seven 1 1 Value SET A Eight 1 1 Value SET A Nine 1 1 Value SET A 1 0
A better way to sort this type of data would involve putting it into an inline view and then joining it against the base data set. This allows you to use a non-grouping column as the sort key in an ORDER BY clause. Here’s an example of that approach but note it does take system overhead and should only be done when SQL must perform the sort operation.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | SELECT ilv.grouping_by_not_null , ilv.group_by_null_too , ilv.row_count , ilv.column_count FROM (SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , COUNT(*) AS row_count , COUNT(list_value) AS column_count FROM ordinal WHERE list_set = 'Value Set A' GROUP BY list_set , list_name) ilv LEFT JOIN ordinal o ON ilv.grouping_by_not_null = o.list_set AND ilv.group_by_null_too = o.list_name ORDER BY o.list_value; |
That returns a perfectly ordered set because the LEFT JOIN lets us capture the null value in the inline view without matching the two columns in the join condition. It returns the following set:
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO ROW_COUNT COLUMN_COUNT -------------------- ----------------- ---------- ------------ Value Set A Zero 1 1 Value Set A One 1 1 Value Set A Two 1 1 Value Set A Three 1 1 Value Set A Four 1 1 Value Set A Five 1 1 Value Set A Six 1 1 Value Set A Seven 1 1 Value Set A Eight 1 1 Value Set A Nine 1 1 Value Set A 1 0
SUM, AVERAGE, MAX, and MIN functions
Math Operations by Groups: These math aggregations are done with the SUM, AVERAGE, MAX, and MIN aggregation functions. Like the previous single row return examples, they’re restricted to columns that contain numeric values. While they also support the default ALL or DISTINCT keywords, it didn’t seem necessary to demonstrate more of those behaviors here.
The following sums, averages, and gets both the minimum and maximum value by unique LIST_SET column values:
SELECT list_set AS grouping_by_not_null , SUM(list_value) AS ordinal_sum , AVG(list_value) AS ordinal_avg , MIN(list_value) AS ordinal_min , MAX(list_value) AS ordinal_max FROM ordinal GROUP BY list_set;
It returns the following two rows:
GROUPING_BY_NOT_NULL ORDINAL_SUM ORDINAL_AVG ORDINAL_MIN ORDINAL_MAX -------------------- ----------- ----------- ----------- ----------- Value SET A 45 4.5 0 9 Value SET B 45 4.5 0 9
Expanding the GROUP BY criteria to include the LIST_NAME column, you see what happens when the SUM and AVG functions work with only null values. They always return null when adding only column values with a null value. This is handy as you’ll see in the selective aggregation discussion in a subsequent tutorial.
As done in the prior example, this filters out the second value set, to focus on what’s happening with the rows grouped on two columns.
1 2 3 4 5 6 7 8 9 10 | SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , SUM(list_value) AS ordinal_sum , AVG(list_value) AS ordinal_avg , MIN(list_value) AS ordinal_min , MAX(list_value) AS ordinal_max FROM ordinal WHERE NOT list_set = 'Value Set B' GROUP BY list_set , list_name; |
It returns the following set of records, which are clearly out of order based on the algorithms that optimize aggregation. It’s not hard to read because there aren’t too many rows returned.
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO ORDINAL_SUM ORDINAL_AVG ORDINAL_MIN ORDINAL_MAX -------------------- ----------------- ----------- ----------- ----------- ----------- Value SET A Zero 0 0 0 0 Value SET A Five 5 5 5 5 Value SET A Three 3 3 3 3 Value SET A Four 4 4 4 4 Value SET A One 1 1 1 1 Value SET A Two 2 2 2 2 Value SET A Eight 8 8 8 8 Value SET A Nine 9 9 9 9 Value SET A Seven 7 7 7 7 Value SET A Six 6 6 6 6 Value SET A
If SQL must sort these into numeric order, it can do so like the prior example with the result from the COUNT function. You must instrument a solution for it through an inline view. The following shows you how to do that.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | SELECT ilv.grouping_by_not_null , ilv.group_by_null_too , ilv.ordinal_sum , ilv.ordinal_avg , ilv.ordinal_min , ilv.ordinal_max FROM (SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , SUM(list_value) AS ordinal_sum , AVG(list_value) AS ordinal_avg , MIN(list_value) AS ordinal_min , MAX(list_value) AS ordinal_max FROM ordinal WHERE list_set = 'Value Set B' GROUP BY list_set , list_name) ilv LEFT JOIN ordinal o ON ilv.grouping_by_not_null = o.list_set AND ilv.group_by_null_too = o.list_name ORDER BY o.list_value; |
Now you should see an ordered set like this:
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO ORDINAL_SUM ORDINAL_AVG ORDINAL_MIN ORDINAL_MAX -------------------- ----------------- ----------- ----------- ----------- ----------- Value SET B Zero 0 0 0 0 Value SET B One 1 1 1 1 Value SET B Two 2 2 2 2 Value SET B Three 3 3 3 3 Value SET B Four 4 4 4 4 Value SET B Five 5 5 5 5 Value SET B Six 6 6 6 6 Value SET B Seven 7 7 7 7 Value SET B Eight 8 8 8 8 Value SET B Nine 9 9 9 9 Value SET B
Math Operations with the HAVING clause: The HAVING clause is very useful when you want to find rows that duplicate a column or set of column values. It’s also useful for sorting data sets.
The HAVING clause effects all aggregate rows and acts somewhat like a filter in the WHERE clause of a non-aggregated query. The difference is that the WHERE clause filters the base result set, while the HAVING clause filters the aggregated result set.
The following example captures the sum and average for only the odd LIST_VALUE column values. It does that by using modular (or modulo) mathematics through the MOD function in Oracle’s SQL implementation. Modulo math works by calculating the remainder of integer division. When you divide any number by two and the result is one, then you have an odd number.
Here’s the sample program that illustrates the HAVING clause in a CASE statement. The CASE statement only adds the number when it is an odd number. Any even number is zeroed out. The comparison operator ensures that only rows are returned when the SUM is greater than zero, which only happens for odd numbers.
1 2 3 4 5 6 7 8 9 10 11 | SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , SUM(list_value) AS ordinal_sum , AVG(list_value) AS ordinal_avg FROM ordinal WHERE list_set = 'Value Set A' HAVING SUM(CASE WHEN MOD(list_value,2) = 1 THEN list_value ELSE 0 END) > 0 GROUP BY list_set , list_name; |
The data set, as you’ve seen with others isn’t sorted in ascending order because of how the algorithms work. Since there are only five rows returned, it is easy to see that the HAVING clause let us narrow the return set to odd numbers. What we might overlook is that the HAVING is always a filtering statement. This means we need to have a comparative operator tied to the HAVING return value.
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO ORDINAL_SUM ORDINAL_AVG -------------------- ----------------- ----------- ----------- Value Set A Five 5 5 Value Set A Three 3 3 Value Set A One 1 1 Value Set A Nine 9 9 Value Set A Seven 7 7
As you’ve seen in early examples, you can also put this inside an inline view and sort the aggregated results. Here’s the code to accomplish that.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | SELECT ilv.grouping_by_not_null , ilv.group_by_null_too , ilv.ordinal_sum , ilv.ordinal_avg FROM (SELECT list_set AS grouping_by_not_null , list_name AS group_by_null_too , SUM(list_value) AS ordinal_sum , AVG(list_value) AS ordinal_avg FROM ordinal WHERE list_set = 'Value Set A' HAVING SUM(CASE WHEN MOD(list_value,2) = 1 THEN list_value ELSE 0 END) > 0 GROUP BY list_set , list_name) ilv LEFT JOIN ordinal o ON ilv.grouping_by_not_null = o.list_set AND ilv.group_by_null_too = o.list_name ORDER BY o.list_value; |
It naturally returns the ordered set of aggregated results from only odd numbers in the base set of values.
GROUPING_BY_NOT_NULL GROUP_BY_NULL_TOO ORDINAL_SUM ORDINAL_AVG -------------------- ----------------- ----------- ----------- Value Set A One 1 1 Value Set A Three 3 3 Value Set A Five 5 5 Value Set A Seven 7 7 Value Set A Nine 9 9
Setup Script ↓
The drop down items, unfold with source code to seed the examples.
Oracle SQL Join Supporting Script ↓
This has the setup script for the example tables.
-- Conditionally drop the table and sequence. BEGIN FOR i IN (SELECT table_name FROM user_tables WHERE table_name = 'ORDINAL') LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name; END LOOP; FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name = 'ORDINAL_S1') LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create the aggregation sample table and sequence. CREATE TABLE ordinal ( ordinal_id NUMBER , list_set VARCHAR2(20) , list_name VARCHAR2(5) , list_value NUMBER); CREATE SEQUENCE ordinal_s1; -- Seeding values in the aggregation sample table. DECLARE -- Define local types. TYPE listg IS TABLE OF VARCHAR2(20); TYPE listn IS TABLE OF NUMBER; TYPE lists IS TABLE OF VARCHAR2(5); -- Declare local variables. groups LISTG := listg('Value Set A','Value Set B'); labels LISTS := lists('Zero','One','Two','Three','Four','Five','Six','Seven','Eight','Nine'); valuen LISTN := listn(0,1,2,3,4,5,6,7,8,9); BEGIN -- Outer loop sets the group level. FOR i IN 1..groups.COUNT LOOP -- Inner loop sets the row level. FOR j IN 1..labels.COUNT LOOP INSERT INTO ordinal VALUES (ordinal_s1.NEXTVAL,groups(i),labels(j),valuen(j)); END LOOP; -- Insert the null values for each group. INSERT INTO ordinal VALUES (ordinal_s1.NEXTVAL,groups(i),NULL,NULL); END LOOP; COMMIT; END; / -- Commit the inserts. COMMIT;
MySQL SQL Join Supporting Script ↓
This has the setup script for the example tables.
-- Conditionally drop the table. SELECT 'DROP TABLE ordinal' AS Statement; DROP TABLE IF EXISTS ordinal; -- Create the table. SELECT 'CREATE TABLE ordinal' AS Statement; CREATE TABLE ordinal ( ordinal_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT , list_set VARCHAR(20) , list_name VARCHAR(5) , list_value INT UNSIGNED); -- Seed the row values. SELECT 'INSERT INTO ordinal' AS Statement; INSERT INTO ordinal VALUES (NULL,'Value Set A','Zero','0') ,(NULL,'Value Set A','One','1') ,(NULL,'Value Set A','Two','2') ,(NULL,'Value Set A','Three','3') ,(NULL,'Value Set A','Four','4') ,(NULL,'Value Set A','Five','5') ,(NULL,'Value Set A','Six','6') ,(NULL,'Value Set A','Seven','7') ,(NULL,'Value Set A','Eight','8') ,(NULL,'Value Set A','Nine','9') ,(NULL,'Value Set A',NULL,NULL) ,(NULL,'Value Set B','Zero','0') ,(NULL,'Value Set B','One','1') ,(NULL,'Value Set B','Two','2') ,(NULL,'Value Set B','Three','3') ,(NULL,'Value Set B','Four','4') ,(NULL,'Value Set B','Five','5') ,(NULL,'Value Set B','Six','6') ,(NULL,'Value Set B','Seven','7') ,(NULL,'Value Set B','Eight','8') ,(NULL,'Value Set B','Nine','9') ,(NULL,'Value Set B',NULL,NULL); -- Commit the inserts. COMMIT;
The class, they survived …
The rumor is that my database is hard, but I’ve always hoped it was fun and laid the foundation of success for my students. As I walked into class to give them their final exam, they had their jackets on, which isn’t uncommon for Rexburg, Idaho in December. However, that’s normal outside but unusual in the heated classroom. Then, they all took their jackets off to show their new t-shirts.
They thought it would be fun to post on the blog, so here it is. The shirts says:
SELECT i.survived FROM michael_mclaughlin i WHERE class= CIT 320;
It’s unfortunate that they missed the enclosing quote marks around the string literal.
It should be like this:
SELECT i.survived FROM michael_mclaughlin i WHERE i.class= 'CIT 320';
Here’s to a great group of students who know how to read, write, and think SQL. Any openings out there for internships, please drop me a note.
Happy holidays!
Agnostic String Comparisons
Oracle™ spoils us for other databases. We begin to expect too much of other database products. At least, that’s the way that I felt while working comparative syntax samples out for my students. I assumed wrongly that all strings would compare based on case sensitive strings. I found that Oracle does that, but MySQL and SQL Server don’t do that with an ordinary = (equals) comparison operator.
Oracle
The = (equals) symbol compares case sensitive strings. The following query returns nothing because a title case string isn’t equal to an uppercase string (databases, as they should be).
SQL> SELECT 'Truth' AS OUTPUT FROM dual WHERE 'String' = 'STRING';
The query only resolves when the strings are case sensitive matches, like this:
SQL> SELECT 'Truth' AS OUTPUT FROM dual WHERE 'String' = 'String';
MySQL
The = (equals) symbol doesn’t compare case sensitive strings. The following query returns Truth (a case sensitive irony) because a title case string is equal to an uppercase string when you make the comparison with an equality symbol. You may also chuckle that you need the DUAL pseudo table to make this work when there’s a WHERE clause, covered in yesterday’s blog.
SQL> SELECT 'Truth' AS OUTPUT FROM dual WHERE 'String' = 'STRING';
You can make it case sensitive by using the strcmp function, like this:
SQL> SELECT 'Truth' AS OUTPUT FROM dual WHERE strcmp('String','STRING');
You can also make it case sensitive by using collation, which is the ability to shift it’s character set. Unfortunately, MySQL doesn’t support casting to a binary string, so you must apply a different character set for the equality comparision.
SQL> SELECT 'Truth' AS OUTPUT 2> FROM dual WHERE 'String' COLLATE latin1_bin = 'STRING' COLLATE latin1_bin;
SQL Server
The = (equals) symbol doesn’t compare case sensitive strings. The following query returns Truth (another case sensitive irony) because a title case string is equal to an uppercase string when you make the comparison with an equality symbol.
1> SELECT 'Truth' AS Output 2> WHERE CAST('String' AS VARBINARY) = CAST('STRING' AS VARBINARY); 3> go
You can make SQL Server resolve case sensitive strings by casting both of the strings to VARBINARY data types, like this:
1> SELECT 'Truth' AS Output 2> WHERE CAST('String' AS VARBINARY) = CAST('String' AS VARBINARY); 3> go
There is a clear lack of portability for basic comparison operations. I think its time that folks drop that time worn database agnosticism line because all it means is I don’t want to use that other database. I’ve heard the line too often with regard to PL/SQL, but oddly not about MySQL’s stored procedures or Microsoft’s T-SQL. Maybe I’m an Oracle bigot but it sure seems more ANSI standard like to me to compare strings with a simple = (equals) operator.
SQL Automated Numbers
I’ve begun putting together an online database tutorial and expanded this entry and added horizontal scrolling to it. You can find the improved version of the blog post as blog page here.
Surrogate keys are interesting structures in databases. They’re essential if you want to make sure you optimize your design. They’re also very useful when you want to capture the automatic numbering value for a prior INSERT statement and reuse the automatic numbering value as the foreign key value in a subsequent statement. It was interesting to see how they’re implemented differently across Oracle, MySQL, and SQL Server while providing the same utility.
Below is a synopsis of how you implement these in Oracle, MySQL, and SQL Server.
Oracle
The first thing to qualify is that Oracle is generally always in a transactional mode. That means you don’t need to do anything special to set this example up.
Oracle doesn’t support automated numbering in tables. However, you can use sequences to mimic automated numbering. A sequence is a structure in the database that holds a current value, increments by a fixed value – typically 1. Sequences are available in SQL and PL/SQL scopes through two pseudo columns. The pseudo columns are .nextval and .currval (note the two r’s because it’s not a stray dog).
The sequence_name.nextval call in any session places the next number from the sequence into your Personal Global Area (PGA), which is a memory context. After you’ve called the sequence into memory, you can access it again by using sequence_name.currval. The sequence only changes when you call it again with the .nextval pseudo column.
-- Conditionally drop data sturctures - tables and sequences. BEGIN FOR i IN (SELECT table_name FROM user_tables WHERE table_name IN ('ONE','TWO')) LOOP EXECUTE IMMEDIATE 'DROP TABLE '||i.table_name||' CASCADE CONSTRAINT'; END LOOP; FOR i IN (SELECT sequence_name FROM user_sequences WHERE sequence_name IN ('ONE_S1','TWO_S1')) LOOP EXECUTE IMMEDIATE 'DROP SEQUENCE '||i.sequence_name; END LOOP; END; / -- Create base table and sequence. CREATE TABLE one ( one_id INT NOT NULL CONSTRAINT pk_one PRIMARY KEY , one_text VARCHAR(10) NOT NULL ); CREATE SEQUENCE one_s1; -- Create dependent table and sequence. CREATE TABLE two ( two_id INT NOT NULL CONSTRAINT pk_two PRIMARY KEY , one_id INT NOT NULL , two_text VARCHAR(10) NOT NULL ); CREATE SEQUENCE two_s1; -- Insert rows into the tables with sequence values. INSERT INTO one VALUES (one_s1.NEXTVAL,'One!'); INSERT INTO one VALUES (one_s1.NEXTVAL,'Two!'); INSERT INTO two VALUES (two_s1.NEXTVAL, one_s1.currval,'Other Two!'); -- Display the values inserted with sequences. SELECT o.one_id , o.one_text , t.two_id , t.two_text FROM one o JOIN two t ON o.one_id = t.one_id;
If you mimic automatic numbering with database triggers, you don’t have access to the .currval value for the second INSERT statement. This requires you to keep the primary key value for the first table in a locally scoped variable for reuse. Then, you can pass it to the next INSERT statement. Alternatively, you can make the potentially erroneous assumption that you’re the only user updating the table and query the highest sequence number from the prior table.
While a database trigger would make sharing previously accessed sequence values, it does provide Oracle with a pseudo automatic numbering type. Here’s the generic trigger you would need.
CREATE OR REPLACE TRIGGER one_t1 BEFORE INSERT ON one FOR EACH ROW WHEN (new.one_id IS NULL) -- You want to include this for bulk inserts. BEGIN :new.one_id := one_s1.NEXTVAL; END; /
You should note that triggers effective run in a subshell with access only to the immediate shell that fired them. Therefore, you can’t set a bind variable in a SQL*Plus session and subsequently reference it inside the trigger body because it doesn’t have access to the variable. The WHEN clause lets you by-pass the trigger for bulk inserts but realistically it’ll still fire. It’s important to include it in the event somebody forgets to disable the trigger for a bulk insert. When you forget the clause in the code and a maintenance programmer forgets to disable it before a batch program execution, you typically see a unique key violation error.
MySQL
MySQL supports automatic numbering but not a default transactional mode like Oracle. You need to disable auto commit and start a transaction. You also need to assign the last automatic numbering value to a variable before using it in a subsequent INSERT statement. You must also provide an overriding list of mandatory columns when you opt to exclude the automated numbering column value. The one thing that we should all appreciate about MySQL is their desire to stay close to and comply with ANSI standards.
-- Conditionally drop the tables. DROP TABLE IF EXISTS one; DROP TABLE IF EXISTS two; -- Create the tables with a surrogate key that automatically increments. CREATE TABLE one ( one_id int PRIMARY KEY AUTO_INCREMENT , one_text varchar(20)); CREATE TABLE two ( two_id int PRIMARY KEY AUTO_INCREMENT , one_id int , two_text varchar(20)); -- Start transaction cycle. START TRANSACTION; -- Insert first row, transfer auto increment to memory. INSERT INTO one (one_text) VALUES ('One'); -- Assign last auto increment to local scope variable, the = works too. SET @one_fk := last_insert_id(); -- Insert second row with auto increment and local scope variable. INSERT INTO b (one_id, two_text) VALUES (@one_fk,'Two'); COMMIT; -- Display the values inserted with auto incremented values. SELECT o.one_id , o.one_text , t.two_id , t.two_text FROM one o JOIN two t ON o.one_id = t.one_id;
SQL Server
SQL Server supports automatic numbering but they call it the identity value. There are two ways to use it but the one I’m showing is for SQL Server 2005 or newer. You can replace the older @@identity for the SCOPE_IDENTITY() function call but Microsoft has already removed first level support from SQL Server 2000. While they’ve not said @@identity is deprecated, it sure appears that’s possible in a future release.
USE student; BEGIN TRAN; -- Conditionally drop tables when they exist. IF OBJECT_ID('dbo.one','U') IS NOT NULL DROP TABLE dbo.one; IF OBJECT_ID('dbo.two','U') IS NOT NULL DROP TABLE dbo.two; -- Create auto incrementing tables. CREATE TABLE one ( one_id INT NOT NULL IDENTITY(1,1) CONSTRAINT pk_one PRIMARY KEY , one_text VARCHAR(10) NOT NULL ); CREATE TABLE two ( two_id INT NOT NULL IDENTITY(1,1) CONSTRAINT pk_two PRIMARY KEY , one_id INT NOT NULL , two_text VARCHAR(10) NOT NULL ); -- Insert the values, and magically no override signature required. INSERT INTO one VALUES ('One!'); INSERT INTO one VALUES ('Two!'); INSERT INTO two VALUES (SCOPE_IDENTITY(),'Other Two!'); -- Query the results. SELECT o.one_id , o.one_text , t.two_id , t.two_text FROM one o JOIN two t ON o.one_id = t.one_id; COMMIT TRAN;
You should note that T-SQL doesn’t require an override signature when you use an automatic numbering column. This is different, isn’t it?
While the prior example works with two tables, it doesn’t scale to a series of tables. You should consider the following assignment pattern when you’ll have multiple last identity values in a single transaction scope.
DECLARE @one_pk AS INT; SET @one_pk = SCOPE_IDENTITY();
As mentioned, this style is important when you’ve got a series of primary and foreign keys to map in the scope of a single transaction. Also, I’d suggest that you put all the declarations at the beginning of the transaction’s scope.
As always, I hope this helps some folks.
Fix SQL Server 2008 Client
I finally got back to my Microsoft SQL Server 2008 Express installation for a bit of comparison documentation in the sqlcmd.exe (the client tool peer to sqlplus and mysql). After all the energy to install it, I found it didn’t work.
When I tried to connect with:
C:\>sqlcmd.exe
It raised the following error:
HResult 0x2, Level 16, State 1 Named Pipes Provider: Could not open a connection to SQL Server [2]. Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.. Sqlcmd: Error: Microsoft SQL Server Native Client 10.0 : Login timeout expired.
A bit of poking around yielded an answer on Chris Rasmussen’s blog and a bit more about .NET pipes on Jesse Johnston’s blog. It appears Microsoft SQL Server 2008 Express edition ships and installs with their client tool effectively shutoff.
The client executable uses a pipe to communicate to the server. You need to change the pipe from this default configuration:
\\.\pipe\MSSQL$SQLEXPRESS\sql\query
To this working version
\\.\pipe\sql\query
Then, you need to enable it and restart the service. Here are the step-by-step instructions, so you don’t have to poke around.
You’ll need to launch the SQL Server Configuration Manager to make these changes. You’ll find it by navigating to Start and then open the Microsoft SQL Server 2008 folder. You’ll then select Configuration Tools and launch SQL Server Configuration Manager, as shown in the screen shot.

Launching that provides you the following screen shot:

Open the SQL Server Network Configuration detail and choose Protocols for SQLEXPRESS. Double click on the Named Pipes protocol name.

You then enable the protocol and change the Pipe Name as noted above and shown in the screen shot below:

Click the Apply button, and the only remaining step requires you to restart the service. The easiest way (thanks to Griffth) is to click on SQL Server Services in the left dialog, then right click the SQL Server (SQLEXPRESS) service and choose Restart.

An alternative would be to open a command line window (cmd.exe) and launch the services console with the following command:
C:\> services.msc
Click on SQL Server (SQLEXPRESS), then click the Restart the services link in the middle column, as shown.

Now you should be able to launch the SQL Server 2008 Express command line console. You can get some basic help with the tool by doing the following:
C:\Data>sqlcmd -? Microsoft (R) SQL Server Command Line Tool Version 10.0.1600.22 NT INTEL X86 Copyright (c) Microsoft Corporation. All rights reserved. usage: Sqlcmd [-U login id] [-P password] [-S server] [-H hostname] [-E trusted connection] [-d use database name] [-l login timeout] [-t query timeout] [-h headers] [-s colseparator] [-w screen width] [-a packetsize] [-e echo input] [-I Enable Quoted Identifiers] [-c cmdend] [-L[c] list servers[clean output]] [-q "cmdline query"] [-Q "cmdline query" and exit] [-m errorlevel] [-V severitylevel] [-W remove trailing spaces] [-u unicode output] [-r[0|1] msgs to stderr] [-i inputfile] [-o outputfile] [-z new password] [-f <codepage> | i:<codepage>[,o:<codepage>]] [-Z new password and exit] [-k[1|2] remove[replace] control characters] [-y variable length type display width] [-Y fixed length type display width] [-p[1] print statistics[colon format]] [-R use client regional setting] [-b On error batch abort] [-v var = "value"...] [-A dedicated admin connection] [-X[1] disable commands, startup script, enviroment variables [and exit]] [-x disable variable substitution] [-? show syntax summary]
VMWare and PowerShell
There is a nasty little surprise Microsoft PowerShell can deliver when you try to install it in a virtual machine. You can’t do it if you’ve built the VMWare Fusion instance with the default SCSI hard drive option.
It appears that Microsoft PowerShell, a prerequisite for Microsoft SQL Server 2008 Express edition, can’t work with the SCSI driver. At least, when I rebuilt the Microsoft Vista instance with an IDE hard drive it worked.
If anybody knows the details of why it fails, please share it with me and other readers.
Another tidbit about installing Microsoft SQL Server 2008 Express without having installed the .NET Development Suite, you must patch it. These were required on Microsoft Vista (32-bit) before you could successfully install SQL Server 2008 Express Edition.
1. Microsoft .Net Framework 3.5 SP1 (even if you’re more current).
2. Microsoft PluginInstall (verifies you’ve a licensed copy).
3. Microsoft Installer 4.5.
4. Microsoft Windows PowerShell 1.0.
The list of files for the installation on Windows Vista 32-bit are:

The installation screen shots from my earlier install are here. You must also import the correct certificate. A pre-installation Microsoft .NET Application Security warning will stop the installation completely. If you click the warning, you’ll see this message:

Enter the URL in your browser and you’ll get the following if it was successful.

I choose not to enable automatic updates because it often runs at awkward times. Ultimately, the installation worked fine.
T-SQL Hierarchical Query
Playing around with Microsoft SQL Server 2008 Express edition, I’ve sorted through a bunch of tidbits. One that I thought was interesting, is how to perform a recursive or hierarchical query. This describes how you can perform the magic.
The official name of the WITH clause in Oracle’s lexicon (otherwise known as Oraclese) is a subquery factoring clause. You can find more on that in this earlier blog post. Microsoft has a different name for the WITH clause. They call it a Common Table Expression or CTE.
You perform recursive queries in Microsoft SQL Server 2008 by leveraging CTEs. I’ve modified the setup code from that earlier blog post to run in SQL Server 2008. You’ll find it at the bottom of this blog post.
Unless you want to write your own C# (.NET is the politically correct lingo) equivalent to Oracle’s SQL*Plus, you’ll need to run this script in the SQL Server Management Studio. Actually, you can use Microsoft SQL Server 2008’s command-line utility, which is called sqlcmd.exe but it is much less robust than SQL*Plus. In the Management Studio, you click File, then Open, and File… to load the file for execution, and then click the Execute button. You need to be careful you don’t click the Debug button, which is the green arrow to the right of the Execute button.
![]()
This is the magic query in the illustration. You can also find it in the source code. At the end of the day, I’m hard pressed to understand why they’d use a UNION ALL to support recursion.
The top-most CTE, or subquery factoring clause, simply joins the ORGANIZATION_NAME to the ORG_PARENT_ID and ORG_CHILD_ID columns to provide a single working source. The second CTE performs the recursion. The top-query sets the starting row, and the second query recursively navigates the tree. After all children are found, the first query moves to the next element in the table and recursively searches for its children.
You should note that the CTE self-references itself from inside the second query. Then, the external query (the non-CTE query) returns the results by querying the same CTE.
This logic behaves more like a nested loop, and actually fails to move down branches of the tree like a recursive program. Otherwise line 19 would be line 14 in the output. You could write another CTE to fix this shortfall, thereby mirroring a true recursive behavior, or you can write a stored procedure.
The illustrated query outputs the following hierarchical relationship, which navigates down the hierarchical tree:

You can also go up any branch of the tree by changing some of the logic. You’ll find the query to navigate up the tree as the second query in the setup script at the end of the blog. It renders the following output:

The blog will be updated if I discover the equivalent to the LEVEL in Oracle’s self-referencing semantics. If you know it, please share it with everybody.
Setup Script
Microsoft SQL Server 2008 Join Script
USE student; BEGIN TRAN; -- Conditionally drop tables when they exist. IF OBJECT_ID('dbo.ORGANIZATION','U') IS NOT NULL DROP TABLE dbo.ORGANIZATION; IF OBJECT_ID('dbo.ORG_STRUCTURE','U') IS NOT NULL DROP TABLE dbo.ORG_STRUCTURE; -- Create the organization table. CREATE TABLE ORGANIZATION ( organization_id INT , organization_name VARCHAR(10)); -- Seed the organizations. INSERT INTO dbo.ORGANIZATION VALUES (1,'One'), (2,'Two'), (3,'Three'), (4,'Four'), (5,'Five') ,(6,'Six'), (7,'Seven'), (8,'Eight'), (9,'Nine'), (10,'Ten') ,(11,'Eleven'), (12,'Twelve'), (13,'Thirteen'), (14,'Fourteen'), (15,'Fifteen') ,(16,'Sixteen'), (17,'Seventeen'), (18,'Eighteen'), (19,'Nineteen'), (20,'Twenty'); -- Create the organization structure table that holds the recursive key. CREATE TABLE org_structure ( org_structure_id INT , org_parent_id INT , org_child_id INT ); -- Seed the organization structures. INSERT INTO org_structure VALUES ( 1, 0, 1),( 1, 1, 2),( 1, 1, 3),( 1, 1, 4),( 1, 2, 5) ,( 1, 2, 6),( 1, 3, 7),( 1, 3, 8),( 1, 4, 9),( 1, 4,10) ,( 1, 5,11),( 1, 5,12),( 1, 6,13),( 1, 6,14),( 1, 7,15) ,( 1, 8,16),( 1, 8,17),( 1, 9,18),( 1, 9,19),( 1,14,20); COMMIT TRAN; -- Navigating down the tree from the root node. WITH org_name AS (SELECT os.org_parent_id AS org_parent_id , o1.organization_name AS org_parent_name , os.org_child_id AS org_child_id , o2.organization_name AS org_child_name FROM dbo.organization o1 RIGHT JOIN dbo.org_structure os ON o1.organization_id = os.org_parent_id RIGHT JOIN dbo.organization o2 ON o2.organization_id = os.org_child_id) , jn AS (SELECT org_parent_id, org_parent_name , org_child_id, org_child_name FROM org_name WHERE org_parent_id = 1 UNION ALL SELECT c.org_parent_id, c.org_parent_name , c.org_child_id, c.org_child_name FROM jn AS p JOIN org_name AS c ON c.org_parent_id = p.org_child_id) SELECT jn.org_parent_id, jn.org_parent_name , jn.org_child_id, jn.org_child_name FROM jn ORDER BY 1; -- Navigating up the tree from the 20th leaf-node child. WITH org_name AS (SELECT os.org_parent_id AS org_parent_id , o1.organization_name AS org_parent_name , os.org_child_id AS org_child_id , o2.organization_name AS org_child_name FROM dbo.organization o1 RIGHT JOIN dbo.org_structure os ON o1.organization_id = os.org_parent_id RIGHT JOIN dbo.organization o2 ON o2.organization_id = os.org_child_id) , jn AS (SELECT org_parent_id, org_parent_name , org_child_id, org_child_name FROM org_name WHERE org_child_id = 20 UNION ALL SELECT c.org_parent_id, c.org_parent_name , c.org_child_id, c.org_child_name FROM jn AS p JOIN org_name AS c ON c.org_child_id = p.org_parent_id) SELECT jn.org_parent_id, jn.org_parent_name , jn.org_child_id, jn.org_child_name FROM jn ORDER BY 1 DESC;
SQL Server 2008 Express
Installing Microsoft SQL Server 2008 Express was an interesting experience. After downloading it, the software patched and rebooted my Microsoft Vista machine 4 times before beginning the installation. Once started, it only took 55 minutes and 37 seconds to install it.
Firewall Steps ↓
There are a few firewall steps that you need to perform before the installtion.
You should navigate to the Control Panel and open the Firewall dialog screen. These shots are taken based on Microsoft Vista Professional Edition. You should see a Firewall dialog like the following:

Click the Allow a program through Window Firewall link in the left hand side column. You’ll be taken to the following screen, where you should choose the Exceptions tab.

You can open ports by clicking on the Add port… button. It should take you to the following dialog.

You should open the relevant ports from the following table that are appropriate for your environment before beginning the installation.
| Port # | Purpose |
| 1433 | This is the default SQL Server port TCP connection. Since it appears you can’t deploy a named instance with SQL Server 2008 Expresss edition, this is the port you’ll need to open. |
| 1434 | This is the default SQL Server port UDP connection. It serves to provide the SQL Server browser service. |
| 135 | This is the Transact-SQL (T-SQL) debugger port, and it is only useful when using the Visual Studio. |
After you’ve opened the appropriate port, you can begin the installation.
SQL Server 2008 Express Installation
Here are the steps if you’re interested in installing it.
After you download the file, you’ll need to launch the program from the following dialog. Click the Run button to continue.

The next screen requests you accept the license agreement. Please make sure you do that or you could run into trouble later on if you violate the EULA. Click the Accept button to continue.

The installation will start and then you get to wait. It is at this point that the installer will examine and patch your operating system. While you think you’re up-to-date, you’ll probably see a couple patches applied before the real installation starts.

After you start the installation, you’ll see this dialog. It’ll even reappear after completing the install, and ultimately you may need to click the Cancel button. You don’t have to worry at that point because it’ll be installed.

After you’ve started, this is the dialog you’ll get after applying operating system patches. You’ve really no choice but to reboot if you want to install SQL Server 2008 Express successfully.

This is the first installation screen. Make sure you select the correct version of SQL Server 2008 Express. Typically, you’ll want the one with Advanced Services. Click the appropriate radio button, and then the Install button.

When you see this dialog, you’re installation has truly begun. Don’t click the Cancel button now.

The file extraction takes several minutes. Have something ready while you wait or take a break.

After the file extraction, you’ll see the SQL Server 2008 Setup Support Rules dialog. Click the OK button to continue.

You only get a choice on this dialog if you’re installing the full version. Choose Next to continue.

Check the I accept the license terms box, and then the Next button to continue.

All the rest was plumbing. Now you start the SQL Server 2008 installation. Click the Install button to continue.

You’ll see this for a few minutes before you’re prompted to continue.

You should get all check marks but a lot of folks forget to open the firewall ports. If you forgot, this is what you’ll get as output. Return to the top of the post and open the required ports before continuing. When you’ve opened the ports, click the Next button to continue.

You’ll get this prompt only when you failed to open the ports before beginning the installation. Click the OK button to continue.

The Feature Selection dialog is important. You should probably make the same choices as shown if you’re deploying a development machine. That’s really all you’d want to do with SQL Server 2008 Express edition. Click the Next button to continue.

Now you accept the installation of SQLExpress as the named instance by clicking the Next button.

Now you accept the 1,446 MB space requirement by clicking the Next button.

The account name must be an authorized user account defined by the operating system or left blank. If you want to set it up for local system authentication against Operating System accounts, you must select from the list of values. If you want to provide credentials manually, leave it blank! Click the Next button to continue.

The easiest way to enter a valid credential is to click the Add Current User button. It’ll load it right where you see authorized user in the screen shot. Click Next to continue.

Choose which pieces of information you’d like to send. Not checking anything is also an option. Click Next button to continue.

Now the install will check if everything works before attempting it with the rules you’ve entered. Click the Next button to continue.

After verifying the installation is possible, you’ll see what you’re installing before you click the third Install button to continue.

It is now time for another break while SQL Server 2008 Express edition installs.

If you arrive at the next screen, SQL Server 2008 Express is installed.

Now you’ll see the final installation dialog and the link to the installation log file.

Now the final prompt before configuration. Yes, you must now reboot the system again.

You’ve now installed SQL Server 2008 Express. However, if you want to use the command line client tool, sqlcmd.exe, there’s more work. I posted those steps in this other blog entry.
