MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Archive for the ‘sql’ Category

SQL Injection Risks

with 11 comments

While I tried to deflect how you perform SQL Injection attacks against a MySQL procedure, my students requested that I post examples of what to do to avoid SQL injection, and what not to do to invite attacks. The best practice to avoid SQL injection attacks is too always bind inputs to data types, and avoid providing completely dynamic WHERE clauses.

Here’s the correct way to dynamically generate a result from a MySQL Stored Procedure:

CREATE PROCEDURE hello (IN pv_input VARCHAR(50))
BEGIN
  SELECT sample_id
  ,      sample_name
  FROM   sample
  WHERE  sample_name = pv_input;
END;
$$

A call to this hello procedure will only return the row or rows where the pv_input value matches the sample_name column value. Any attempt to exploit it like the one below fails.

CALL hello('\'Harriet\' OR 1 = 1');

It fails because there’s no matching 'Harriet' OR 1 = 1 in the table’s sample_name column. However, it works well when we submit 'Harriet' by herself, without the intended SQL inject phrase “OR 1 = 1“, as you can see:

+-----------+-------------+
| sample_id | sample_name |
+-----------+-------------+
|         2 | Harriet     |
+-----------+-------------+
1 row in set (0.00 sec)

There are two approaches that you should never put in your code because they can be readily exploited unless you carefully parse the incoming string parameter. The problem in both cases is causes by concatenation rather than binding. The first example is extremely unlikely as an error but possible.

CREATE PROCEDURE hello (IN pv_input VARCHAR(50))
BEGIN
  SET @sql := CONCAT('SELECT sample_id, sample_name FROM sample WHERE sample_name = ',pv_input);
  PREPARE stmt FROM @sql;
  EXECUTE stmt;
END;
$$

The last hello procedure using concatenation and a prepared statement is vulnerable to SQL injection. A call like the following would return all two rows in the sample table.

CALL hello('\'Juliette\' OR 1 = 1');

It would display:

+-----------+-------------+
| sample_id | sample_name |
+-----------+-------------+
|         1 | Hank        |
|         2 | Harriet     |
+-----------+-------------+
2 rows in set (0.00 sec)

While there’s no sample_name value of 'Juliette', the “OR 1 = 1” is true. Therefore, the SELECT statement filters out nothing and returns all the data from the table. It’s probably clear you’d never do this type of prepared statement inside a stored procedure, but most SQL Injection attacks exploit your scripting language implementation. Unfortunately, bad coding practices can infrequently expose this type of vulnerability; and they typically occur when a junior programmers is following a bad coding example.

A solution with the WHERE clause as part of the parameter would look like this:

CREATE PROCEDURE hello (IN pv_where VARCHAR(50))
BEGIN
  SELECT sample_id
  ,      sample_name
  FROM   sample
  pv_where;
END;
$$

The modified call:

CALL hello('WHERE sample_name = \'Juliette\' OR 1 = 1');

returns all rows from the table.

A solution without the WHERE clause as part of the parameter would look like the following but it fails during runtime and returns no rows [Updated in response to comment]. The failure has nothing to do with the comment’s dialog on the CONCAT function, which also added nothing to the example once I tested it.

CREATE PROCEDURE hello (IN pv_where VARCHAR(50))
BEGIN
  SELECT sample_id
  ,      sample_name
  FROM   sample
  WHERE  pv_where;
END;
$$

It’s simply works only when you provide a “1 = 1” or other comparison without embedded apostrophes (‘) but fails with embedded apostrophes. That means the following statement fails:

CALL hello('sample_name = \'Juliette\' OR 1 = 1');

but this SQL injection statement works:

CALL hello('1 = 1');

returns all rows from the table.

This example, when you omit the white space also works with embedded strings or numeric operands and an operator:

CREATE PROCEDURE hello (IN pv_where VARCHAR(50))
BEGIN
  SELECT sample_id
  ,      sample_name
  FROM   sample
  WHEREpv_where;
END;
$$

It returns all rows with a call like this:

CALL hello('sample_name = \'Juliette\' OR 1 = 1');

My take initially was that it might be a bug, and I logged one (Bug 68903). That’s was a dumb thing to do because WHEREpv_where simply becomes a table alias in the query.

In conclusion, the first example is a good practice. The other two should never exist! Well, they shouldn’t exist unless you’re parsing the web form inputs vigilantly.

Hope this helps those trying to understand how to avoid SQL injection attacks. Always try to solve dynamic SQL statement problems by binding variables into statements.

Written by maclochlainn

April 5th, 2013 at 8:35 pm

Conflict between identifiers

with one comment

Sometimes interesting problems lead to shock or dismay at the suppositions of why they occur. Why an ORA-22979 is raised is one of those, and the error is typically:

ERROR at line 1:
ORA-22979: cannot INSERT object VIEW REF OR user-defined REF

This error occurs on an INSERT statement if you follow the example from the Oracle 11gR2 Object-Relational Developer’s Guide, which also has various slightly modified examples in a couple PL/SQL books. It also happens on an UPDATE statement to populate REF values.

The conflict is typically between the uniqueness of the reference and an attempt to make a non-reference column of the object type a primary key constrained column and embedded object view. The source of the conflict is the OBJECT IDENTIFIER IS PRIMARY KEY associated with a primary key in the Oracle documentation. The two goals are mutually exclusive; only the reference or non-reference column can be the object identifier. Unfortunately, Oracle documentation contains both examples in different places without making any effective cross reference.

If you want to make a column of an object type a primary key for an object table (that is a table that uses an object type to define its structure) and the object view (the content of the embedded object type), you can’t include the OBJECT IDENTIFIER IS PRIMARY KEY clause when you want to populate the REF column of the object type. Here’s an example that uses a column of the object type as a primary key and leaves the REF column empty:

-- Create the BASE_T type, or specification for IDL.
CREATE OR REPLACE
  TYPE base_t IS OBJECT
  ( obj_id    NUMBER
  , obj_name  VARCHAR2(30)
  , obj_ref   REF base_t)
  NOT FINAL;
/

You can then create a table like the following:

CREATE TABLE base OF base_t
( obj_id CONSTRAINT base_pk PRIMARY KEY )
  OBJECT IDENTIFIER IS PRIMARY KEY;

Let’s insert some rows to test for ourselves that this fails when you try to assign references:

INSERT INTO base VALUES (base_t(1, 'Dwalin',NULL));
INSERT INTO base VALUES (base_t(2, 'Borfur',NULL));
INSERT INTO base VALUES (base_t(3, 'Gloin',NULL));
INSERT INTO base VALUES (base_t(4, 'Kili',NULL));
INSERT INTO base VALUES (base_t(5, 'Fili',NULL));

The following UPDATE statement attempts to assign references, but fails as shown below:

UPDATE   base b
SET      obj_ref = REF(b);

The UPDATE fails as shown:

UPDATE   base b
         *
ERROR at line 1:
ORA-22979: cannot INSERT object VIEW REF OR user-defined REF

The simple fix redefines the object table by removing the OBJ_ID column as an object identifier and primary key value. You do that by removing the OBJECT IDENTIFIER IS PRIMARY KEY clause because the column of the object type can be a primary key for the table without being an object view identifier. After you make the change, you can successfully update the table with object references. Object identifiers or references are unique and serve the same purpose of a primary key for the object view, and at the same time they can’t both exist.

CREATE TABLE base OF base_t

Inserting the same rows, you can now update the table to provide valid object references. Let’s experiment with how they work because that’s also not as clear as I’d like in the Oracle documentation.

The next statement creates a CHILD table that holds a reference to the BASE (or parent) table and another instance of the same BASE_T object type:

CREATE TABLE child
( child_id   NUMBER  CONSTRAINT child_pk PRIMARY KEY
, base_ref   REF base_t SCOPE IS base
, child      base_t);

The INSERT statement can’t use a VALUES clause because we MUST capture the reference (or in this case primary key) from the BASE (or parent) table. An INSERT statement with a query does the trick:

INSERT INTO child 
SELECT 1, obj_ref, base_t(1, 'Gimli',NULL)
FROM   base b
WHERE  b.obj_name = 'Gloin';

You should note that the reference for the CHILD table’s CHILD column isn’t set but is likewise not required for the example to work.

Now, lets perform an standard INNER JOIN (equijoin) between the two tables by using the references as primary and foreign keys. Please note the trick is referring to the table and column of the BASE (or parent) table and the table, column, and embedded OBJ_REF of the CHILD table.

COLUMN father FORMAT A10
COLUMN son    FORMAT A10
SELECT   b.obj_name AS "Father"
,        c.child.obj_name AS "Son"
FROM     base b INNER JOIN child c ON b.obj_ref = c.base_ref.obj_ref;

You get the following results:

Father     Son
---------- ----------
Gloin      Gimli

You can make a view of this table with either of these syntaxes:

CREATE OR REPLACE VIEW base_v OF base_t WITH OBJECT OID DEFAULT AS
SELECT * FROM base;

or,

CREATE OR REPLACE VIEW base_v OF base_t WITH OBJECT OID (obj_id) AS
SELECT * FROM base;

Hope it helps anybody trying it. Personally, I think it’s better to use collections of object types, but that’s much bigger discussion that I’ll save for the Oracle Database 12c PL/SQL Programming book that I’m writing.

Written by maclochlainn

March 30th, 2013 at 11:34 pm

Object Table Function View

with 2 comments

Somebody was trying to create a striped view based on a table’s start_date and end_date temporal columns. They asked for some help, so here are the steps (a two-minute tech-tip).

Basically, you create a user-defined data type, or structure:

1
2
3
4
CREATE OR REPLACE TYPE item_structure IS OBJECT
( id      NUMBER
, lookup  VARCHAR2(30));
/

Then, you create a list (an Oracle table) of the structure, like:

1
2
CREATE OR REPLACE TYPE item_lookup IS TABLE OF item_structure;
/

Lastly, you create an object table function, like:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
CREATE OR REPLACE FUNCTION get_item_types RETURN item_lookup IS
 
  -- Declare a variable that uses the record structure.
  lv_counter      PLS_INTEGER := 1;
 
  -- Declare a variable that uses the record structure.
  lv_lookup_table  ITEM_LOOKUP := item_lookup();
 
  -- Declare static cursor structure.
  CURSOR c IS
    SELECT   cl.common_lookup_id AS lookup_id
    ,        SUBSTR(cl.common_lookup_meaning,1,60) AS lookup_meaning
    FROM     common_lookup cl
    WHERE    cl.common_lookup_table = 'ITEM'
    AND      cl.common_lookup_column = 'ITEM_TYPE'
    AND      SYSDATE BETWEEN cl.start_date AND NVL(cl.end_date,TRUNC(SYSDATE) + 1)
    ORDER BY cl.common_lookup_meaning;
 
BEGIN
 
  FOR i IN c LOOP
    lv_lookup_table.EXTEND;
 
    /* The assignment pattern for a SQL collection is incompatible with
       the cursor return type, and you must construct an instance of the
       object type before assigning it to a collection. */
    lv_lookup_table(lv_counter) := item_structure( i.lookup_id
                                                 , i.lookup_meaning );
 
    lv_counter := lv_counter + 1;
  END LOOP;
 
  /* Call an autonomous function or procedure here! It would allow you to
     capture who queried what and when; and acts like a pseudo trigger for
     queries. */
 
  RETURN lv_lookup_table;
END;
/

Now you can embed the object table function in a view, like this:

1
2
3
CREATE OR REPLACE VIEW item_lookup_view AS
  SELECT *
  FROM   TABLE(get_item_types);

Why not simply use an embedded query in the view, like the following?

SQL> CREATE OR REPLACE VIEW normal_view AS
  2    SELECT   cl.common_lookup_id AS lookup_id
  3      ,        SUBSTR(cl.common_lookup_meaning,1,60) AS lookup_meaning
  4      FROM     common_lookup cl
  5      WHERE    cl.common_lookup_table = 'ITEM'
  6      AND      cl.common_lookup_column = 'ITEM_TYPE'
  7      AND      SYSDATE BETWEEN cl.start_date AND NVL(cl.end_date,TRUNC(SYSDATE) + 1)
  8      ORDER BY cl.common_lookup_meaning;

My guess is that it was too easy but who knows, maybe they found a real need. The only need that I see occurs when you’re enforcing something like HIPPA and you want to capture unauthorized queries along with who performed them.

Naturally, I hope this helps those looking to resolve syntax errors when they have a need to do the more complex solution.

Written by maclochlainn

March 17th, 2013 at 10:59 pm

MySQL Auto Increment

with one comment

Somebody ran into a problem after reading about the MySQL CREATE statement and the AUTO_INCREMENT option. They couldn’t get a CREATE statement to work with an AUTO_INCREMENT value other than the default of 1. The problem was they were using this incorrect syntax:

CREATE TABLE elvira
( elvira_id    int unsigned PRIMARY KEY AUTO_INCREMENT=1001
, movie_title  varchar(60))
  ENGINE=InnoDB
  CHARSET=utf8;

It raises this error:

ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '=1001
, movie_title  varchar(60))
  ENGINE=InnoDB
  CHARSET=utf8' at line 2

They concluded that MySQL requires you to assign a default value of 1 as the initial automatic numbering value; then you use the ALTER statement to change the initial sequence value. That assumption is incorrect. The problem was with their assignment of an overriding AUTO_INCREMENT value inside the parenthetical list of columns. That assignment needs to occur after the list of columns and constraints, like

CREATE TABLE elvira
( elvira_id    int unsigned PRIMARY KEY AUTO_INCREMENT
, movie_title  varchar(60))
  ENGINE=InnoDB
  AUTO_INCREMENT=1001
  CHARSET=utf8;

It’s not an unlikely mistake since there’s no clear example on either of the referenced web pages (at the time of writing). It would be nice if they were added but I’m of the opinion some of the reference manual pages are too sparse.

After creating the table, you have the generic fix that seems to appear most often as an answer to setting or re-setting the auto numbering sequence of a MySQL table:

ALTER TABLE elvira AUTO_INCREMENT=1001;

Why would you set the auto numbering sequence to something other than 1? Some designers consider it a best practice to increment from a set point like 101 or 1,001 for your Application Programming Interface (API) and they leave a readily identifiable sequence number set for pre- or post-seeded data in tables. The region of preallocated numbers are typically only used in a few of the tables, in any application, but consistently managing sequences across all tables does sometimes simplifies data diagnostics.

Hope this helps somebody looking for a syntax fix. By the way, you can find it on Page 162 of the Oracle Database 11g & MySQL 5.6 Developer Handbook.

Written by maclochlainn

January 29th, 2013 at 1:10 am

Posted in MySQL,MySQL Workbench,sql

Tagged with ,

Placement over substance

with 5 comments

I was stunned when a SQL query raised an ERROR 1630 (42000) telling me the SUM function didn’t exist in MySQL 5.5.23. The fix was simple. The opening parenthesis of the SUM function must be on the same line as the SUM keyword without an intervening white space. Alternatively phrased, you can’t have a line return or white space between the SUM function name and the opening parenthesis of the call parameter list. The same rule doesn’t apply to the opening parenthesis of the FORMAT function and it seems to me that this parsing inconsistency is problematic.

Therefore, my surprise, observation, and complaint is that all functions don’t parse the same way, using the same rules. That is, unless you use specialized SQL_MODE settings. This assumption was borne out by Kolbe Kegel’s comment on this post, and there are 30 remaining built in functions that have specialized parsing and resolution markers.

A simplified version of the code that raises the error follows. As you’ll notice the opening parenthesis for the FORMAT and SUM function have intervening white space and a line return.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT   t.transaction_account AS "Transaction"
,        LPAD(FORMAT
           (SUM
             (CASE
                WHEN EXTRACT(MONTH FROM transaction_date) = 1 AND
                     EXTRACT(YEAR FROM transaction_date) = 2011 THEN
                  CASE
                    WHEN t.transaction_type = cl.common_lookup_type THEN
                      t.transaction_amount
                    ELSE
                      t.transaction_amount * -1
                  END
             END),2),10,' ') AS "JAN"
FROM     TRANSACTION t CROSS JOIN common_lookup cl
WHERE    cl.common_lookup_table = 'TRANSACTION'
AND      cl.common_lookup_column = 'TRANSACTION_TYPE'
AND      cl.common_lookup_type = 'DEBIT'
GROUP BY t.transaction_account;

Based on the comments, the SQL_MODE is:

mysql> SELECT @@version, @@sql_mode;
+-----------+----------------------------------------------------------------+
| @@version | @@sql_mode                                                     |
+-----------+----------------------------------------------------------------+
| 5.5.23    | STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION |
+-----------+----------------------------------------------------------------+
1 ROW IN SET (0.00 sec)

It raises the following error:

ERROR 1630 (42000): FUNCTION studentdb.SUM does NOT exist. CHECK the 'Function Name Parsing and Resolution' SECTION IN the Reference Manual

Moving ONLY the opening parenthesis to the end of the SUM keyword (or removing the line return and white space from between the SUM keyword and opening parenthesis) prevents the error but it would be more convenient if it supported both approaches. It seems odd that an intervening line return and white space for the SUM function raises an exception while the same intervening line return and white space doesn’t raise an exception for the FORMAT function. It strikes me the parser should support both or reject both. Here’s the fixed code that works without enabling the IGNORE_SPACE SQL Mode option.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT   t.transaction_account AS "Transaction"
,        LPAD(FORMAT
           (SUM(
              CASE
                WHEN EXTRACT(MONTH FROM transaction_date) = 1 AND
                     EXTRACT(YEAR FROM transaction_date) = 2011 THEN
                  CASE
                    WHEN t.transaction_type = cl.common_lookup_type THEN
                      t.transaction_amount
                    ELSE
                      t.transaction_amount * -1
                  END
             END),2),10,' ') AS "JAN"
FROM     TRANSACTION t CROSS JOIN common_lookup cl
WHERE    cl.common_lookup_table = 'TRANSACTION'
AND      cl.common_lookup_column = 'TRANSACTION_TYPE'
AND      cl.common_lookup_type = 'DEBIT'
GROUP BY t.transaction_account;

As noted by the comments, adding the IGNORE_SPACE to the SQL_MODE lets both queries work without moving the open parenthesis. You can do that in a session with the following syntax (which is covered in an older post):

SET SQL_MODE=(SELECT CONCAT(@@sql_mode,',IGNORE_SPACE'));

Hope this helps folks…

Written by maclochlainn

July 3rd, 2012 at 4:10 pm

Derived Table Aliases

without comments

In my database class, students write solutions as group exercises against the Oracle 11g XE database and then they port the solution individually to the MySQL 5.5 database. One of the students copied over a query like the one below to MySQL (a query used to track the expected number of row returns).

SELECT   COUNT(*)
FROM    (SELECT   DISTINCT
                  k.kingdom_id
         ,        kki.kingdom_name
         ,        kki.population
         FROM     kingdom_knight_import kki LEFT JOIN kingdom k
         ON       kki.kingdom_name = k.kingdom_name
         AND      kki.population = k.population);

It got an error they didn’t understand:

ERROR 1248 (42000): Every derived TABLE must have its own alias

Providing a dt query alias fixes the problem in MySQL for the following query. The fact that it was just an alias was a revelation to the student. That’s because Oracle databases don’t require aliases for inline views (what Oracle calls MySQL derived tables and Microsoft calls a Common Table Expression (CTE)).

SELECT   COUNT(*)
FROM    (SELECT   DISTINCT
                  k.kingdom_id
         ,        kki.kingdom_name
         ,        kki.population
         FROM     kingdom_knight_import kki LEFT JOIN kingdom k
         ON       kki.kingdom_name = k.kingdom_name
         AND      kki.population = k.population) dt;

MySQL requires that every derived table have its own alias. This make sense when you think about query optimization engines work, but that’s a story for a much longer post.

You can find more about inline views, derived tables, and common table expressions in this older post of mine. This post was promised so that future students can simply google and answer rather than ask me or a class tutor.

Written by maclochlainn

June 23rd, 2012 at 1:03 am

Single Wildcard Operator

with 2 comments

Somebody wanted to understand why you can backquote a single wildcard operator (that’s the underscore _ character) in MySQL, but can’t in Oracle. The answer is you can in Oracle when you know that you required an additional clause.

While I prefer using regular expression resolution, the LIKE operator is convenient. Here’s an example of backquoting an underscore in MySQL, where it looks for any string with an underscore anywhere in the string:

SELECT   common_lookup_type
FROM     common_lookup
WHERE    common_lookup_type LIKE '%\_%';

You can gain the same behavior in Oracle by appending the ESCAPE '\' clause, like this:

SELECT   common_lookup_type
FROM     common_lookup
WHERE    common_lookup_type LIKE '%\_%' ESCAPE '\';

The ESCAPE '\' clause is one of those Oracle details that often gets lost. It only works when the SQL*Plus ESCAPE parameter is set to OFF.

The SQL*Plus ESCAPE parameter default value is a backslash. and when the ESCAPE parameter is enabled a statement like this raises the following exception:

ERROR at line 3:
ORA-01425: escape CHARACTER must be CHARACTER string OF LENGTH 1

If you drop the ESCAPE '\' clause with the ESCAPE parameter enabled it will return all rows from the table not just those strings with an underscore in the string. Hope this helps if need to look for an underscore in a table.

Written by maclochlainn

June 22nd, 2012 at 12:41 am

Overloading Procedures

with 10 comments

A student asked, “Can you successfully overload MySQL stored procedures, like PL/SQL does in stored packages?” I said, “MySQL doesn’t formally support overloading stored procedures, and doesn’t support packages like Oracle 11g. You can, however, mimic (or fake) overloading with prepared statements. It requires a careful combination of stored procedures and session variables.”

Unfortunately, they asked for an example. So I wrote this set of code. It uses queries as the dynamic statements to make it simple to test but you can substitute INSERT, UPDATE, or DELETE statements. I didn’t provide type validation in the example, which would be required for dates or date-timestamps.

It’s presented in steps with test programs at each level. If you see an opportunity to improve on any of the example components, leave a comment. As to whether I’d implement this in production code, the answer is no but I believe all possibilities should at least be explored.

Step #1

The first step requires defining a stored procedure that sets a session variable. It’s fairly straight forward, but remember to change those DELIMITER values when testing. (As a note, you don’t require a semicolon after the END statement of a stored function or procedure in MySQL. You only need the redefined DELIMITER, which is $$ in these examples.)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
DELIMITER $$
 
-- Drop the procedure if it exists.
DROP PROCEDURE IF EXISTS set_session_var$$
 
-- Create the procedure.
CREATE PROCEDURE set_session_var
( pv_session_name   VARCHAR(32767)
, pv_session_value  VARCHAR(32767))
CONTAINS SQL
BEGIN
 
  /* Insert statement with auto commit enabled. */
  SET @sql := concat('SET',' ','@',pv_session_name,' := ','?');
  SELECT @sql AS "SQL String";
  PREPARE stmt FROM @sql;
  SET @sv_session_value := pv_session_value;
  EXECUTE stmt using @sv_session_value;
  DEALLOCATE PREPARE stmt;
 
END;
$$

The test case for the set_session_var is:

CALL set_session_var('sv_filter1','One');
CALL set_session_var('sv_filter2','Two');
SELECT @sv_filter1, @sv_filter2;

Important note: If you call this from another stored program you can’t use the stmt variable name in the calling program’s scope.

Step #2

The second step exists because you can’t pass arrays in MySQL (a restriction that also exists for T-SQL in Microsoft SQL Server). You have to pass a varying list of parameters as a serialized string. This is often called flexible parameter passing, which many PHP programmers leverage outside the database (flexible parameter passing is covered in this PHP tutorial I wrote a while back).

The deserialize example does three things:

  • Takes a serialized set of parameters and deserializes the parameters into a set of name-value pairs, which are stored by their name as session level variables.
  • It records the number of name-value pairs written as session variables in a temporary table. The use of a temporary table is required because a MySQL stored function can’t hold a dynamic statement or a call to a procedure with a dynamic statement.
  • The script checks for the existence of the table during execution and truncates it when found. It creates the table when it isn’t found. While it would be ideal to use a temporary table, they’re not disclosed in the the information_schema and therefore more tedious to manage. Creating the table in the MEMORY database avoids fragmenting the InnoDB tablespace.

Here’s the working code:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
CREATE PROCEDURE deserialize
( pv_param_list VARCHAR(32767))
CONTAINS SQL
BEGIN
  DECLARE lv_name     VARCHAR(9) DEFAULT 'sv_filter';
  DECLARE lv_length   INT;
  DECLARE lv_start    INT DEFAULT 1;
  DECLARE lv_end      INT DEFAULT 1;
  DECLARE lv_counter  INT DEFAULT 1;
 
  /* Skip when call parameter list is null or empty. */	
  IF NOT (ISNULL(pv_param_list) OR LENGTH(pv_param_list) = 0) THEN
 
    /* Read line by line on a line return character. */
    parse: WHILE NOT (lv_end = 0) DO
 
      /* Check for line returns. */
      SET lv_end := LOCATE(',',pv_param_list,lv_start);
 
      /* Check whether line return has been read. */
      IF NOT lv_end = 0 THEN  /* Reset the ending substring value. */
        SET lv_end := LOCATE(',',pv_param_list,lv_start);
        CALL set_session_var(CONCAT(lv_name,lv_counter),SUBSTR(pv_param_list,lv_start,lv_end - lv_start));
      ELSE  /* Print the last substring with a semicolon. */
        CALL set_session_var(CONCAT(lv_name,lv_counter),SUBSTR(pv_param_list,lv_start,LENGTH(pv_param_list)));
      END IF;
 
      /* Reset the beginning of the string. */
      SET lv_start := lv_end + 1;      
      SET lv_counter := lv_counter + 1;
 
    END WHILE parse;    
 
  END IF;
 
  /* Check for a temporary table that holds a control variable,
     create the table if it doesn't exist, and remove rows from
     the table. */
  IF EXISTS (SELECT   null
             FROM     information_schema.tables
             WHERE    table_name = 'counter') THEN
    TRUNCATE TABLE counter;
  ELSE
    /* It would be ideal to use a temporary table here but then
       it's not recorded in the INFORMATION_SCHEMA and cleansing
       the temporary table is more tedious. */
    CREATE TABLE counter ( counter INT ) ENGINE=MEMORY;
  END IF;
 
  /* Insert the counter value for a list of parameters. */
  INSERT INTO counter VALUES ( lv_counter - 1 );  
 
END;
$$

Bill Karwin made a wonderful point about the overhead of looking up a table name and using a table rather than a temporary table. You can replace the logic above with a CONTINUE HANDLER and a temporary table. It should resolve the problems mentioned in the note but as mentioned earlier there are other solutions for production systems.

Bill also referenced a great post by Jay Pipes. It points out that there are serious problems scaling MySQL stored programs. This occurs because MySQL stored procedures are not compiled and stored in a global stored procedure cache, like Microsoft SQL Server or Oracle.

This deficit in MySQL certainly plays to the philosophy of developing everything in the application layer, suggested by Bill. At least, that is true when the application layer is co-resident on the server-tier as a Server Side Includes (SSI) library.

It does beg the question why Microsoft SQL Server and Oracle enable placing an efficient set of stored logic in the database server, effectively creating an application layer that can be split between an SSI library and the database server. After all, the server has been the most scalable component and the application layer can distribute to the middle-tier.

Today’s limitations on MySQL stored programs present a clear and compelling case toward their general avoidance and specific use for delivering a scalable solutions. Will that hold true beyond 2015 when licensing becomes completely proprietary to Oracle? I’d venture (a complete guess or hope on my part) a global stored procedure cache may be the future of MySQL procedures. It would certainly serve to distinguish MySQL from MariaDB and provide an incentive to remain on MySQL or to return from MariaDB to MySQL. Such a change would certainly make MySQL even more competitive against Microsoft SQL Server, which is the only rival purely relational database it competes against.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
CREATE PROCEDURE deserialize
( pv_param_list VARCHAR(32767))
CONTAINS SQL
BEGIN
  DECLARE lv_name     VARCHAR(9) DEFAULT 'sv_filter';
  DECLARE lv_length   INT;
  DECLARE lv_start    INT DEFAULT 1;
  DECLARE lv_end      INT DEFAULT 1;
  DECLARE lv_counter  INT DEFAULT 1;
 
  DECLARE CONTINUE HANDLER FOR 1146
    BEGIN
      /* Create a temporary table. */
      CREATE TEMPORARY TABLE counter ( counter INT ) ENGINE=MEMORY;
    END;
 
  /* Skip when call parameter list is null or empty. */	
  IF NOT (ISNULL(pv_param_list) OR LENGTH(pv_param_list) = 0) THEN
 
    /* Read line by line on a line return character. */
    parse: WHILE NOT (lv_end = 0) DO
 
      /* Check for line returns. */
      SET lv_end := LOCATE(',',pv_param_list,lv_start);
 
      /* Check whether line return has been read. */
      IF NOT lv_end = 0 THEN  /* Reset the ending substring value. */
        SET lv_end := LOCATE(',',pv_param_list,lv_start);
        CALL set_session_var(CONCAT(lv_name,lv_counter),SUBSTR(pv_param_list,lv_start,lv_end - lv_start));
      ELSE  /* Print the last substring with a semicolon. */
        CALL set_session_var(CONCAT(lv_name,lv_counter),SUBSTR(pv_param_list,lv_start,LENGTH(pv_param_list)));
      END IF;
 
      /* Reset the beginning of the string. */
      SET lv_start := lv_end + 1;      
      SET lv_counter := lv_counter + 1;
 
    END WHILE parse;    
 
  END IF;
 
  /* Truncate existing table. */
  TRUNCATE TABLE counter;
 
  /* Insert the counter value for a list of parameters. */
  INSERT INTO counter VALUES ( lv_counter - 1 );  
 
END;
$$

The test program for deserialize follows below. If you’re curious, those are the Italian version of one through four. 😉 I’ve a partiality for Italian after living in Italy for two years.

CALL deserialize('Uno,Due,Tre,Quattro');
SELECT counter AS "Parameter #" FROM counter;
SELECT @sv_filter1, @sv_filter2, @sv_filter3, @sv_filter4;

You can also eliminate the temporary table completely by replacing the counter with a @sv_counter session variable, as mentioned by Bill in his comment. The following is the version of deserialize uses a session variable.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
CREATE PROCEDURE deserialize
( pv_param_list VARCHAR(32767))
CONTAINS SQL
BEGIN
  DECLARE lv_name     VARCHAR(9) DEFAULT 'sv_filter';
  DECLARE lv_length   INT;
  DECLARE lv_start    INT DEFAULT 1;
  DECLARE lv_end      INT DEFAULT 1;
 
  /* Set a session variable to enable a calling scope to
     read it without a function return. */
  SET @sv_counter := 1;
 
  /* Skip when call parameter list is null or empty. */	
  IF NOT (ISNULL(pv_param_list) OR LENGTH(pv_param_list) = 0) THEN
 
    /* Read line by line on a line return character. */
    parse: WHILE NOT (lv_end = 0) DO
 
      /* Check for line returns. */
      SET lv_end := LOCATE(',',pv_param_list,lv_start);
 
      /* Check whether line return has been read. */
      IF NOT lv_end = 0 THEN  /* Reset the ending substring value. */
        SET lv_end := LOCATE(',',pv_param_list,lv_start);
        CALL set_session_var(CONCAT(lv_name,@sv_counter),SUBSTR(pv_param_list,lv_start,lv_end - lv_start));
      ELSE  /* Print the last substring with a semicolon. */
        CALL set_session_var(CONCAT(lv_name,@sv_counter),SUBSTR(pv_param_list,lv_start,LENGTH(pv_param_list)));
      END IF;
 
      /* Reset the beginning of the string. */
      SET lv_start := lv_end + 1;      
      SET @sv_counter := @sv_counter + 1;
 
    END WHILE parse;    
 
  END IF;
 
  /* Reduce by one for 1-based numbering of name elements. */
  SET @sv_counter := @sv_counter - 1;
 
END;
$$

The test case changes for this version because there isn’t a table involved. You would use the following:

CALL deserialize('Uno,Due,Tre,Quattro');
SELECT @sv_filter1, @sv_filter2, @sv_filter3, @sv_filter4;

Like the earlier procedure, there’s no exception handling but it wouldn’t be a bad idea to add it. Especially, if you’re using the new SIGNAL feature of MySQL 5.6, which is now backported to MySQL 5.5.

Step #3

The last step creates a store procedure that takes two parameters, a parameterized statement and a serialized list of parameters. This means you can build any statement on the fly as required. The example uses queries simply because they’re the easiest to demonstrate the process.

You should note that the prepared statement is dynamic_stmt not stmt because stmt is used inside the set_session_var procedure. That means using stmt in the calling scope program would impact the called scope program because they use the same identifier (more or less a namespace scope issue).

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
CREATE PROCEDURE prepared_dml
( pv_query   VARCHAR(32767)
, pv_filter  VARCHAR(32767))
CONTAINS SQL
BEGIN
 
  /* Declare a local variable for the SQL statement. */
  DECLARE dynamic_stmt  VARCHAR(32767);
  DECLARE lv_counter    INT DEFAULT 0;
 
  /* Cleanup the message passing table when a case is not found. */
  DECLARE EXIT HANDLER FOR 1339
    BEGIN
      /* Step #5: */
      DEALLOCATE PREPARE dynamic_stmt;
 
      /* Cleanup the temporary table that exchanges data between
         procedures. */
      DROP TABLE IF EXISTS counter;
    END;
 
  /* Step #1:
     ========
     Set a session variable with two parameter markers. */
  SET @SQL := pv_query;
 
  /* Verify query is not empty. */
  IF NOT ISNULL(@SQL) THEN
 
    /* Step #2:
       ========
       Dynamically allocated and run statement. */
    PREPARE dynamic_stmt FROM @SQL;
 
    /* Step #3:
       ========
       Assign the formal parameters to session variables
       because prepared statements require them. */
    CALL deserialize(pv_filter);
 
    /* Secure the parameter count from a temporary table that
       exchanges data between procedures. */
    SELECT counter INTO lv_counter FROM counter;
 
    /* Step #4:
       ========
       Choose the appropriate overloaded prepared statement. */
    CASE
      WHEN lv_counter = 0 THEN
        EXECUTE dynamic_stmt;      
      WHEN lv_counter = 1 THEN
        EXECUTE dynamic_stmt USING @sv_filter1;
      WHEN lv_counter = 2 THEN
        EXECUTE dynamic_stmt USING @sv_filter1,@sv_filter2;
    END CASE;
 
    /* Step #5: */
    DEALLOCATE PREPARE dynamic_stmt;
 
    /* Cleanup the temporary table that exchanges data between
       procedures. */
    DROP TABLE IF EXISTS counter; 
 
  END IF;
 
END;
$$

If you’re using the deserialize procedure that uses a temporary table, you should make the following changes prepared_dml procedure on line 19 and 62. They modify the DROP TABLE statement to a DROP TEMPORARY TABLE statement. Though the change isn’t technically required because a DROP TABLE works against temporary tables as well as ordinary tables, it generally clearer to those new to MySQL syntax.

17
18
19
    /* Cleanup the temporary table that exchanges data between
       procedures. */
    DROP TEMPORARY TABLE IF EXISTS counter;

The line 62 change:

60
61
62
    /* Cleanup the temporary table that exchanges data between
       procedures. */
    DROP TEMPORARY TABLE IF EXISTS counter;

It would be much nicer to avoid the CASE statement or an if-else-if block but prepared statements are limited on what SQL statements they support. For example, you can create or drop stored functions or procedures, and you can’t dispatch a dynamic statement as a dynamic statement. That means you can’t replace the CASE statement with something like this:

43
44
45
46
    SET @dsql := 'EXECUTE dynamic_stmt USING @sv_filter1, @sv_filter2';
    PREPARE preparsed_stmt FROM @dsql;
    EXECUTE preparsed_stmt;
    DEALLOCATE PREPARE preparsed_stmt;

Attempt that, and you’ll get the following message. Does that mean there’s hope it’ll become a feature and possibility? Or, that it isn’t recommended?

ERROR 1295 (HY000): This command is not supported in the prepared statement protocol yet

The session variable solution requires another version of the prepared_dml procedure, as shown below. As you can imagine, removing the table and replacing a session variable simplifies the prepared_dml stored procedure version.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
CREATE PROCEDURE prepared_dml
( pv_query   VARCHAR(32767)
, pv_filter  VARCHAR(32767))
CONTAINS SQL
BEGIN
 
  /* Declare a local variable for the SQL statement. */
  DECLARE dynamic_stmt  VARCHAR(32767);
  DECLARE lv_counter    INT DEFAULT 0;
 
  /* Cleanup the message passing table when a case is not found. */
  DECLARE EXIT HANDLER FOR 1339
    BEGIN
      /* Step #5: */
      DEALLOCATE PREPARE dynamic_stmt;
    END;
 
  /* Step #1:
     ========
     Set a session variable with two parameter markers. */
  SET @SQL := pv_query;
 
  /* Verify query is not empty. */
  IF NOT ISNULL(@SQL) THEN
 
    /* Step #2:
       ========
       Dynamically allocated and run statement. */
    PREPARE dynamic_stmt FROM @SQL;
 
    /* Step #3:
       ========
       Assign the formal parameters to session variables
       because prepared statements require them. */
    CALL deserialize(pv_filter);
 
     /* Step #4:
       ========
       Choose the appropriate overloaded prepared statement. */
    CASE
      WHEN @sv_counter = 0 THEN
        EXECUTE dynamic_stmt;      
      WHEN @sv_counter = 1 THEN
        EXECUTE dynamic_stmt USING @sv_filter1;
      WHEN @sv_counter = 2 THEN
        EXECUTE dynamic_stmt USING @sv_filter1,@sv_filter2;
    END CASE;
 
    /* Step #5: */
    DEALLOCATE PREPARE dynamic_stmt;
 
  END IF;
 
END;
$$

The test programs check without a parameter and with one or two parameters, as you can see below.

SELECT 'Test Case #1 ...' AS "Statement";
SET @param1 := 'SELECT "Hello World"';
SET @param2 := '';
CALL prepared_dml(@param1,@param2);
 
SELECT 'Test Case #2 ...' AS "Statement";
SET @param1 := 'SELECT item_title FROM item i WHERE item_title REGEXP ?';
SET @param2 := '^.*war.*$';
CALL prepared_dml(@param1,@param2);
 
SELECT 'Test Case #3 ...' AS "Statement";
SET @param1 := 'SELECT common_lookup_type FROM common_lookup cl WHERE common_lookup_table REGEXP ? AND common_lookup_column REGEXP ?';
SET @param2 := 'item,item_type';
CALL prepared_dml(@param1,@param2);

As always, I hope this helps those writing MySQL Stored Procedures. If you’re new to stored programs, you can find a chapter on writing them in my Oracle Database 11g & MySQL 5.6 Developer Handbook or you can use Guy Harrison’s MySQL Stored Procedure Programming.

Written by maclochlainn

June 16th, 2012 at 7:36 pm

Oracle Within Group

without comments

Somebody asked me for a useful example of Oracle 11gR2’s new analytical LISTAGG function that uses a WITHIN GROUP syntax. They’d noticed an update to the askTom that showed how to use it. This post shows how to list values without a displayed aggregation column and how to use a JOIN and GROUP BY clause with the new analytical feature.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
COLUMN list      FORMAT A10
COLUMN last_name FORMAT A10
COLUMN names     FORMAT A42
COLUMN members   FORMAT 9,990
 
 
SELECT   m.account_number AS account
,        c.last_name AS last_name
,        LISTAGG(c.first_name||DECODE(c.middle_name,NULL,NULL,' '||SUBSTR(c.middle_name,1,1)||'.'),', ')
           WITHIN GROUP (ORDER BY 2) AS names
,        COUNT(*) AS members
FROM     contact c INNER JOIN member m USING (member_id)
GROUP BY m.account_number
,        c.last_name
ORDER BY c.last_name;

It produces the following output:

ACCOUNT    LAST_NAME  NAMES                                      MEMBERS
---------- ---------- ------------------------------------------ -------
SLC-000021 Jonah      Gretelz S.                                       1
SLC-000020 Moss       Jane W.                                          1
SLC-000023 Nathan     Smith B.                                         1
SLC-000024 Potter     Albus S., Ginny, Harry, James S., Lily L.        5
SLC-000022 Royal      Jennifer E.                                      1
SJC-000003 Sweeney    Ian M., Matthew, Meaghan                         3
SJC-000002 Vizquel    Doreen, Oscar                                    2
SLC-000018 Ward       Clinton G.                                       1
SLC-000019 Ward       Brandt H.                                        1
SJC-000001 Winn       Brian, Randi                                     2
 
10 rows selected.

I also found some existing examples you might like, at Oracle-Base, and there they’re showing you how to make it work in prior releases of the database without the new LISTAGG function.

Let’s say you just wanted a concatenated list of users, you could use the following in-line view approach:

1
2
3
4
5
SELECT   list.names
FROM    (SELECT   'List' AS list
         ,        LISTAGG(last_name ||', '||first_name||DECODE(middle_name,NULL,NULL,' '||SUBSTR(middle_name,1,1)||'.'),'; ')
                  WITHIN GROUP (ORDER BY 2) AS names
         FROM     contact) list;

As always, I hope this helps somebody.

Written by maclochlainn

April 7th, 2012 at 3:49 pm

MySQL REGEXP Error

with one comment

While working through prepared statements in MySQL, there was an interesting MySQL regular expression question raised. A student wanted to know how to address the following error message:

ERROR 1139 (42000): Got error 'repetition-operator operand invalid' FROM REGEXP

They had substituted * for a .+ in a metasequence. A metasequence is a parenthetical expression that evaluates based on multiple alternative conditions, and the pipe (|) acts as an OR operator. The full code example is found on page 482 of the Oracle Database 11g & MySQL 5.6 Developer Handbook. The student’s change would have worked without an error had he replaced the metasequence with .* instead of the solitary *.

The original call to the procedure passes the following well formed regular expression:

CALL prepared_dml('(^|^.+)war(.+$|$)');

Or, they could eliminate the metasequences and use:

CALL prepared_dml('^.*war.*$');

Either returns the following entries from a column with movie titles from the sample code:

Charlie's War
Star Wars I
Star Wars II
Star Wars III

The dot (.) means any possible character, and the plus (+) means one-to-many possible repeating characters of a preceding character. When the dot precedes the plus, it means one-to-many wildcard characters. The student replaced the metasequence with an asterisk by itself and generated the badly formed regular expression error.

The misunderstanding occurs because the asterisk (*) by itself doesn’t mean zero-to-many wildcard. The combination of the dot and asterisk creates a zero-to-many wildcard, which works when there is or isn’t a character before the first character of a string or after the last character of a string. It also eliminates the need for a metasequence.

Here’s a small test case outside of the book’s stored procedure:

-- Conditionally drop the table.
DROP TABLE IF EXISTS list;
 
-- Create the table.
CREATE TABLE list ( list_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, list_item VARCHAR(20));
 
-- Insert for rows.
INSERT INTO list (list_item) VALUES ('Star'),(' Star'),(' Star '),('Star ');
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many leading characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many leading characters  "
,        LENGTH(list_item) AS "Length"
,        '^.+Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.+Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many trailing characters"
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many trailing characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.+$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.+$';

The output from the scripts is:

+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many leading characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many leading characters   | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  2 |  Star                            |      5 | ^.+Star.*$         |
|  3 |  Star                            |      6 | ^.+Star.*$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.00 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many trailing characters | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many trailing characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  3 |  Star                            |      6 | ^.*Star.+$         |
|  4 | Star                             |      5 | ^.*Star.+$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.02 sec)

Hope this helps.

Written by maclochlainn

March 23rd, 2012 at 11:01 am