MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

Archive for the ‘Regular Expression’ Category

Single Wildcard Operator

with 2 comments

Somebody wanted to understand why you can backquote a single wildcard operator (that’s the underscore _ character) in MySQL, but can’t in Oracle. The answer is you can in Oracle when you know that you required an additional clause.

While I prefer using regular expression resolution, the LIKE operator is convenient. Here’s an example of backquoting an underscore in MySQL, where it looks for any string with an underscore anywhere in the string:

SELECT   common_lookup_type
FROM     common_lookup
WHERE    common_lookup_type LIKE '%\_%';

You can gain the same behavior in Oracle by appending the ESCAPE '\' clause, like this:

SELECT   common_lookup_type
FROM     common_lookup
WHERE    common_lookup_type LIKE '%\_%' ESCAPE '\';

The ESCAPE '\' clause is one of those Oracle details that often gets lost. It only works when the SQL*Plus ESCAPE parameter is set to OFF.

The SQL*Plus ESCAPE parameter default value is a backslash. and when the ESCAPE parameter is enabled a statement like this raises the following exception:

ERROR at line 3:
ORA-01425: escape CHARACTER must be CHARACTER string OF LENGTH 1

If you drop the ESCAPE '\' clause with the ESCAPE parameter enabled it will return all rows from the table not just those strings with an underscore in the string. Hope this helps if need to look for an underscore in a table.

Written by maclochlainn

June 22nd, 2012 at 12:41 am

MySQL REGEXP Error

with one comment

While working through prepared statements in MySQL, there was an interesting MySQL regular expression question raised. A student wanted to know how to address the following error message:

ERROR 1139 (42000): Got error 'repetition-operator operand invalid' FROM REGEXP

They had substituted * for a .+ in a metasequence. A metasequence is a parenthetical expression that evaluates based on multiple alternative conditions, and the pipe (|) acts as an OR operator. The full code example is found on page 482 of the Oracle Database 11g & MySQL 5.6 Developer Handbook. The student’s change would have worked without an error had he replaced the metasequence with .* instead of the solitary *.

The original call to the procedure passes the following well formed regular expression:

CALL prepared_dml('(^|^.+)war(.+$|$)');

Or, they could eliminate the metasequences and use:

CALL prepared_dml('^.*war.*$');

Either returns the following entries from a column with movie titles from the sample code:

Charlie's War
Star Wars I
Star Wars II
Star Wars III

The dot (.) means any possible character, and the plus (+) means one-to-many possible repeating characters of a preceding character. When the dot precedes the plus, it means one-to-many wildcard characters. The student replaced the metasequence with an asterisk by itself and generated the badly formed regular expression error.

The misunderstanding occurs because the asterisk (*) by itself doesn’t mean zero-to-many wildcard. The combination of the dot and asterisk creates a zero-to-many wildcard, which works when there is or isn’t a character before the first character of a string or after the last character of a string. It also eliminates the need for a metasequence.

Here’s a small test case outside of the book’s stored procedure:

-- Conditionally drop the table.
DROP TABLE IF EXISTS list;
 
-- Create the table.
CREATE TABLE list ( list_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, list_item VARCHAR(20));
 
-- Insert for rows.
INSERT INTO list (list_item) VALUES ('Star'),(' Star'),(' Star '),('Star ');
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many leading characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many leading characters  "
,        LENGTH(list_item) AS "Length"
,        '^.+Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.+Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many trailing characters"
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many trailing characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.+$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.+$';

The output from the scripts is:

+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many leading characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many leading characters   | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  2 |  Star                            |      5 | ^.+Star.*$         |
|  3 |  Star                            |      6 | ^.+Star.*$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.00 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many trailing characters | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many trailing characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  3 |  Star                            |      6 | ^.*Star.+$         |
|  4 | Star                             |      5 | ^.*Star.+$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.02 sec)

Hope this helps.

Written by maclochlainn

March 23rd, 2012 at 11:01 am