Archive for the ‘Regular Expression’ Category
Single Wildcard Operator
Somebody wanted to understand why you can backquote a single wildcard operator (that’s the underscore _
character) in MySQL, but can’t in Oracle. The answer is you can in Oracle when you know that you required an additional clause.
While I prefer using regular expression resolution, the LIKE
operator is convenient. Here’s an example of backquoting an underscore in MySQL, where it looks for any string with an underscore anywhere in the string:
SELECT common_lookup_type FROM common_lookup WHERE common_lookup_type LIKE '%\_%'; |
You can gain the same behavior in Oracle by appending the ESCAPE '\'
clause, like this:
SELECT common_lookup_type FROM common_lookup WHERE common_lookup_type LIKE '%\_%' ESCAPE '\'; |
The ESCAPE '\'
clause is one of those Oracle details that often gets lost. It only works when the SQL*Plus ESCAPE
parameter is set to OFF
.
The SQL*Plus ESCAPE
parameter default value is a backslash. and when the ESCAPE parameter is enabled a statement like this raises the following exception:
ERROR at line 3: ORA-01425: escape CHARACTER must be CHARACTER string OF LENGTH 1 |
If you drop the ESCAPE '\'
clause with the ESCAPE
parameter enabled it will return all rows from the table not just those strings with an underscore in the string. Hope this helps if need to look for an underscore in a table.
MySQL REGEXP Error
While working through prepared statements in MySQL, there was an interesting MySQL regular expression question raised. A student wanted to know how to address the following error message:
ERROR 1139 (42000): Got error 'repetition-operator operand invalid' FROM REGEXP |
They had substituted *
for a .+
in a metasequence. A metasequence is a parenthetical expression that evaluates based on multiple alternative conditions, and the pipe (|
) acts as an OR
operator. The full code example is found on page 482 of the Oracle Database 11g & MySQL 5.6 Developer Handbook. The student’s change would have worked without an error had he replaced the metasequence with .*
instead of the solitary *
.
The original call to the procedure passes the following well formed regular expression:
CALL prepared_dml('(^|^.+)war(.+$|$)'); |
Or, they could eliminate the metasequences and use:
CALL prepared_dml('^.*war.*$'); |
Either returns the following entries from a column with movie titles from the sample code:
Charlie's War Star Wars I Star Wars II Star Wars III |
The dot (.
) means any possible character, and the plus (+
) means one-to-many possible repeating characters of a preceding character. When the dot precedes the plus, it means one-to-many wildcard characters. The student replaced the metasequence with an asterisk by itself and generated the badly formed regular expression error.
The misunderstanding occurs because the asterisk (*
) by itself doesn’t mean zero-to-many wildcard. The combination of the dot and asterisk creates a zero-to-many wildcard, which works when there is or isn’t a character before the first character of a string or after the last character of a string. It also eliminates the need for a metasequence.
Here’s a small test case outside of the book’s stored procedure:
-- Conditionally drop the table. DROP TABLE IF EXISTS list; -- Create the table. CREATE TABLE list ( list_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, list_item VARCHAR(20)); -- Insert for rows. INSERT INTO list (list_item) VALUES ('Star'),(' Star'),(' Star '),('Star '); -- Query for zero-to-many leading characters. SELECT list_id AS "ID" , list_item AS "Zero-to-many leading characters " , LENGTH(list_item) AS "Length" , '^.*Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.*$'; -- Query for zero-to-many leading characters. SELECT list_id AS "ID" , list_item AS "One-to-many leading characters " , LENGTH(list_item) AS "Length" , '^.+Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.+Star.*$'; -- Query for one-to-many leading characters. SELECT list_id AS "ID" , list_item AS "Zero-to-many trailing characters" , LENGTH(list_item) AS "Length" , '^.*Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.*$'; -- Query for one-to-many leading characters. SELECT list_id AS "ID" , list_item AS "One-to-many trailing characters " , LENGTH(list_item) AS "Length" , '^.*Star.+$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.+$'; |
The output from the scripts is:
+----+----------------------------------+--------+--------------------+ | ID | Zero-to-many leading characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 1 | Star | 4 | ^.*Star.*$ | | 2 | Star | 5 | ^.*Star.*$ | | 3 | Star | 6 | ^.*Star.*$ | | 4 | Star | 5 | ^.*Star.*$ | +----+----------------------------------+--------+--------------------+ 4 rows in set (0.02 sec) +----+----------------------------------+--------+--------------------+ | ID | One-to-many leading characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 2 | Star | 5 | ^.+Star.*$ | | 3 | Star | 6 | ^.+Star.*$ | +----+----------------------------------+--------+--------------------+ 2 rows in set (0.00 sec) +----+----------------------------------+--------+--------------------+ | ID | Zero-to-many trailing characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 1 | Star | 4 | ^.*Star.*$ | | 2 | Star | 5 | ^.*Star.*$ | | 3 | Star | 6 | ^.*Star.*$ | | 4 | Star | 5 | ^.*Star.*$ | +----+----------------------------------+--------+--------------------+ 4 rows in set (0.02 sec) +----+----------------------------------+--------+--------------------+ | ID | One-to-many trailing characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 3 | Star | 6 | ^.*Star.+$ | | 4 | Star | 5 | ^.*Star.+$ | +----+----------------------------------+--------+--------------------+ 2 rows in set (0.02 sec) |
Hope this helps.