MacLochlainns Weblog

Michael McLaughlin's Technical Blog

Site Admin

MySQL REGEXP Error

with one comment

While working through prepared statements in MySQL, there was an interesting MySQL regular expression question raised. A student wanted to know how to address the following error message:

ERROR 1139 (42000): Got error 'repetition-operator operand invalid' FROM REGEXP

They had substituted * for a .+ in a metasequence. A metasequence is a parenthetical expression that evaluates based on multiple alternative conditions, and the pipe (|) acts as an OR operator. The full code example is found on page 482 of the Oracle Database 11g & MySQL 5.6 Developer Handbook. The student’s change would have worked without an error had he replaced the metasequence with .* instead of the solitary *.

The original call to the procedure passes the following well formed regular expression:

CALL prepared_dml('(^|^.+)war(.+$|$)');

Or, they could eliminate the metasequences and use:

CALL prepared_dml('^.*war.*$');

Either returns the following entries from a column with movie titles from the sample code:

Charlie's War
Star Wars I
Star Wars II
Star Wars III

The dot (.) means any possible character, and the plus (+) means one-to-many possible repeating characters of a preceding character. When the dot precedes the plus, it means one-to-many wildcard characters. The student replaced the metasequence with an asterisk by itself and generated the badly formed regular expression error.

The misunderstanding occurs because the asterisk (*) by itself doesn’t mean zero-to-many wildcard. The combination of the dot and asterisk creates a zero-to-many wildcard, which works when there is or isn’t a character before the first character of a string or after the last character of a string. It also eliminates the need for a metasequence.

Here’s a small test case outside of the book’s stored procedure:

-- Conditionally drop the table.
DROP TABLE IF EXISTS list;
 
-- Create the table.
CREATE TABLE list ( list_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, list_item VARCHAR(20));
 
-- Insert for rows.
INSERT INTO list (list_item) VALUES ('Star'),(' Star'),(' Star '),('Star ');
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many leading characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for zero-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many leading characters  "
,        LENGTH(list_item) AS "Length"
,        '^.+Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.+Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "Zero-to-many trailing characters"
,        LENGTH(list_item) AS "Length"
,        '^.*Star.*$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.*$';
 
-- Query for one-to-many leading characters.
SELECT   list_id AS "ID"
,        list_item AS "One-to-many trailing characters "
,        LENGTH(list_item) AS "Length"
,        '^.*Star.+$' AS "Regular Expression"
FROM     list
WHERE    list_item REGEXP '^.*Star.+$';

The output from the scripts is:

+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many leading characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many leading characters   | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  2 |  Star                            |      5 | ^.+Star.*$         |
|  3 |  Star                            |      6 | ^.+Star.*$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.00 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | Zero-to-many trailing characters | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  1 | Star                             |      4 | ^.*Star.*$         |
|  2 |  Star                            |      5 | ^.*Star.*$         |
|  3 |  Star                            |      6 | ^.*Star.*$         |
|  4 | Star                             |      5 | ^.*Star.*$         |
+----+----------------------------------+--------+--------------------+
4 rows in set (0.02 sec)
 
+----+----------------------------------+--------+--------------------+
| ID | One-to-many trailing characters  | Length | Regular Expression |
+----+----------------------------------+--------+--------------------+
|  3 |  Star                            |      6 | ^.*Star.+$         |
|  4 | Star                             |      5 | ^.*Star.+$         |
+----+----------------------------------+--------+--------------------+
2 rows in set (0.02 sec)

Hope this helps.

Written by maclochlainn

March 23rd, 2012 at 11:01 am