MySQL REGEXP Error
While working through prepared statements in MySQL, there was an interesting MySQL regular expression question raised. A student wanted to know how to address the following error message:
ERROR 1139 (42000): Got error 'repetition-operator operand invalid' FROM REGEXP |
They had substituted *
for a .+
in a metasequence. A metasequence is a parenthetical expression that evaluates based on multiple alternative conditions, and the pipe (|
) acts as an OR
operator. The full code example is found on page 482 of the Oracle Database 11g & MySQL 5.6 Developer Handbook. The student’s change would have worked without an error had he replaced the metasequence with .*
instead of the solitary *
.
The original call to the procedure passes the following well formed regular expression:
CALL prepared_dml('(^|^.+)war(.+$|$)'); |
Or, they could eliminate the metasequences and use:
CALL prepared_dml('^.*war.*$'); |
Either returns the following entries from a column with movie titles from the sample code:
Charlie's War Star Wars I Star Wars II Star Wars III |
The dot (.
) means any possible character, and the plus (+
) means one-to-many possible repeating characters of a preceding character. When the dot precedes the plus, it means one-to-many wildcard characters. The student replaced the metasequence with an asterisk by itself and generated the badly formed regular expression error.
The misunderstanding occurs because the asterisk (*
) by itself doesn’t mean zero-to-many wildcard. The combination of the dot and asterisk creates a zero-to-many wildcard, which works when there is or isn’t a character before the first character of a string or after the last character of a string. It also eliminates the need for a metasequence.
Here’s a small test case outside of the book’s stored procedure:
-- Conditionally drop the table. DROP TABLE IF EXISTS list; -- Create the table. CREATE TABLE list ( list_id INT UNSIGNED PRIMARY KEY AUTO_INCREMENT, list_item VARCHAR(20)); -- Insert for rows. INSERT INTO list (list_item) VALUES ('Star'),(' Star'),(' Star '),('Star '); -- Query for zero-to-many leading characters. SELECT list_id AS "ID" , list_item AS "Zero-to-many leading characters " , LENGTH(list_item) AS "Length" , '^.*Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.*$'; -- Query for zero-to-many leading characters. SELECT list_id AS "ID" , list_item AS "One-to-many leading characters " , LENGTH(list_item) AS "Length" , '^.+Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.+Star.*$'; -- Query for one-to-many leading characters. SELECT list_id AS "ID" , list_item AS "Zero-to-many trailing characters" , LENGTH(list_item) AS "Length" , '^.*Star.*$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.*$'; -- Query for one-to-many leading characters. SELECT list_id AS "ID" , list_item AS "One-to-many trailing characters " , LENGTH(list_item) AS "Length" , '^.*Star.+$' AS "Regular Expression" FROM list WHERE list_item REGEXP '^.*Star.+$'; |
The output from the scripts is:
+----+----------------------------------+--------+--------------------+ | ID | Zero-to-many leading characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 1 | Star | 4 | ^.*Star.*$ | | 2 | Star | 5 | ^.*Star.*$ | | 3 | Star | 6 | ^.*Star.*$ | | 4 | Star | 5 | ^.*Star.*$ | +----+----------------------------------+--------+--------------------+ 4 rows in set (0.02 sec) +----+----------------------------------+--------+--------------------+ | ID | One-to-many leading characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 2 | Star | 5 | ^.+Star.*$ | | 3 | Star | 6 | ^.+Star.*$ | +----+----------------------------------+--------+--------------------+ 2 rows in set (0.00 sec) +----+----------------------------------+--------+--------------------+ | ID | Zero-to-many trailing characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 1 | Star | 4 | ^.*Star.*$ | | 2 | Star | 5 | ^.*Star.*$ | | 3 | Star | 6 | ^.*Star.*$ | | 4 | Star | 5 | ^.*Star.*$ | +----+----------------------------------+--------+--------------------+ 4 rows in set (0.02 sec) +----+----------------------------------+--------+--------------------+ | ID | One-to-many trailing characters | Length | Regular Expression | +----+----------------------------------+--------+--------------------+ | 3 | Star | 6 | ^.*Star.+$ | | 4 | Star | 5 | ^.*Star.+$ | +----+----------------------------------+--------+--------------------+ 2 rows in set (0.02 sec) |
Hope this helps.