Oracle10g Regular Expression

articles: 

Oracle10g Regular Expression

Oracle database 10g added a new feature "Regular Expression" enhancement that can be used with SQL and PL/SQL statements.

There are four regular expression functions :-

REGEXP_LIKE
REGEXP_SUBSTR
REGEXP_INSTR
REGEXP_REPLACE

The functions REGEXP_SUBSTR, REGEXP_INSTR and REGEXP_REPLACE basically extend the functionality of other well known string functions SUBSTR, INSTR and REPLACE.

REGEXP_LIKE is basically an operator that is similar to existing LIKE operator.

The above regular expression functions are very efficient and powerful.

A regular expression is basically the combination of one or more literals with some metacharacters , this is used for pattern matching.

Oracle Regular Expression functions support the POSIX (Portable Operating System Interface) standard character classes.

Metacharacter and Character Class Description

* match 0 or more times
+ match 1 or more times
? match 0 or 1 time
^ start of the expression
$ end of the the expression
{m} match exactly m times
{m,} match at least m times
{m, n} match at least m times but no more than n times
| alternate operator
( ) group used for subexpression and backreferencing
[char] character List (can be used as literals except ^ ,- and character class)
\digit backslash followed by a digit between 1 and 9 ,used for backrerencing the respective subexpression.
[:alpha:] alphabetic characters
[:lower:] lowercase alphabetic characters
[:upper:] uppercase alphabetic characters
[:digit:] numeric digits
[:alnum:] alphanumeric characters
[:space:] nonprinting space characters e.g. carriage return, newline, vertical tab, and form feed
[:punct:] punctuation characters
[:cntrl:] nonprinting control characters
[:print:] printable characters

Matching Modes

i case insensitive matching
c case sensitive matching
n the dot match any character, including newlines.
m the caret and dollar match at the start and end of each line

REGEXP_LIKE
==============
Syntax : REGEXP_LIKE(source_string, pattern [, match_parameter])

source_string can be any character expression and will be used as search value.
pattern is any valid regular expression.
match_parameter can be one or more combination of valid Matching Modes ('i','c','n','m').

Example : To find the list of Product Codes that contain any non-numric chracters.

create table t_product
(
product_id number,
product_code varchar2(10),
product_desc varchar2(100)
);

insert into t_product values (1,'10','Option1');
insert into t_product values (2,'20','Option2');
insert into t_product values (3,'30','Option3');
insert into t_product values (4,'40a','Option4');
insert into t_product values (5,'Z50','Option5');

select product_code from t_product
where regexp_like(product_code, '[^[:digit:]]');

The query output in this case would be :

PRODUCT_CODE
40a
Z50

REGEXP_SUBSTR
==============
Syntax : REGEXP_SUBSTR(source_string, pattern [, position [, occurrence [, match_parameter ]]])

REGEXP_SUBSTR returns the substring for a regular expression pattern.
source_string is search value
pattern is a valid regular expression
position is an integer n (n >0) that indicates the position from where serach should begin (default is 1)
occurrence is an integer n (n >0) that indicates the search for which occurrence of pattern in source_string.
match_parameter can be one or more combination of valid Matching Modes ('i','c','n','m').

Example : To find the first value from the set of values separated by commas

SELECT REGEXP_SUBSTR('P1,P2,P3','[^,]+') FROM DUAL;

The query output in this case will be P1

REGEXP_INSTR
============
Syntax : REGEXP_INSTR (source_string, pattern [, position [, occurrence [, return_option [, match_parameter ]]]])

REGEXP_INSTR returns the position of the matched substring.In case no match , the function returns 0.

source_string is search value
pattern is a valid regular expression
position is an integer n (n >0) that indicates the position from where serach should begin (default is 1)
occurrence is an integer n (n >0) that indicates the search for which occurrence of pattern in source_string.
return_option gives the position w.r.t. occurrence, for value 0 => returns the position of the first character of the occurrence , for value 1 => returns the position of the character following the occurrence
match_parameter can be one or more combination of valid Matching Modes ('i','c','n','m').
Example :
select REGEXP_INSTR('The Sum is 100.','[[:digit:]]+') FROM DUAL;
This query will return 12, the starting position of 100 in the string.

REGEXP_REPLACE
===============
Syntax : REGEXP_REPLACE(source_char, pattern [, replace_string [, position [, occurrence [, match_parameter ]]]])

REGEXP_REPLACE returns source_char by replacing any matched substring for a regular expression pattern.
source_char is search value
pattern is a valid regular expression
replace_string is a character string used for replacement this can be used as backreferences to subexpressions in the form \n where n is 1 to 9.
position is an integer n (n >0) that indicates the position from where serach should begin (default is 1)
occurrence is an integer n (n >0) that indicates the occurrence of replace operation, for value 0 => replace all the occurrences of the matched substring , for value n => replaces the nth occurrence.
match_parameter can be one or more combination of valid Matching Modes ('i','c','n','m','x')
Example :
SELECT REGEXP_REPLACE('This is an example with lot blank spaces','( ){2,}', ' ') FROM DUAL;
This query will return "This is an example with lot blank spaces" , this will replace each occurrence of two or more spaces with a single space.

Comments

Hi,

Sorry to write this but it is nothing new and you might read exactly the same information on Oracle www.
Some hint if you want to use those items:
- If you use "=" or "IN" or "ANY" or similar operators then index might be used.
- If you use LIKE then index might be used only if the starting part is fixed (so pattern "abc%" might use index but "%abc" would not use it). Use function based indexes to reverse that (so that "abc%" might not be using index while "%abc" would use it).
- If you use regexps then index is in general not used. Avoid it if possible. Use only if there is no other option.