Re: question about regexp_substr

From: Seth Miller <sethmiller.sm_at_gmail.com>
Date: Tue, 16 Dec 2014 20:46:58 -0600
Message-ID: <CAEueRAVgSphaCvCzYvxEkGN14Hm2vGkuQrLYBUU7M0iOXnBkRw_at_mail.gmail.com>

Alex,

The * is actually matching one character and one non-character. The non-character is the end of the match. So for the first example, you are matching the A and the end of the A. The second match is the end non-character so there is a zero match on the non-character following the A but it is returned as null. I can't quite figure out how to put this in words so hopefully the demonstration helps.

After the first example, you can see that every match on a character is actually a match on the character and the non-character that follows it. The * and the + match the same thing. The only difference is, the + requires a one match and the * only requires a zero match.

SQL> select nvl(regexp_replace(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1), '$', '$'), 'NULL') res from dual;

--
A$

SQL> select nvl(regexp_replace(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 2),
'$', '$'), 'NULL') res from dual;

RES
----
NULL

SQL> select nvl(regexp_replace(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3),
'$', '$'), 'NULL') res from dual;

RE
--
B$

SQL> select nvl(regexp_count(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1),
'*'), '0') cnt from dual;

       CNT
----------
         2

SQL> select nvl(regexp_count(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 2),
'*'), '0') cnt from dual;

       CNT
----------
         0

SQL> select nvl(regexp_count(regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3),
'*'), '0') cnt from dual;

       CNT
----------
         2

SQL> select nvl(regexp_count(regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 1),
'*'), '0') cnt from dual;

       CNT
----------
         2

SQL> select nvl(regexp_count(regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 2),
'*'), '0') cnt from dual;

       CNT
----------
         2


Seth Miller



On Tue, Dec 16, 2014 at 5:11 PM, amonte <ax.mount_at_gmail.com> wrote:


>

> Hello

>

> I dont get why there is 0 character immediately after 1

>

> May be if you can do the same sort of explanation for + would help

>

> Thanks a lot

>

>

>

> 2014-12-16 23:53 GMT+01:00 Jackie Brock <J.Brock_at_cablelabs.com>:

>>

>>  After the A, the first instance of 0 or more non-numeric characters is

>> 0 characters immediately after the 1, so an empty string.  The third

>> instance of 0 or more, including the A and the match immediately after the

>> 1 (0 length), is B.  Does that help?

>>

>>

>>

>> *From:* amonte [mailto:ax.mount_at_gmail.com]

>> *Sent:* Tuesday, December 16, 2014 3:04 PM

>>

>> *To:* Jackie Brock

>> *Cc:* Oracle-L Group

>> *Subject:* Re: question about regexp_substr

>>

>>

>>

>> Hi Jackie

>>

>> Not sure what do you mean, I did this:

>>

>> select regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) occur_1,

>>        regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 2) occur_2,

>>        regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3) occur_3,

>>        regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 4) occur_4,

>>        regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 5) occur_5

>> from dual;

>>

>> O O O O O

>> - - - - -

>> A   B   C

>>

>> select regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 1) occur_1,

>>        regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 2) occur_2,

>>        regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 3) occur_3,

>>        regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 4) occur_4,

>>        regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 5) occur_5

>> from dual;

>>

>> O O O O O

>> - - - - -

>> A B C D E

>>

>> And I dont understand very well why * gives A, B and C whereas + gives

>> expected output.

>>

>> This is how I read it,

>>

>> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) says start searching from

>> position 1 for first pattern which is non-numeric and no matter if there is

>> any pattern occurence so A is printed

>>

>> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) says start searching from

>> position 1 for second pattern which is non-numeric and no matter if there

>> is any pattern occurence. If I parse the line A is non-numeric therefore is

>> a candidate but I am looking for the second occurence so I keep on

>> searching, immediately see number 1 so the pattern is not matched so keep

>> on searching, we now read B which is non-numeric, satisfies the pattern so

>> it should be printed but it's not?

>>

>>

>>

>>

>>

>>

>> 2014-12-16 22:42 GMT+01:00 Jackie Brock <J.Brock_at_cablelabs.com>:

>>

>> J  Run it with various values where the three is – get the first

>> occurrence, then the second, then the third, then the fourth – you’ll

>> easily see what it’s doing.

>>

>>

>>

>> *From:* amonte [mailto:ax.mount_at_gmail.com]

>> *Sent:* Tuesday, December 16, 2014 2:30 PM

>> *To:* Jackie Brock

>> *Cc:* Oracle-L Group

>> *Subject:* Re: question about regexp_substr

>>

>>

>>

>> Hello Jackie

>>

>> I know + means > 1 and * > 0 occurence. But I dont see why they give

>> different results in my example.

>>

>> I understand that what query is asking with * is

>>

>> "find in the string any non-numeric character pattern, no matter if the

>> there is character or not in the third occurence". I dont see why B

>> satisfies such condition?

>>

>>

>>

>> Thanks in advance

>>

>>

>>

>> Alex

>>

>>

>>

>> 2014-12-16 22:23 GMT+01:00 Jackie Brock <J.Brock_at_cablelabs.com>:

>>

>> The plus sign indicates that it expects at least 1 digit (1 or more).

>> The * means 0 or more.

>>

>>

>>

>> *From:* oracle-l-bounce_at_freelists.org [mailto:

>> oracle-l-bounce_at_freelists.org] *On Behalf Of *amonte

>> *Sent:* Tuesday, December 16, 2014 2:07 PM

>> *To:* Oracle-L Group

>> *Subject:* question about regexp_substr

>>

>>

>>

>> Hi people

>>

>> I have some difficulty understanding applying an operator to the pattren

>> in regexp_substr. Not sure if anyone can help ?

>>

>> The question is, what is the difference between these two queries:

>>

>> select regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 3) from dual;

>>

>> R

>> -

>> C

>>

>> select regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3) from dual;

>>

>> R

>> -

>> B

>>

>> Why * and + gives different answers?

>>

>> Thanks in advance

>>

>>

>>

>> Alex

>>

>


--
http://www.freelists.org/webpage/oracle-l

Received on Wed Dec 17 2014 - 03:46:58 CET

This message: [ Message body ]
Next message: Tanel Poder: "Re: Memory operations on Sun/Oracle M class servers vs T class servers"
Previous message: rjamya: "Re: Anyone running Oracle 12 in single tenant mode?"
In reply to: amonte: "Re: question about regexp_substr"
Next in thread: Ls Cheng: "Re: question about regexp_substr"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message