Re: question about regexp_substr

From: Ls Cheng <exriscer_at_gmail.com>
Date: Fri, 19 Dec 2014 00:13:52 +0100
Message-ID: <CAJ2-Qb-m9aixVvHDsh0HWW4XQwOKA=NXGztcqiAqGThELFqo3w_at_mail.gmail.com>



Hi Alex

Try your examples with regexp_instr, you can see where the searching pattern starts and ends therefore it can help you to understand the difference between * and +.

When you test for regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 2) it searches from position 1 and when it hits A the number of occurence is increased from 0 to 1 and this makes the next start searching position 2. When it finds 1 it satisfies the pattern matching because * is used (0 or more occurence of pattern, in this case 0 occurence), this time a NULL is printed because you asked not to search numeric characters, the number of occurence is increased from 2 to 3. It next finds B which satisfies the pattern matching again and it prints B because it's non-numeric and you asked to serach for it. And so on.

HTH On Tue, Dec 16, 2014 at 11:04 PM, amonte <ax.mount_at_gmail.com> wrote:
>
> Hi Jackie
>
> Not sure what do you mean, I did this:
>
> select regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) occur_1,
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 2) occur_2,
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3) occur_3,
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 4) occur_4,
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 5) occur_5
> from dual;
>
> O O O O O
> - - - - -
> A B C
>
> select regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 1) occur_1,
> regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 2) occur_2,
> regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 3) occur_3,
> regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 4) occur_4,
> regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 5) occur_5
> from dual;
>
> O O O O O
> - - - - -
> A B C D E
>
> And I dont understand very well why * gives A, B and C whereas + gives
> expected output.
>
> This is how I read it,
>
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) says start searching from
> position 1 for first pattern which is non-numeric and no matter if there is
> any pattern occurence so A is printed
> regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 1) says start searching from
> position 1 for second pattern which is non-numeric and no matter if there
> is any pattern occurence. If I parse the line A is non-numeric therefore is
> a candidate but I am looking for the second occurence so I keep on
> searching, immediately see number 1 so the pattern is not matched so keep
> on searching, we now read B which is non-numeric, satisfies the pattern so
> it should be printed but it's not?
>
>
>
>
>
>
> 2014-12-16 22:42 GMT+01:00 Jackie Brock <J.Brock_at_cablelabs.com>:
>>
>> J Run it with various values where the three is – get the first
>> occurrence, then the second, then the third, then the fourth – you’ll
>> easily see what it’s doing.
>>
>>
>>
>> *From:* amonte [mailto:ax.mount_at_gmail.com]
>> *Sent:* Tuesday, December 16, 2014 2:30 PM
>> *To:* Jackie Brock
>> *Cc:* Oracle-L Group
>> *Subject:* Re: question about regexp_substr
>>
>>
>>
>> Hello Jackie
>>
>> I know + means > 1 and * > 0 occurence. But I dont see why they give
>> different results in my example.
>>
>> I understand that what query is asking with * is
>>
>> "find in the string any non-numeric character pattern, no matter if the
>> there is character or not in the third occurence". I dont see why B
>> satisfies such condition?
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Alex
>>
>>
>>
>> 2014-12-16 22:23 GMT+01:00 Jackie Brock <J.Brock_at_cablelabs.com>:
>>
>> The plus sign indicates that it expects at least 1 digit (1 or more).
>> The * means 0 or more.
>>
>>
>>
>> *From:* oracle-l-bounce_at_freelists.org [mailto:
>> oracle-l-bounce_at_freelists.org] *On Behalf Of *amonte
>> *Sent:* Tuesday, December 16, 2014 2:07 PM
>> *To:* Oracle-L Group
>> *Subject:* question about regexp_substr
>>
>>
>>
>> Hi people
>>
>> I have some difficulty understanding applying an operator to the pattren
>> in regexp_substr. Not sure if anyone can help ?
>>
>> The question is, what is the difference between these two queries:
>>
>> select regexp_substr('A1B2C3D4E', '[^0-9]+', 1, 3) from dual;
>>
>> R
>> -
>> C
>>
>> select regexp_substr('A1B2C3D4E', '[^0-9]*', 1, 3) from dual;
>>
>> R
>> -
>> B
>>
>> Why * and + gives different answers?
>>
>> Thanks in advance
>>
>>
>>
>> Alex
>>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Dec 19 2014 - 00:13:52 CET

Original text of this message