Home » SQL & PL/SQL » SQL & PL/SQL » Regular expression: letter category? (Any version supporting regular expressions)
Regular expression: letter category? [message #651311] Sun, 15 May 2016 12:50 Go to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Is there any neater way to get the first letter of a string, using regular expression, than:
SQL> select regexp_substr('***Michel***','[a-z]',1,1,'i') from dual;
R
-
M

In Perl, you can use the Posix Unicode \pL category (letter):
E:\>perl -e "print('***Michel***' =~ /(\pL)/);"
M

but, as far as I know, Oracle does not support Unicode categories.
Without saying that [a-z] does not cover ALL letters, just in country there are 2 dozens more.

[Updated on: Sun, 15 May 2016 13:03]

Report message to a moderator

Re: Regular expression: letter category? [message #651314 is a reply to message #651311] Sun, 15 May 2016 13:59 Go to previous messageGo to next message
Solomon Yakobson
Messages: 3273
Registered: January 2010
Location: Connecticut, USA
Senior Member
Michel Cadot wrote on Sun, 15 May 2016 13:50

but, as far as I know, Oracle does not support Unicode categories.


Regular Expression Operator Multilingual Enhancements
select regexp_substr('***Michel***','[[:alpha:]]',1,1,'i') from dual;


SY.

[Updated on: Sun, 15 May 2016 14:02]

Report message to a moderator

Re: Regular expression: letter category? [message #651315 is a reply to message #651311] Sun, 15 May 2016 14:11 Go to previous messageGo to next message
Frank Naude
Messages: 4579
Registered: April 1998
Senior Member
Here's another alternative, not sure if it would qualify as neater:
SQL> select regexp_substr('***Michel***','\w',1,1) from dual;

R
-
M
Re: Regular expression: letter category? [message #651316 is a reply to message #651314] Sun, 15 May 2016 14:11 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Yes, thanks, I did read this page but not so closely the examples. Smile
Just by curiosity, do you know any \? shortcut?
I will have to look closer if [[:alpha:]] (=[a-zA-Z]) is sufficient for our case.

[Updated on: Sun, 15 May 2016 14:14]

Report message to a moderator

Re: Regular expression: letter category? [message #651317 is a reply to message #651315] Sun, 15 May 2016 14:13 Go to previous messageGo to next message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Frank Naude wrote on Sun, 15 May 2016 21:11
Here's another alternative, not sure if it would qualify as neater:
SQL> select regexp_substr('***Michel***','\w',1,1) from dual;

R
-
M


This was my very first idea thought but \w also cover numerical characters:
SQL> select regexp_substr('***1Michel***','\w',1,1) from dual;
R
-
1

but, yes, \? expression is what I called neater, I should say more cryptic instead. Smile

[Updated on: Sun, 15 May 2016 14:17]

Report message to a moderator

Re: Regular expression: letter category? [message #651318 is a reply to message #651317] Sun, 15 May 2016 14:32 Go to previous message
Michel Cadot
Messages: 68641
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

I think I found a definitive answer (about non [a-zA-Z] letters): IEEE Std 1003.1, 2013 Edition.

Quote:
The following character class expressions shall be supported in all locales:
[:alnum:]   [:cntrl:]   [:lower:]   [:space:]
[:alpha:]   [:digit:]   [:print:]   [:upper:]
[:blank:]   [:graph:]   [:punct:]   [:xdigit:]

but
Quote:
While many regular expressions can be interpreted differently depending on the current locale, many features, such as character class expressions, provide for contextual invariance across locales.


which means for me I'll never have "é" in some alpha class unless Oracle chooses to implement Unicode extension. Sad

Previous Topic: attendance sheet
Next Topic: oracle session leak
Goto Forum:
  


Current Time: Fri Apr 19 21:58:41 CDT 2024