REGEXP_REPLACE to remove non-native english characters [message #632436] |
Fri, 30 January 2015 15:59 |
wtolentino
Messages: 398 Registered: March 2005
|
Senior Member |
|
|
we are receiving data feeds from other source. this data feed has some characters that we our system cannot recognize. so we built a function to remove them.
replace(REGEXP_REPLACE (pStr,'[:alpha:][:alnum:][:graph:][:punct:][:upper:]'),chr(191),'')
the code above does not work when there is characters in non-native english. for example é and Ž. how do i include it on the regexp_replace? please advise. thank you.
warren
|
|
|
|
Re: REGEXP_REPLACE to remove non-native english characters [message #632455 is a reply to message #632436] |
Sat, 31 January 2015 12:31 |
Solomon Yakobson
Messages: 3273 Registered: January 2010 Location: Connecticut, USA
|
Senior Member |
|
|
Oracle regular expressions are multilingual. [:alpha:] is multilingual and will cover é and Ž ([A-Za-z] will not):
SQL> select '[' || regexp_replace('žolė','[[:alpha:]]') || ']','[' || regexp_replace('žolė','[a-z]') || ']' from dual;
'['||REGEXP_REPLACE('ŽOLE','[[:ALPHA:]]')||']' '['||REGEXP_REPLACE('ŽOLE','[A-Z]')||']'
---------------------------------------------------- ----------------------------------------------
[] [žė]
SQL>
I believe your issue in bad pattern. Pattern should be [[:alpha]], not [:alpha]:
SQL> select '[' || regexp_replace('žolė','[:alpha:]') || ']','[' || regexp_replace('žolė','[a-z]') || ']' from dual;
'['||REGEXP_REPLACE('ŽOLE','[[:ALPHA:]]')||']' '['||REGEXP_REPLACE('ŽOLE','[A-Z]')||']'
---------------------------------------------------- ----------------------------------------------
[žoė] [žė]
SQL>
SY.
|
|
|