Home » SQL & PL/SQL » SQL & PL/SQL » How to verify if string contains unwanted character (Oracle Db 11g and later) 
	
		
		
			| How to verify if string contains unwanted character [message #665448] | 
			Wed, 06 September 2017 05:08   | 
		 
		
			
				
				
				
					
						
						ator
						 Messages: 43 Registered: March 2009 
						
					 | 
					Member  | 
					 | 
		 
		 
	 | 
 
	
		Hi, 
 
before writing here i've been looking all over but i couldn't find how to resolve my problem. 
 
I have to verify if in a give string there are character outside a given list. 
 
I tried to user regular expression but it doesn't seem to work as follows. 
 
regexp_like('test letter ã','[^a-zA-Z0-9\.,;:''\+\-\(\)\?\*\[\]\{\}\\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ]') 
 
In this case it should return false because the letter ã is not in the given list and i can't do the opposite match because i don't know which character they would insert. 
 
I hope you can help me with this problem, otherwise i had to make a list of unwanted characters and match it with it. 
 
Thank you in advance, 
 
Stefano 
 
		
		
		
 |  
	| 
		
	 | 
 
 
 |  
	| 
		
 |  
	| 
		
 |  
	| 
		
 |  
	| 
		
 |  
	
		
		
			| Re: How to verify if string contains unwanted character [message #665587 is a reply to message #665456] | 
			Mon, 11 September 2017 10:29    | 
		 
		
			
				
				
				
					
						
						ator
						 Messages: 43 Registered: March 2009 
						
					 | 
					Member  | 
					 | 
		 
		 
	 | 
 
	
		Hi, 
 
sorry for the late reply, i wrote in this secton because is a pl/sql function of oracle 11g and if i got the wrong section i apoligize. 
 Quote: 
Hi, 
Just a few remarks: 
The regular expression pattern does not include a space (" "), so it always matching. 
Additionally, escaping inside a bracket expression is not taking any effect. On the other hand, the characters "-" and "]" have to be put into special place. 
See details here: https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended 
 
So, in the end you may end with mask like this (I have not included the "ã" character , as it was not in the original one): 
 
'[^] a-zA-Z0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]' 
 
(hopefully it recognizes all included characters) 
I tried with the string but i did not succeded, so i'll do a translate and keep updating the character needed to be replaced and the error will occur. 
 
If I find a better solution i will let you know. 
 
Thanks for your help. 
 
Stefano 
 
		
		
		
 |  
	| 
		
	 | 
 
 
 |  
	
		
		
			| Re: How to verify if string contains unwanted character [message #665738 is a reply to message #665587] | 
			Tue, 19 September 2017 05:08   | 
		 
		
			
				
				
				  | 
					
						
						quirks
						 Messages: 85 Registered: October 2014 
						
					 | 
					Member  | 
					 | 
		 
		 
	 | 
 
	
		Hello @ator: 
 
I thought, that @flyboy s statement should be working, but it did not (as you figured out by yourself). 
So I played a little bit around and figured that ... 
SELECT CASE WHEN REGEXP_LIKE('ã', '[a-zA-Z]') THEN 'true' ELSE 'false' END STR_MATCHES
  FROM DUAL; 
... 'ã' is part of the characters defined by '[a-z]'. 
 
As it shows, as long as you are using a ASCII character set in your database the '[a-z]' works as expected: 
WITH
    CHARACTERS
    AS
        (SELECT     TO_NUMBER(TO_CHAR(LEVEL - 1)) AS DEC_NUM
               FROM DUAL
         CONNECT BY LEVEL <= POWER(2, 7))
SELECT DEC_NUM
      ,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END          ASCII_CHAR
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:graph:]]') THEN 'x' END  IS_GRAPH
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:blank:]]') THEN 'x' END  IS_BLANK
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alnum:]]') THEN 'x' END  IS_ALNUM
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:alpha:]]') THEN 'x' END  IS_ALPHA
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[a-z-Z]') THEN 'x' END      A_Z_TEST
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:digit:]]') THEN 'x' END  IS_DIGIT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:cntrl:]]') THEN 'x' END  IS_CNTRL
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:lower:]]') THEN 'x' END  IS_LOWER
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:upper:]]') THEN 'x' END  IS_UPPER
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:print:]]') THEN 'x' END  IS_PRINT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:punct:]]') THEN 'x' END  IS_PUNCT
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:space:]]') THEN 'x' END  IS_SPACE
      ,CASE WHEN REGEXP_LIKE(CHR(DEC_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
  FROM CHARACTERS 
 
But when your character set is bigger (as it is in nearly all oracle installations) 'ã' is part of the [a_z] range. 
WITH
    CHARACTERS
    AS
        (SELECT     TO_NUMBER(TO_CHAR(LEVEL - 1))                                               AS DEC_NUM
                   ,REGEXP_REPLACE(REGEXP_REPLACE(TO_CHAR(LEVEL - 1, 'XXXX'), '^ ', '\'), ' ', '0') AS HEX_NUM
               FROM DUAL
         CONNECT BY LEVEL <= POWER(2, 16))
SELECT DEC_NUM
      ,CASE WHEN DEC_NUM < 128 THEN CHR(DEC_NUM) ELSE NULL END             ASCII_CHAR
      ,HEX_NUM
      ,UNISTR(HEX_NUM)                                                     UTF8_CHAR
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:graph:]]') THEN 'x' END  IS_GRAPH
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:blank:]]') THEN 'x' END  IS_BLANK
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alnum:]]') THEN 'x' END  IS_ALNUM
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:alpha:]]') THEN 'x' END  IS_ALPHA
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[a-z-Z]') THEN 'x' END      A_Z_TEST
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:digit:]]') THEN 'x' END  IS_DIGIT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:cntrl:]]') THEN 'x' END  IS_CNTRL
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:lower:]]') THEN 'x' END  IS_LOWER
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:upper:]]') THEN 'x' END  IS_UPPER
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:print:]]') THEN 'x' END  IS_PRINT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:punct:]]') THEN 'x' END  IS_PUNCT
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:space:]]') THEN 'x' END  IS_SPACE
      ,CASE WHEN REGEXP_LIKE(UNISTR(HEX_NUM), '[[:xdigit:]]') THEN 'x' END IS_XDIGIT
  FROM CHARACTERS
 WHERE 
       UNISTR(HEX_NUM) = 'ã' AND
       (
          DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[a-z-Z]'), 1) = 1 OR 
          DECODE(UNISTR(HEX_NUM), REGEXP_SUBSTR(UNISTR(HEX_NUM), '[[:alpha:]]'), 1) = 1
       ); 
 
You can find out which character set you are using with this statement: 
SELECT *
  FROM NLS_DATABASE_PARAMETERS
 WHERE PARAMETER LIKE 'NLS_%CHARACTERSET';  
'NLS_CHARACTERSET' is used in CHAR and VARCHAR columns. 
'NLS_NCHAR_CHARACTERSET' in NCHAR and NVARCHAR columns (which can store the full Unicode character set). 
 
In conclusion you can not use [a-z] but must enlist each character by itself: 
SELECT CASE WHEN REGEXP_LIKE('ã', '[^abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ]') THEN 'true' ELSE 'false' END STR_MATCHES
  FROM DUAL; 
 
To bring @flyboy s statement to life you just need to replace the range definitions by their individual values: 
SELECT CASE
           WHEN REGEXP_LIKE(
                    'test letter ã'
                   ,'[^] abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0-9.,;:''+()?*[{}\`´~!"#%&<>÷=@_$£àáâäçèéêëìíîïñòóôöùúûüýßÀÁÂÄÇÈÉÊËÌÍÎÏÒÓÔÖÙÚÛÜÑ-]'
                )
           THEN
               'true'
           ELSE
               'false'
       END
           STR_MATCHES
  FROM DUAL; 
		
		
		
 |  
	| 
		
	 | 
 
 
 |   
Goto Forum:
 
 Current Time: Tue Nov 04 04:01:22 CST 2025 
 |