Home » SQL & PL/SQL » SQL & PL/SQL » How to find Unicode characters within a varchar2
How to find Unicode characters within a varchar2 [message #151179] Tue, 13 December 2005 10:27 Go to next message
sengleb
Messages: 2
Registered: December 2005
Location: Englewood
Junior Member
Is there a function to find a Unicode character in a varchar2? we are having problems with the outputed XML not validating because of Unicode characters. I want to find every instance in the Database so that I can delete them, is there a way?
Re: How to find Unicode characters within a varchar2 [message #151195 is a reply to message #151179] Tue, 13 December 2005 12:02 Go to previous messageGo to next message
smartin
Messages: 1803
Registered: March 2005
Location: Jacksonville, Florida
Senior Member
Are you sure you mean unicode characters, or do you mean unprintable values like hex nulls or something?

You might try looking at functions like asciistr and ascii.

Oh and the chr function, possibly what you are seeing is chr(0) ('I've seen those before when loading data...)

[Updated on: Tue, 13 December 2005 12:06]

Report message to a moderator

Re: How to find Unicode characters within a varchar2 [message #151197 is a reply to message #151195] Tue, 13 December 2005 12:49 Go to previous messageGo to next message
sengleb
Messages: 2
Registered: December 2005
Location: Englewood
Junior Member
smartin wrote on Tue, 13 December 2005 11:02

Are you sure you mean unicode characters, or do you mean unprintable values like hex nulls or something?

You might try looking at functions like asciistr and ascii.

Oh and the chr function, possibly what you are seeing is chr(0) ('I've seen those before when loading data...)


"unprintable values" - Correct, specifically the value of '0x1a - uprintable value' was what was causing the XML to fail in validation.

Can asciistr or ascii look for all unprintable characters or only specific identified values? I am trying to comb through millions of records to delete any and all of these.
Re: How to find Unicode characters within a varchar2 [message #151214 is a reply to message #151179] Tue, 13 December 2005 16:07 Go to previous message
andrew again
Messages: 2577
Registered: March 2000
Senior Member
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.6.0 - Production

SQL> -- make sure we can store UTF8
SQL> select parameter, value, banner
  2    from v$nls_parameters a, v$version
  3    where parameter like 'NLS_CHARACTERSET%'
  4    and banner like '%Enterprise%';

PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
BANNER
----------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production

SQL> CREATE OR REPLACE FUNCTION ascii_only (p_txt IN VARCHAR2)
  2     RETURN VARCHAR2
  3  IS
  4     v_tmp     VARCHAR2 (32767);
  5     v_clean   VARCHAR2 (32767);
  6     v_char    VARCHAR2 (3 BYTE);
  7  BEGIN
  8     FOR i IN 1 .. LENGTH (p_txt) LOOP
  9        v_char := SUBSTR (p_txt, i, 1);
 10
 11        -- choose your range of valid characters
 12        -- Ascii only looks at first BYTE of character
 13        -- (note sure if this is a reliable way to detect UTF8 multi-byte characters)
 14        -- (but it seems to work ok on my 10000 test records)
 15        IF    (ASCII (v_char) BETWEEN 32 AND 127)
 16           OR (ASCII (v_char) IN (9, 10, 13)) THEN
 17           v_clean := v_clean || v_char;
 18        END IF;
 19     END LOOP;
 20
 21     IF LENGTH (v_clean) != LENGTH (p_txt) THEN
 22        DBMS_OUTPUT.put_line ('removed '||TO_CHAR(LENGTH(p_txt) - LENGTH(v_clean))||' characters');
 23     END IF;
 24
 25     RETURN v_clean;
 26  END;
 27  /

Function created.

SQL>
SQL>
SQL> create table utf8_data(col1 varchar2(20));

Table created.

SQL>
SQL> -- http://www.unicode.org/Public/3.0-Update/NamesList-3.0.0.txt
SQL> -- Euro is U+20AC, Small Greek Gamma U+03B3
SQL> insert into utf8_data values ('hello '||unistr('\20AC')||unistr('\03B3')||chr(10)||'world');

1 row created.

SQL>
SQL> set serveroutput on size 1000000
SQL>
SQL> select ascii_only(col1) from utf8_data;

ASCII_ONLY(COL1)
---------------------------------------
hello
world


SQL>
SQL> begin
  2   DBMS_OUTPUT.put_line ( 'just flushing buffer...');
  3  end;
  4  /
removed 2 characters
just flushing buffer...

PL/SQL procedure successfully completed.


http://asktom.oracle.com/pls/ask/f?p=4950:61:1550939581473334491::::P61_ID:26178667828928
Previous Topic: TRIGGER DOES NOT WORK PLEASE HELP!
Next Topic: Access user_tab_columns
Goto Forum:
  


Current Time: Thu Aug 21 20:47:47 CDT 2025