How to find Unicode characters within a varchar2 [message #151179] |
Tue, 13 December 2005 10:27  |
sengleb
Messages: 2 Registered: December 2005 Location: Englewood
|
Junior Member |
|
|
Is there a function to find a Unicode character in a varchar2? we are having problems with the outputed XML not validating because of Unicode characters. I want to find every instance in the Database so that I can delete them, is there a way?
|
|
|
|
Re: How to find Unicode characters within a varchar2 [message #151197 is a reply to message #151195] |
Tue, 13 December 2005 12:49   |
sengleb
Messages: 2 Registered: December 2005 Location: Englewood
|
Junior Member |
|
|
smartin wrote on Tue, 13 December 2005 11:02 | Are you sure you mean unicode characters, or do you mean unprintable values like hex nulls or something?
You might try looking at functions like asciistr and ascii.
Oh and the chr function, possibly what you are seeing is chr(0) ('I've seen those before when loading data...)
|
"unprintable values" - Correct, specifically the value of '0x1a - uprintable value' was what was causing the XML to fail in validation.
Can asciistr or ascii look for all unprintable characters or only specific identified values? I am trying to comb through millions of records to delete any and all of these.
|
|
|
Re: How to find Unicode characters within a varchar2 [message #151214 is a reply to message #151179] |
Tue, 13 December 2005 16:07  |
andrew again
Messages: 2577 Registered: March 2000
|
Senior Member |
|
|
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
With the Partitioning option
JServer Release 9.2.0.6.0 - Production
SQL> -- make sure we can store UTF8
SQL> select parameter, value, banner
2 from v$nls_parameters a, v$version
3 where parameter like 'NLS_CHARACTERSET%'
4 and banner like '%Enterprise%';
PARAMETER
----------------------------------------------------------------
VALUE
----------------------------------------------------------------
BANNER
----------------------------------------------------------------
NLS_CHARACTERSET
AL32UTF8
Oracle9i Enterprise Edition Release 9.2.0.6.0 - 64bit Production
SQL> CREATE OR REPLACE FUNCTION ascii_only (p_txt IN VARCHAR2)
2 RETURN VARCHAR2
3 IS
4 v_tmp VARCHAR2 (32767);
5 v_clean VARCHAR2 (32767);
6 v_char VARCHAR2 (3 BYTE);
7 BEGIN
8 FOR i IN 1 .. LENGTH (p_txt) LOOP
9 v_char := SUBSTR (p_txt, i, 1);
10
11 -- choose your range of valid characters
12 -- Ascii only looks at first BYTE of character
13 -- (note sure if this is a reliable way to detect UTF8 multi-byte characters)
14 -- (but it seems to work ok on my 10000 test records)
15 IF (ASCII (v_char) BETWEEN 32 AND 127)
16 OR (ASCII (v_char) IN (9, 10, 13)) THEN
17 v_clean := v_clean || v_char;
18 END IF;
19 END LOOP;
20
21 IF LENGTH (v_clean) != LENGTH (p_txt) THEN
22 DBMS_OUTPUT.put_line ('removed '||TO_CHAR(LENGTH(p_txt) - LENGTH(v_clean))||' characters');
23 END IF;
24
25 RETURN v_clean;
26 END;
27 /
Function created.
SQL>
SQL>
SQL> create table utf8_data(col1 varchar2(20));
Table created.
SQL>
SQL> -- http://www.unicode.org/Public/3.0-Update/NamesList-3.0.0.txt
SQL> -- Euro is U+20AC, Small Greek Gamma U+03B3
SQL> insert into utf8_data values ('hello '||unistr('\20AC')||unistr('\03B3')||chr(10)||'world');
1 row created.
SQL>
SQL> set serveroutput on size 1000000
SQL>
SQL> select ascii_only(col1) from utf8_data;
ASCII_ONLY(COL1)
---------------------------------------
hello
world
SQL>
SQL> begin
2 DBMS_OUTPUT.put_line ( 'just flushing buffer...');
3 end;
4 /
removed 2 characters
just flushing buffer...
PL/SQL procedure successfully completed.
http://asktom.oracle.com/pls/ask/f?p=4950:61:1550939581473334491::::P61_ID:26178667828928
|
|
|