Return-Path: <ml-errors@fatcity.com>
Received: from ensim.rackshack.net (root@localhost)
 by orafaq.net (8.11.6/8.11.6) with ESMTP id h9ALOS632190
 for <oracle-l@orafaq.net>; Fri, 10 Oct 2003 16:24:28 -0500
X-ClientAddr: 66.27.56.212
Received: from www3.fatcity.com (rrcs-west-66-27-56-212.biz.rr.com [66.27.56.212])
 by ensim.rackshack.net (8.11.6/8.11.6) with ESMTP id h9ALOCc32174
 for <oracle-l@orafaq.net>; Fri, 10 Oct 2003 16:24:12 -0500
Received: (from root@localhost)
 by www3.fatcity.com (8.11.6/8.11.6) id h9AIeS304726
 for oracle-l@orafaq.net; Fri, 10 Oct 2003 11:40:28 -0700
Received: by fatcity.com (05-Jun-2003/v1.0g-b73/bab) via fatcity.com id 005D2B82; Fri, 10 Oct 2003 11:39:24 -0800
Message-ID: <F001.005D2B82.20031010113924@fatcity.com>
Date: Fri, 10 Oct 2003 11:39:24 -0800
To: Multiple recipients of list ORACLE-L <ORACLE-L@fatcity.com>
X-Comment: Oracle RDBMS Community Forum
X-Sender: Jared.Still@radisys.com
Sender: ml-errors@fatcity.com
Reply-To: ORACLE-L@fatcity.com
Errors-To: ML-ERRORS@fatcity.com
From: Jared.Still@radisys.com
Subject: RE: RE: RE: Find an unprintable character inside a column....
Organization: Fat City Network Services, San Diego, California
X-ListServer: v1.0g, build 73; ListGuru (c) 1996-2003 Bruce A. Bergman
Precedence: bulk
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=_alternative 0066464B88256DBB_="
--=_alternative 0066464B88256DBB_=
Content-Type: text/plain; charset="us-ascii"

I played with this a bit.

First, I created some test  data with one column corrupted with a single 
random character
of 0-31 replacing a random char in that column 20% of the rows of the 
table.

Peter's function correctly found all of the rows in 7.5 seconds.

Stephane's function ran in 3.5 seconds, but didn't find any of
the rows.  I didn't attempt to correct the code.

Then I tried a function based on owa_pattern.regex.  My initial attempts
didn't return the correct rows, as the regex pattern needed some tuning.

I didn't attempt to fix it, as it was woefully slow, about 30 seconds.

Regex in the WHERE clause in 10g will be nice.

Jared





"Stephane Faroult" <sfaroult@oriolecorp.com>
Sent by: ml-errors@fatcity.com
 10/10/2003 07:09 AM
 Please respond to ORACLE-L

 
        To:     Multiple recipients of list ORACLE-L <ORACLE-L@fatcity.com>
        cc: 
        Subject:        RE: RE: RE: Find an unprintable character inside a column....


>Some people have requested this code, so I thought
>you might as well all
>have the chance to pick it to bits... Its a
>function called BAD_ASCII, and
>it hunts out for any ascii characters with an ascii
>value of less than 32 in
>a specified field. (Acknowledgments to my colleague
>Keith Holmes for help
>with this code.)
>
>Use it as follows:
>
>Where a field called DATA in a table TABLE_1 may
>contain an ascci character
>with a value less than 32 (ie a non-printing
>character), the following SQL
>will find the row in question:
>
>select rowid,DATA,dump(DATA) from TABLE_1 
>where BAD_ASCII(DATA) > 0;
>
>You could use the PK of the table instead of rowid,
>of course. You will also
>note that I select the DATA field in both normal
>and ascii 'dump' mode, the
>better to locate where the corruption is located.
>
>peter
>edinburgh
>...................................
>
>Source as follows:
>
>
>Function BAD_ASCII
> (V_Text in char)
> return number
>is
> V_Int  number;
> V_Count number;
>begin
>--
>V_Int            := 0;
>V_Count := 1;
>while V_Count<=length(rtrim(V_Text)) and V_Int=0
> loop
>  if ascii(substr(V_Text, V_Count, 1))<32 then
>   V_Int := V_Count;
>  end if;
> V_Count := V_Count + 1;
>end loop;
>return V_Int;
>--
>exception
>  when others then
>    return -1;
>end BAD_ASCII;
>/
>

Peter,

   I think that you can make this code 25% faster when the data is clean 
(which hopefully is the general case) by using 'replace', more efficient 
than a PL/SQL loop, to check whether you have some rubbish (sort of). It 
will not tell you where the bad character is, however - which means that 
then you can loop to look for it.

Here is what I would suggest :

create or replace Function BAD_ASCII (V_Text in char) 
return number 
is 
  V_Int number; 
  V_Count number; 
begin 
  if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)||
                      chr(4)||chr(5)||chr(6)||chr(7)||
                      chr(8)||chr(9)||chr(10)||chr(11)||
                      chr(12)||chr(13)||chr(14)||chr(15)||
                      chr(16)||chr(17)||chr(18)||chr(19)||
                      chr(20)||chr(21)||chr(22)||chr(23)||
                      chr(24)||chr(25)||chr(26)||chr(27)||
                      chr(28)||chr(29)||chr(30)||chr(31),
                      '--------------------------------') 
                    = V_text)
  then
    return 0;
  else
    V_Int := 0; 
    V_Count := 1; 
    while V_Count<=length(rtrim(V_Text)) and V_Int=0 
    loop 
      if ascii(substr(V_Text, V_Count, 1))<32 then 
        V_Int := V_Count; 
      end if; 
      V_Count := V_Count + 1; 
    end loop; 
    return V_Int; 
 end if;
-- 
exception 
  when others then 
    return -1; 
end BAD_ASCII; 
/ 

Regards,

Stephane Faroult
Oriole
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Stephane Faroult
  INET: sfaroult@oriolecorp.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru@fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).



--=_alternative 0066464B88256DBB_=
Content-Type: text/html; charset="us-ascii"


<br><font size=2 face="sans-serif">I played with this a bit.</font>
<br>
<br><font size=2 face="sans-serif">First, I created some test &nbsp;data with one column corrupted with a single random character</font>
<br><font size=2 face="sans-serif">of 0-31 replacing a random char in that column 20% of the rows of the table.</font>
<br>
<br><font size=2 face="sans-serif">Peter's function correctly found all of the rows in 7.5 seconds.</font>
<br>
<br><font size=2 face="sans-serif">Stephane's function ran in 3.5 seconds, but didn't find any of</font>
<br><font size=2 face="sans-serif">the rows. &nbsp;I didn't attempt to correct the code.</font>
<br>
<br><font size=2 face="sans-serif">Then I tried a function based on owa_pattern.regex. &nbsp;My initial attempts</font>
<br><font size=2 face="sans-serif">didn't return the correct rows, as the regex pattern needed some tuning.</font>
<br>
<br><font size=2 face="sans-serif">I didn't attempt to fix it, as it was woefully slow, about 30 seconds.</font>
<br>
<br><font size=2 face="sans-serif">Regex in the WHERE clause in 10g will be nice.</font>
<br>
<br><font size=2 face="sans-serif">Jared</font>
<br>
<br>
<br>
<br>
<table width=100%>
<tr valign=top>
<td>
<td><font size=1 face="sans-serif"><b>&quot;Stephane Faroult&quot; &lt;sfaroult@oriolecorp.com&gt;</b></font>
<br><font size=1 face="sans-serif">Sent by: ml-errors@fatcity.com</font>
<p><font size=1 face="sans-serif">&nbsp;10/10/2003 07:09 AM</font>
<br><font size=2 face="sans-serif">&nbsp;</font><font size=1 face="sans-serif">Please respond to ORACLE-L</font>
<br>
<td><font size=1 face="Arial">&nbsp; &nbsp; &nbsp; &nbsp; </font>
<br><font size=1 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; To: &nbsp; &nbsp; &nbsp; &nbsp;Multiple recipients of list ORACLE-L &lt;ORACLE-L@fatcity.com&gt;</font>
<br><font size=1 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; cc: &nbsp; &nbsp; &nbsp; &nbsp;</font>
<br><font size=1 face="sans-serif">&nbsp; &nbsp; &nbsp; &nbsp; Subject: &nbsp; &nbsp; &nbsp; &nbsp;RE: RE: RE: Find an unprintable character inside a column....</font></table>
<br>
<br>
<br><font size=2 face="Courier New">&gt;Some people have requested this code, so I thought<br>
&gt;you might as well all<br>
&gt;have the chance to pick it to bits... Its a<br>
&gt;function called BAD_ASCII, and<br>
&gt;it hunts out for any ascii characters with an ascii<br>
&gt;value of less than 32 in<br>
&gt;a specified field. (Acknowledgments to my colleague<br>
&gt;Keith Holmes for help<br>
&gt;with this code.)<br>
&gt;<br>
&gt;Use it as follows:<br>
&gt;<br>
&gt;Where a field called DATA in a table TABLE_1 may<br>
&gt;contain an ascci character<br>
&gt;with a value less than 32 (ie a non-printing<br>
&gt;character), the following SQL<br>
&gt;will find the row in question:<br>
&gt;<br>
&gt;select rowid,DATA,dump(DATA) from TABLE_1 <br>
&gt;where BAD_ASCII(DATA) &gt; 0;<br>
&gt;<br>
&gt;You could use the PK of the table instead of rowid,<br>
&gt;of course. You will also<br>
&gt;note that I select the DATA field in both normal<br>
&gt;and ascii 'dump' mode, the<br>
&gt;better to locate where the corruption is located.<br>
&gt;<br>
&gt;peter<br>
&gt;edinburgh<br>
&gt;...................................<br>
&gt;<br>
&gt;Source as follows:<br>
&gt;<br>
&gt;<br>
&gt;Function BAD_ASCII<br>
&gt; (V_Text in char)<br>
&gt; return number<br>
&gt;is<br>
&gt; V_Int &nbsp;number;<br>
&gt; V_Count number;<br>
&gt;begin<br>
&gt;--<br>
&gt;V_Int &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;:= 0;<br>
&gt;V_Count := 1;<br>
&gt;while V_Count&lt;=length(rtrim(V_Text)) and V_Int=0<br>
&gt; loop<br>
&gt; &nbsp;if ascii(substr(V_Text, V_Count, 1))&lt;32 then<br>
&gt; &nbsp; V_Int := V_Count;<br>
&gt; &nbsp;end if;<br>
&gt; V_Count := V_Count + 1;<br>
&gt;end loop;<br>
&gt;return V_Int;<br>
&gt;--<br>
&gt;exception<br>
&gt; &nbsp;when others then<br>
&gt; &nbsp; &nbsp;return -1;<br>
&gt;end BAD_ASCII;<br>
&gt;/<br>
&gt;<br>
<br>
Peter,<br>
<br>
 &nbsp; I think that you can make this code 25% faster when the data is clean (which hopefully is the general case) by using 'replace', more efficient than a PL/SQL loop, to check whether you have some rubbish (sort of). It will not tell you where the bad character is, however - which means that then you can loop to look for it.<br>
<br>
Here is what I would suggest :<br>
<br>
create or replace Function BAD_ASCII (V_Text in char) <br>
return number <br>
is <br>
 &nbsp;V_Int number; <br>
 &nbsp;V_Count number; <br>
begin <br>
 &nbsp;if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(4)||chr(5)||chr(6)||chr(7)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(8)||chr(9)||chr(10)||chr(11)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(12)||chr(13)||chr(14)||chr(15)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(16)||chr(17)||chr(18)||chr(19)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(20)||chr(21)||chr(22)||chr(23)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(24)||chr(25)||chr(26)||chr(27)||<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;chr(28)||chr(29)||chr(30)||chr(31),<br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;'--------------------------------') <br>
 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;= V_text)<br>
 &nbsp;then<br>
 &nbsp; &nbsp;return 0;<br>
 &nbsp;else<br>
 &nbsp; &nbsp;V_Int := 0; <br>
 &nbsp; &nbsp;V_Count := 1; <br>
 &nbsp; &nbsp;while V_Count&lt;=length(rtrim(V_Text)) and V_Int=0 <br>
 &nbsp; &nbsp;loop <br>
 &nbsp; &nbsp; &nbsp;if ascii(substr(V_Text, V_Count, 1))&lt;32 then <br>
 &nbsp; &nbsp; &nbsp; &nbsp;V_Int := V_Count; <br>
 &nbsp; &nbsp; &nbsp;end if; <br>
 &nbsp; &nbsp; &nbsp;V_Count := V_Count + 1; <br>
 &nbsp; &nbsp;end loop; <br>
 &nbsp; &nbsp;return V_Int; <br>
 end if;<br>
-- <br>
exception <br>
 &nbsp;when others then <br>
 &nbsp; &nbsp;return -1; <br>
end BAD_ASCII; <br>
/ <br>
<br>
Regards,</font>
<br><font size=2 face="Courier New"><br>
Stephane Faroult<br>
Oriole<br>
-- <br>
Please see the official ORACLE-L FAQ: http://www.orafaq.net<br>
-- <br>
Author: Stephane Faroult<br>
 &nbsp;INET: sfaroult@oriolecorp.com<br>
<br>
Fat City Network Services &nbsp; &nbsp;-- 858-538-5051 http://www.fatcity.com<br>
San Diego, California &nbsp; &nbsp; &nbsp; &nbsp;-- Mailing list and web hosting services<br>
---------------------------------------------------------------------<br>
To REMOVE yourself from this mailing list, send an E-Mail message<br>
to: ListGuru@fatcity.com (note EXACT spelling of 'ListGuru') and in<br>
the message BODY, include a line containing: UNSUB ORACLE-L<br>
(or the name of mailing list you want to be removed from). &nbsp;You may<br>
also send the HELP command for other information (like subscribing).<br>
</font>
<br>
<br>
--=_alternative 0066464B88256DBB_=--
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: 
  INET: Jared.Still@radisys.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru@fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).

