Return-Path: <ml-errors@fatcity.com>
Received: from ensim.rackshack.net (root@localhost)
 by orafaq.net (8.11.6/8.11.6) with ESMTP id h9B0ZVc08732
 for <oracle-l@orafaq.net>; Fri, 10 Oct 2003 19:35:31 -0500
X-ClientAddr: 66.27.56.210
Received: from ns3.fatcity.com (rrcs-west-66-27-56-210.biz.rr.com [66.27.56.210])
 by ensim.rackshack.net (8.11.6/8.11.6) with ESMTP id h9B0ZVc08727
 for <oracle-l@orafaq.net>; Fri, 10 Oct 2003 19:35:31 -0500
Received: from ns3.fatcity.com (localhost.localdomain [127.0.0.1])
 by ns3.fatcity.com (8.12.8/8.12.8) with ESMTP id h9ALo1WL032651
 for <oracle-l@orafaq.net>; Fri, 10 Oct 2003 14:51:55 -0700
Received: (from root@localhost)
 by ns3.fatcity.com (8.12.8/8.12.5/Submit) id h9ALUO9K030635
 for oracle-l@orafaq.net; Fri, 10 Oct 2003 14:30:25 -0700
Received: by fatcity.com (05-Jun-2003/v1.0g-b73/bab) via fatcity.com id 005D2B9E; Fri, 10 Oct 2003 14:29:24 -0800
Message-ID: <F001.005D2B9E.20031010142924@fatcity.com>
Date: Fri, 10 Oct 2003 14:29:24 -0800
To: Multiple recipients of list ORACLE-L <ORACLE-L@fatcity.com>
X-Comment: Oracle RDBMS Community Forum
X-Sender: Jared.Still@radisys.com
Sender: ml-errors@fatcity.com
Reply-To: ORACLE-L@fatcity.com
Errors-To: ML-ERRORS@fatcity.com
From: Jared.Still@radisys.com
Subject: Re: Find an unprintable character inside a column....
Organization: Fat City Network Services, San Diego, California
X-ListServer: v1.0g, build 73; ListGuru (c) 1996-2003 Bruce A. Bergman
Precedence: bulk
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="=_alternative 007538BF88256DBB_="
--=_alternative 007538BF88256DBB_=
Content-Type: text/plain; charset="us-ascii"

Always glad to be of service.

It works with translate(), and runs about 53% faster.
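
For anyone following along, here is a minimal sketch of why the switch matters. It is not code from the thread: the sample value (an 'a', a tab, a 'b') and the shortened two-character search set are assumptions made purely for illustration. REPLACE looks for its second argument as one whole substring, while TRANSLATE maps characters one at a time.

```sql
-- Sample value: 'a', a tab (chr(9)), 'b'. The search set is only
-- chr(9)||chr(10) here, to keep the example short.
select case when replace('a'||chr(9)||'b', chr(9)||chr(10), '--')
                 = 'a'||chr(9)||'b'
            then 'unchanged' else 'changed' end as with_replace,
       case when translate('a'||chr(9)||'b', chr(9)||chr(10), '--')
                 = 'a'||chr(9)||'b'
            then 'unchanged' else 'changed' end as with_translate
from dual;
-- with_replace = 'unchanged', with_translate = 'changed'
```

REPLACE leaves the value alone because the two-character sequence tab+linefeed never occurs in it, so a test built on REPLACE reports the data as clean. TRANSLATE turns the tab into '-', so the comparison fails and the bad character is noticed.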





Stephane Faroult <sfaroult@oriole.com>
Sent by: ml-errors@fatcity.com
 10/10/2003 02:54 PM
 Please respond to ORACLE-L

 
        To:     Multiple recipients of list ORACLE-L <ORACLE-L@fatcity.com>
        cc: 
        Subject:        Re: Find an unprintable character inside a column....


Jared.Still@radisys.com wrote:
> 
> I played with this a bit.
> 
> First, I created some test data in which one column was corrupted in
> 20% of the rows of the table: a single random character in the range
> 0-31 replaced a random character in that column.
> 
> Peter's function correctly found all of the rows in 7.5 seconds.
> 
> Stephane's function ran in 3.5 seconds, but didn't find any of
> the rows.  I didn't attempt to correct the code.
> 
> Then I tried a function based on owa_pattern.regex.  My initial
> attempts didn't return the correct rows, as the regex pattern needed
> some tuning.
> 
> I didn't attempt to fix it, as it was woefully slow, about 30 seconds.
> 
> Regex in the WHERE clause in 10g will be nice.
> 
> Jared
> 
>  "Stephane Faroult" <sfaroult@oriolecorp.com>
>  Sent by: ml-errors@fatcity.com
>  10/10/2003 07:09 AM
>  Please respond to ORACLE-L
>
>         To:      Multiple recipients of list ORACLE-L <ORACLE-L@fatcity.com>
>         cc:
>         Subject: RE: RE: RE: Find an unprintable character inside a column....
> 
> >Some people have requested this code, so I thought you might as well
> >all have the chance to pick it to bits... It's a function called
> >BAD_ASCII, and it hunts for any ASCII characters with an ASCII value
> >of less than 32 in a specified field. (Acknowledgments to my colleague
> >Keith Holmes for help with this code.)
> >
> >Use it as follows:
> >
> >Where a field called DATA in a table TABLE_1 may contain an ASCII
> >character with a value less than 32 (i.e. a non-printing character),
> >the following SQL will find the row in question:
> >
> >select rowid,DATA,dump(DATA) from TABLE_1
> >where BAD_ASCII(DATA) > 0;
> >
> >You could use the PK of the table instead of rowid, of course. You
> >will also note that I select the DATA field in both normal and ASCII
> >'dump' mode, the better to see where the corruption is located.
> >
> >peter
> >edinburgh
> >...................................
> >
> >Source as follows:
> >
> >
> >create or replace function BAD_ASCII
> > (V_Text in char)
> > return number
> >is
> > V_Int   number;  -- position of the first bad character, 0 if none
> > V_Count number;  -- current position in V_Text
> >begin
> > V_Int   := 0;
> > V_Count := 1;
> > -- Stop at the first character with an ASCII value below 32.
> > while V_Count <= length(rtrim(V_Text)) and V_Int = 0
> > loop
> >  if ascii(substr(V_Text, V_Count, 1)) < 32 then
> >   V_Int := V_Count;
> >  end if;
> >  V_Count := V_Count + 1;
> > end loop;
> > return V_Int;
> >exception
> > when others then
> >  return -1;
> >end BAD_ASCII;
> >/
> >
> 
> Peter,
> 
>   I think that you can make this code 25% faster when the data is
> clean (which hopefully is the general case) by first using 'replace',
> which is more efficient than a PL/SQL loop, to check whether the string
> contains any rubbish at all. It will not tell you where the bad
> character is, however, so only then do you loop to look for it.
> 
> Here is what I would suggest :
> 
> create or replace Function BAD_ASCII (V_Text in char)
> return number
> is
>  V_Int number;
>  V_Count number;
> begin
>  if (replace(V_text, chr(0)||chr(1)||chr(2)||chr(3)||
>                      chr(4)||chr(5)||chr(6)||chr(7)||
>                      chr(8)||chr(9)||chr(10)||chr(11)||
>                      chr(12)||chr(13)||chr(14)||chr(15)||
>                      chr(16)||chr(17)||chr(18)||chr(19)||
>                      chr(20)||chr(21)||chr(22)||chr(23)||
>                      chr(24)||chr(25)||chr(26)||chr(27)||
>                      chr(28)||chr(29)||chr(30)||chr(31),
>                      '--------------------------------')
>                    = V_text)
>  then
>    return 0;
>  else
>    V_Int := 0;
>    V_Count := 1;
>    while V_Count<=length(rtrim(V_Text)) and V_Int=0
>    loop
>      if ascii(substr(V_Text, V_Count, 1))<32 then
>        V_Int := V_Count;
>      end if;
>      V_Count := V_Count + 1;
>    end loop;
>    return V_Int;
> end if;
> --
> exception
>  when others then
>    return -1;
> end BAD_ASCII;
> /


Jared, you're the scourge of people who just write things off the top
of their heads and don't test them thoroughly :-). I had made my usual
mistake of using REPLACE instead of TRANSLATE. I just tried it with
'regular' data, since this is the only case where it can be faster than
Peter's routine.
With TRANSLATE it works like Peter's routine, only somewhat faster.
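
The corrected function was not posted, so what follows is only a sketch of what the TRANSLATE version might look like, assuming nothing changes except swapping TRANSLATE for REPLACE in the quick check. The comments and the initialisation in the declarations are additions for readability.

```sql
create or replace function BAD_ASCII (V_Text in char)
return number
is
  V_Int   number := 0;  -- position of the first control character, 0 if none
  V_Count number := 1;  -- current position in V_Text
begin
  -- Quick check: TRANSLATE maps each of chr(0)..chr(31) to '-', one
  -- character at a time, so the result equals the input only when the
  -- input contains none of them.
  if translate(V_Text, chr(0)||chr(1)||chr(2)||chr(3)||
                       chr(4)||chr(5)||chr(6)||chr(7)||
                       chr(8)||chr(9)||chr(10)||chr(11)||
                       chr(12)||chr(13)||chr(14)||chr(15)||
                       chr(16)||chr(17)||chr(18)||chr(19)||
                       chr(20)||chr(21)||chr(22)||chr(23)||
                       chr(24)||chr(25)||chr(26)||chr(27)||
                       chr(28)||chr(29)||chr(30)||chr(31),
                       '--------------------------------') = V_Text
  then
    return 0;
  end if;
  -- Something is wrong (or V_Text is null): fall back to Peter's loop
  -- to find the position of the first offending character.
  while V_Count <= length(rtrim(V_Text)) and V_Int = 0
  loop
    if ascii(substr(V_Text, V_Count, 1)) < 32 then
      V_Int := V_Count;
    end if;
    V_Count := V_Count + 1;
  end loop;
  return V_Int;
exception
  when others then
    return -1;
end BAD_ASCII;
/
```

It is called exactly as in Peter's query: select rowid, DATA, dump(DATA) from TABLE_1 where BAD_ASCII(DATA) > 0;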



--=_alternative 007538BF88256DBB_=--
-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: 
  INET: Jared.Still@radisys.com


