Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: DBMS_LOB.INSTR Function makes mistake searching string on BLOB columns

Re: DBMS_LOB.INSTR Function makes mistake searching string on BLOB columns

From: Thomas Kyte <tkyte_at_us.oracle.com>
Date: 3 Dec 2001 05:25:12 -0800
Message-ID: <9ufufo0dg@drn.newsguy.com>


In article <483af060.0112030042.10f26558_at_posting.google.com>, a_dursun_at_HOTMAIL.COM says...
>
>Hi ,
>i am using Oracle 8x. i have a table contains one BLOB column that
>store any type of file file (Word,Excel,PawerPoint etc). My
>application searhces data stored BLOB columns by using DBMS_LOB.INSTR
>function.But DBMS_LOG.INSTR funciton makes mistake searching BLOB
>columns and can not find the matching data.
>
> For example :
>
>a Word file stored before and contains a string "THIS IS A TEST". MY
>SQL is ;
>
>SELECT *
> FROM MYTABLE
>WHERE DBMS_LOB.INSTR(MYCOLUNM,SYS.UTL_RAW.CAST_TO_RAW("THIS IS A
>TEST")) > 0
>This SQL returns now row selected..
>
>Please help me . What can i do ? or Am i wrong ?
>
>Sincenerly
>Adnan DURSUN

Well, you are assuming that MS word will store "this is a test" as a single string and there is nothing saying that it will (or won't). A .doc file is a data structure, it'll do whatever it feels like it with it.

(you should really be using interMedia which understands the document format, gets the text out and indexes it efficiently, using dbms_lob.instr to search is going to be slooooowwwww).

I can say in general, this works -- doing an instr on a blob. I created a doc file, verified using a binary editor that the string THIS IS A TEST was in it and loaded it using:

create table demo
( date_loaded date,
  theblob blob
)
/

create or replace directory my_files as '/export/home/tkyte/';

create or replace
procedure load_a_file( p_dir_name in varchar2,

                       p_file_name in varchar2 )
as

    l_blob blob;
    l_bfile bfile;
begin

    insert into demo values ( sysdate, empty_blob() )     returning theblob into l_blob;

    l_bfile := bfilename( p_dir_name, p_file_name );     dbms_lob.fileopen( l_bfile );

    dbms_lob.fileclose( l_bfile );
end;
/

exec load_a_file( 'MY_FILES', 'x.doc' );

Then, i verified it got loaded and that instr works:

ops$tkyte_at_ORA817DEV.US.ORACLE.COM> select date_loaded,

  2         dbms_lob.getlength(theblob) len,
  3         dbms_lob.instr(theblob,utl_raw.cast_to_raw('THIS IS A TEST')) instr
  4 from demo;

DATE_LOAD LEN INSTR
--------- ---------- ----------
03-DEC-01 19456 1537 What you might want to do is have a function to dump the blob so you can inspect what you have, for example:

ops$tkyte_at_ORA817DEV.US.ORACLE.COM> create or replace

  2  function clean( p_raw in blob,
  3                  p_from_byte in number default 1,
  4                  p_for_bytes in number default 4000 )
  5 return varchar2
  6 as
  7      l_tmp varchar2(8192) default
  8             utl_raw.cast_to_varchar2(
  9                 dbms_lob.substr(p_raw,p_for_bytes,p_from_byte)
 10                                     );
 11      l_char   char(1);
 12      l_return varchar2(16384);
 13      l_whitespace varchar2(25) default
 14                   chr(13) || chr(10) || chr(9);
 15      l_ws_char    varchar2(50) default
 16                   'rnt';

 17
 18 begin
 19      for i in 1 .. length(l_tmp)
 20      loop
 21          l_char := substr( l_tmp, i, 1 );
 22  
 23          -- if the character is "printable" (ascii non-control)
 24          -- then just add it.  If it happens to be a \, add another
 25          -- \ to it since we will replace newlines and tabs with
 26          -- \n and \t and such and need to be able to tell the
 27          -- difference between a file with \n in it and a newline
 28  
 29          if ( ascii(l_char) between 32 and 127 )
 30          then
 31              l_return := l_return || l_char;
 32              if ( l_char = '\' ) then
 33                  l_return := l_return || '\';
 34              end if;
 35  
 36          -- if the character is a "whitespace" replace it
 37          -- with a special charcter like \r, \n, \t
 38  
 39          elsif ( instr( l_whitespace, l_char ) > 0 )
 40          then
 41              l_return := l_return ||
 42                     '\' ||
 43                     substr( l_ws_char, instr(l_whitespace,l_char), 1 );
 44  
 45          -- else for all other non-printable characters
 46          -- just put a "."
 47  
 48          else
 49              l_return := l_return || '.';
 50          end if;
 51      end loop;
 52  
 53      -- now, just return the first 4000 bytes as
 54      -- that is all that SQL will let us see.  We
 55      -- might have more then 4000 characters since chr(10) will
 56      -- become \n (double the bytes) and so this is necessary
 57  
 58      return substr(l_return,1,4000);
 59 end;
 60 /

Function created.

Now, I can dump the file to the screen in sqlplus:

ops$tkyte_at_ORA817DEV.US.ORACLE.COM> select r * 80 from_byte, (r+1)*80 to_byte, substr(clean( theblob, r*80+1, 80 ),1,80) bytes   2 from demo, (select rownum-1 r

  3                              from all_objects
4                             where rownum < (select
ceil((dbms_lob.getlength(theblob))/80)+1 from demo)
  5               )

  6 /

 FROM_BYTE TO_BYTE BYTES
---------- ----------



0 80
........................>.....\t.................!...........#............... ..
80        160
................................................................................
160        240
................................................................................
240        320
................................................................................
320        400
................................................................................
400        480
................................................................................
480        560
....................................G.\t...........................bjbj.........
560        640
............\t..................................................................
640        720
........................].......................................................
720        800
................................................................................
800        880
..........................................................p.......r.......r.....
880        960
..r.......r.......r.......r...$...a.......U...r.................................
960       1040
................................................................................
1040       1120
................................................................................
1120       1200
..........................p.....................................................
1200       1280
..........................p...............................p.....................
1280       1360
..........................................................p.....................
1360       1440
...q'..{......................\n...p............................................
1440       1520
................................................................................
1520       1600 ................THIS IS A
TEST\r................................................

......

and I can "see" what is actually in there.

--
Thomas Kyte (tkyte@us.oracle.com)             http://asktom.oracle.com/ 
Expert one on one Oracle, programming techniques and solutions for Oracle.
http://www.amazon.com/exec/obidos/ASIN/1861004826/  
Opinions are mine and do not necessarily reflect those of Oracle Corp 
Received on Mon Dec 03 2001 - 07:25:12 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US