Home » SQL & PL/SQL » SQL & PL/SQL » Dbms_lob help (10.2.0.1.0)
Dbms_lob help [message #390679] Mon, 09 March 2009 04:33 Go to next message
hari.choprala
Messages: 65
Registered: November 2006
Location: Mumbai
Member

Hi All,

Iam searching the content of BLOB using below query. It is giving the desired result for doc,xml,html,txt files. But i am not getting the result for pdf files.


SQL> desc test
Name          Type          Nullable Default Comments 
------------- ------------- -------- ------- -------- 
ID            NUMBER        Y                         
DATA          BLOB          Y                         
DOCUMENT_NAME VARCHAR2(250) Y 



SELECT DOCUMENT_NAME FROM TEST
WHERE dbms_lob.INSTR(DATA, utl_raw.cast_to_raw('HARI'),1,1) > 0 


Thanks in advance.

hari
Re: Dbms_lob help [message #390682 is a reply to message #390679] Mon, 09 March 2009 04:42 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
You seem to have some odd ideas about Blobs, and the files that belong in them.

BLOBS are for storing large binary objects. character based data such as Html really belongs in CLOBS.
Xml belongs either in a Clob or in an XMLType.

For data that does belong in Blobs, there's no reason why it should be searchable with INSTR.
MS .doc formats do (as it happens) store the text within them as characters, but it also stores a lot of other character text in it that isn't text that shows up on screen.

PDF documents store none of the displayed text as characters (as far as I know), but they to store a lot of non-displayed data as character text.

In short, you cannot rely on INSTR to give an accurate results when searching a BLOB - if you search the right sort of Blob data, you might find the word you're looking for, but it might be a false positive result.
Re: Dbms_lob help [message #390687 is a reply to message #390682] Mon, 09 March 2009 04:52 Go to previous messageGo to next message
hari.choprala
Messages: 65
Registered: November 2006
Location: Mumbai
Member

Hi JRowBottem,

Thanks for the reply.

For testing purpose, i just inserted different files in to blobs and searching the content with in the file. Is there any other way to search the content of blob column.

Thanks in advance.

Hari
Re: Dbms_lob help [message #390690 is a reply to message #390687] Mon, 09 March 2009 05:15 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
It depends entirely on the binary data that you're searching.
Generally, binary data isn't intended to be searchable - what could you look for in a .jpg or .avi file?
Re: Dbms_lob help [message #390698 is a reply to message #390690] Mon, 09 March 2009 05:42 Go to previous messageGo to next message
hari.choprala
Messages: 65
Registered: November 2006
Location: Mumbai
Member

Hi,

Then how can I search the content of BLOB. Because in my database documents are stored in blob column. Iam just using doc and pdf files only.

JRowBottem,

please provide me any link or idea.

Thanks in advance
Hari

[Updated on: Mon, 09 March 2009 05:43]

Report message to a moderator

Re: Dbms_lob help [message #390717 is a reply to message #390698] Mon, 09 March 2009 07:32 Go to previous messageGo to next message
JRowbottom
Messages: 5933
Registered: June 2006
Location: Sunny North Yorkshire, ho...
Senior Member
You would need some code to extract the text from a document of that type - and then you'll need some way to call this code from Oracle.

This is a non-trivial requirement - Anyone out there done this before?

Re: Dbms_lob help [message #390728 is a reply to message #390717] Mon, 09 March 2009 08:20 Go to previous messageGo to next message
Frank
Messages: 7880
Registered: March 2000
Senior Member
Maybe Barbara has experience with this, given her vast knowledge of Oracle Text indexes.
I think I recall that it supports searching in MicroSoft Word documents..
Re: Dbms_lob help [message #390815 is a reply to message #390728] Mon, 09 March 2009 21:55 Go to previous messageGo to next message
Barbara Boehmer
Messages: 8631
Registered: November 2002
Location: California, USA
Senior Member
You can use Oracle Text to index documents of various formats stored in a blob column. I have provided a simple demo using pdf documents below. Oracle Text has many search capabilities. You can also index the files directly from the server using a file_datastore without loading them into blob columns.

SCOTT@orcl_11g> CREATE TABLE test (
  2    ID	     NUMBER,
  3    DATA	     BLOB,
  4    DOCUMENT_NAME VARCHAR2(250))
  5  /

Table created.

SCOTT@orcl_11g> CREATE OR REPLACE DIRECTORY my_dir AS 'c:\oracle11g'
  2  /

Directory created.

SCOTT@orcl_11g> DECLARE
  2    v_bfile	       BFILE;
  3    v_blob	       BLOB;
  4  BEGIN
  5    v_bfile := BFILENAME ('MY_DIR', 'banana.pdf');
  6    INSERT INTO test (id, data, document_name)
  7  	 VALUES (1, EMPTY_BLOB(), 'Banana recipes')
  8  	 RETURNING data INTO v_blob;
  9    DBMS_LOB.OPEN (v_bfile, DBMS_LOB.LOB_READONLY);
 10    DBMS_LOB.LOADFROMFILE
 11  	 (v_blob, v_bfile, DBMS_LOB.GETLENGTH (v_bfile));
 12    DBMS_LOB.CLOSE (v_bfile);
 13    --
 14    v_bfile := BFILENAME ('MY_DIR', 'cranberry.pdf');
 15    INSERT INTO test (id, data, document_name)
 16  	 VALUES (2, EMPTY_BLOB(), 'Cranberry recipes')
 17  	 RETURNING data INTO v_blob;
 18    DBMS_LOB.OPEN (v_bfile, DBMS_LOB.LOB_READONLY);
 19    DBMS_LOB.LOADFROMFILE
 20  	 (v_blob, v_bfile, DBMS_LOB.GETLENGTH (v_bfile));
 21    DBMS_LOB.CLOSE (v_bfile);
 22  END;
 23  /

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> CREATE INDEX your_index ON test (data)
  2  INDEXTYPE IS CTXSYS.CONTEXT
  3  /

Index created.

SCOTT@orcl_11g> EXEC CTX_DOC.SET_KEY_TYPE ('ROWID')

PL/SQL procedure successfully completed.

SCOTT@orcl_11g> COLUMN document_name	    FORMAT A17
SCOTT@orcl_11g> COLUMN key_words_in_context FORMAT A45 WORD_WRAPPED
SCOTT@orcl_11g> SELECT document_name,
  2  	    CTX_DOC.SNIPPET ('your_index', ROWID, 'banana')
  3  	      AS key_words_in_context
  4  FROM   test
  5  WHERE  CONTAINS (data, 'banana') > 0
  6  /

DOCUMENT_NAME     KEY_WORDS_IN_CONTEXT
----------------- ---------------------------------------------
Banana recipes    Fruit of the Month
                  <b>Banana</b>
                  Bananas are the most popular
                  fresh<b>...</b>yellow <b>banana</b> of
                  Cavendish is the <b>banana</b> we see in


SCOTT@orcl_11g> SELECT document_name,
  2  	    CTX_DOC.SNIPPET ('your_index', ROWID, 'banana')
  3  	      AS key_words_in_context
  4  FROM   test
  5  WHERE  CONTAINS (data, 'cranberry') > 0
  6  /

DOCUMENT_NAME     KEY_WORDS_IN_CONTEXT
----------------- ---------------------------------------------
Cranberry recipes Fruit of the Month
                  Cranberries
                  Cranberries grow on vines in boggy areas.
                  Cranberries were first cultivated
                  in Massachusetts around 1815 and are


SCOTT@orcl_11g>


Re: Dbms_lob help [message #390836 is a reply to message #390815] Tue, 10 March 2009 00:02 Go to previous message
hari.choprala
Messages: 65
Registered: November 2006
Location: Mumbai
Member

Thanks Barbara
Previous Topic: Date difference in days
Next Topic: ora 00907
Goto Forum:
  


Current Time: Mon Dec 05 08:33:23 CST 2016

Total time taken to generate the page: 0.06113 seconds