Re: How to handle unstructured data with Oracle 7

From: jared still <jared_at_valleynet.com>
Date: 1996/09/08
Message-ID: <50tnl8$sjg_at_alpine.psnw.com>#1/1


Felix Honold <100551.3551_at_compuserve.com> wrote:

>Hello
 

>We are currently in a project where we need to handle a big number of
>unstructured text strings in a Oracle database and to search for single
>words within these strings ...
>The whole database with structured and unstructured data will contain
>serval GB of data.
 

>Does anybody know how to solve this problem with respect to an optimal
>query performance?
 

>Question 1:
>Is it o.k just to keep the text strings in varchar2(2000) fields or do
>we have to keep every word in its own database field for beeing able to
>create an efficient index on these words?

One way to deal with this would be to keep your strings in varchar2(2000) as you mention. Add a column to this table that is generated from a sequence. This will be your Primary Key. A trigger will do this easily.

Create another table that has two columns. One is used as a foreign key to the table with the strings. The other column is a varchar2 that will contain a single word in uppercase.

Let's call these tables STRINGS and WORDS.

When inserting a row into the STRINGS table, pass it through a PL/SQL procedure that will parse the string into separate words, remove all duplicate words from the string, remove words such as 'the', 'and', 'your', 'my', etc.

This routine may even be smart enough to follow some rules of grammer and just leave words that are important to the content of the sentence.

Now convert all of the words to upper case ( or lower, it doesn't matter which ) and store them in the words TABLE one word per row, along with the PK from the new row in the STRINGS table.

This will make it pretty simple and fast to search your STRINGS table.

>Question 2:
>Are there any add-on tools for Oracle to cope the problem of
>unstructured data?

Don't know.

>Many thanks
 

>Felix Honold
>100551,3551_at_compuserve.com or fh_at_innovate-it.ch

jared still                  ---- ___o
jared_at_valleynet.com       ----  _`\ <,_
                       -----   (*)/o (*)
--------------------------------------
Received on Sun Sep 08 1996 - 00:00:00 CEST

Original text of this message