Re: How to handle unstructured data with Oracle 7
Date: 1996/09/07
Message-ID: <3231118e.172268283_at_news.concentric.net>#1/1
On Thu, 05 Sep 1996 17:51:50 +0200, Felix Honold <100551.3551_at_compuserve.com> wrote:
>Hello
>
>We are currently in a project where we need to handle a big number of
>unstructured text strings in a Oracle database and to search for single
>words within these strings ...
I'm afraid to ask why you would want to do such a thing...
>The whole database with structured and unstructured data will contain
>serval GB of data.
>
>Does anybody know how to solve this problem with respect to an optimal
>query performance?
Sure, don't do it :-).
Your where clause would have to read:
select field from table where field like '% a %';
to find the word "a" in a field. Or,
select field from table where field like '% the %';
to find the word "the". Unstringing the resulting rows is problematic via a "C" program or the like. Then, you have to take into account the various punctuation and capitalization situations which are bound to happen (i.e.; "The", "THE" "and,", etc., ad infinitum).
>Question 1:
>Is it o.k just to keep the text strings in varchar2(2000) fields or do
>we have to keep every word in its own database field for beeing able to
>create an efficient index on these words?
Yes to part one, for part 2, yes and your index would be HUGE (not to mention the database)! What you are talking about is relatively undoable (yes, it might be a new word). You could not create anything resembling a unique index, therefore your index would contain entries for single words multiple times (how many times do you see the word "the"?). Say you have a million rows containing these text fields, each of which contains about 25 words. The likelihood of a particular word occuring in each row is pretty high.
>Question 2:
>Are there any add-on tools for Oracle to cope the problem of
>unstructured data?
You may want to talk to Oracle about getting a demo of their "Context Option". It is supposed to be able to deal with textual data in something approching an efficient manner (I haven't seen it but it came with 7.3.2.2 for Sun and we have it installed. Playtime hasn't arrived as yet.).
>Many thanks
>
>Felix Honold
>100551,3551_at_compuserve.com or fh_at_innovate-it.ch
Received on Sat Sep 07 1996 - 00:00:00 CEST