Re: Word Frequencies (Usenet, c.d.o.misc)
John Paulett wrote:
> Hello,
>
> I am trying to analyze word frequencies in articles (stored in CLOBs).
> There are several million articles, each of which has an average length of
> about 150 words. My goal is to eventually break the data down by the
> author and category of the article to see how many unique words are used
> (I would need a count of the number of times each unique word appears).
>
> I was initially thinking of just processing all of this using Java, but
> I was wondering if there is some way to natively do it in Oracle / SQL,
> possibly using Oracle Intermedia or Text. I have 10g on Windows. I
> started trying to use the index tables (e.g. DR$BOOKS_TITLE$K), but I do
> not want to ignore words like "the" and "a," and from what I could see
> this would not give me the option of breaking the analysis down by
> author and category.
>
> Any help is greatly appreciated -- even just a point in the right
> direction (or if you think I should do this some other way, like Java or
> PL/SQL).
>
> Thanks,
>
> John
> jmpcrew_at_hotmail.com
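I am not very familiar with Oracle Text, but if I had to do something like this, I think I would parse each document into words and issue a MERGE statement against an index-organized table (IOT) for each word, incrementing that word's count if it already exists and inserting it otherwise. The IOT would then hold the count for each unique word.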
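A minimal sketch of that idea, under stated assumptions: it uses Python's bundled sqlite3 module rather than Oracle so that it runs anywhere. SQLite's `INSERT ... ON CONFLICT DO UPDATE` stands in for Oracle's MERGE, and a `WITHOUT ROWID` table stands in for an IOT. The table name `word_counts`, its columns, the function `count_words`, and the tokenizer regex are all hypothetical choices for illustration, and keying the table on (author, category, word) is an assumption made to allow the breakdown the question asks for. No stop-word list is applied, so words like "the" and "a" are counted.

```python
import re
import sqlite3

# Hypothetical tokenizer: lowercase letters with an optional inner apostrophe.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")

def count_words(conn, articles):
    """Upsert one row per (author, category, word); articles yields (author, category, text)."""
    # WITHOUT ROWID stores rows in the primary-key B-tree, roughly like an Oracle IOT.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS word_counts (
            author   TEXT NOT NULL,
            category TEXT NOT NULL,
            word     TEXT NOT NULL,
            cnt      INTEGER NOT NULL,
            PRIMARY KEY (author, category, word)
        ) WITHOUT ROWID
        """
    )
    # Upsert (needs SQLite 3.24+): insert with count 1, or bump the existing count.
    upsert = """
        INSERT INTO word_counts (author, category, word, cnt)
        VALUES (?, ?, ?, 1)
        ON CONFLICT (author, category, word) DO UPDATE SET cnt = cnt + 1
    """
    for author, category, text in articles:
        words = WORD_RE.findall(text.lower())
        conn.executemany(upsert, [(author, category, w) for w in words])
    conn.commit()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    count_words(conn, [
        ("smith", "news", "The cat saw the dog."),
        ("smith", "news", "A dog barked."),
        ("jones", "sports", "The game was a draw."),
    ])
    query = "SELECT author, category, word, cnt FROM word_counts ORDER BY 1, 2, 3"
    for row in conn.execute(query):
        print(row)
```

In Oracle itself, the equivalent would be a table created with ORGANIZATION INDEX and a MERGE whose ON clause matches the same three key columns, with WHEN MATCHED updating the count and WHEN NOT MATCHED inserting a count of 1. With several million articles, aggregating each document's words before merging would likely cut the number of statements considerably.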
HTH -- Mark D Powell --
Received on Tue Jun 06 2006 - 11:34:36 CDT