Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: Oracle Text Design Question

Re: Oracle Text Design Question

From: Kevin Crosbie <caoimhinocrosbai_at_yahoo.com>
Date: Wed, 13 Apr 2005 17:44:19 +0200
Message-ID: <1113407067.97b49ccb470e645ff08809f532ce2822@teranews>

"Giovanni Azua" <bravegag_at_hotmail.com> wrote in message news:3c4r04F6kblndU1_at_individual.net...
> Hi Kevin,
>
> "Kevin Crosbie" <caoimhinocrosbai_at_yahoo.com> wrote
> > Doing this now... also rebuilding the index before hand, just to be
sure.
> > I've taken an export into another schema so I can play with a subset of
my
> > data (2 Million rows).
> >
> Please note that the statistics at table level ARE NOT enough. This is
> because Oracle Text indexes generate lot of auxiliary tables which are
> queried MANY times under the hood before or even during your query
> execution, the explain plan of the originating query will not include
> this. I found out by getting exclusive access to my DB, monitoring
> all queries and running the Oracle TEXT query, this will evidence
> all the tiny queries which to me seems are also improved by computing
> statistics on them.
>

I'm going to run the exact PLSQL code you sent so that should do all tables.

> > I'll try this suggestion out next and let you know how it goes.
> >
> > It just seems strange to me, I guess I haven't really grasped the
overall
> > concept of how text indexes work.
> > From what I've read in this doc:
> > http://www.oracle.com/technology/products/text/x/FAQs/imt_Perf_Faq.html
it
> > seems that it is designed to work great for single table queries,
> especially
> > when restricted by rownum.
> > What doesn't seem clear to me is how it can work efficiently when
joining
> to
> > other tables or making a query more complex by adding more clauses.
> >
> I have an application which uses the most expensive features of Oracle
> text weighted fuzzy and wildcard with boolean logic, single query, many
> contain clauses, 7+ tables joined to and with the table indexed, tested
> over more than 300K rows and I get response time ALWAYS lower than a sec.
>
> The major difference we have is that the column where I have the context
> index on is VARCHAR2 and not CLOB but I can also bet it is not the issue
> here ... I also had serious performance troubles with Oracle Text even
> using VARCHAR2 as indexed column.
>

That sounds reassuring. My new schema is still being created... lots of rows, so it'll probably be tomorrow before I reply back with an update.

It looks like it could be just a matter of how I structure my queries. Thanks for the advice.

> e.g. 3 contain clauses here ...
>
> select /*+ NOPARALLEL(ft1) */ to_char(ft1.entity_id) as id
> , to_char(round(ft1.score)) as score
> , ft1.bestmatch
> , ft1.name_last as lastname
> , ft1.name_first as firstname
> , ft1.category
> , ft1.entry_type as entrytype
> , to_char(ft1.country_id) as country_id
> , ft1.ctry_name_english as country
> , ft1.list_abbreviations as source
> , to_char(ft1.list_readonly) as readonly
> from (SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS
> DOMAIN_INDEX_NO_SORT */
> t2.entity_id
> , t2.real_name_first as name_first
> , t2.real_name_last as name_last
> , t2.bestmatch
> , t2.enty_category as category
> , t2.enty_country_id as country_id
> , t2.enty_entry_type as entry_type
> , t2.ctry_name_english
> , t2.list_readonly
> , t2.list_abbreviations
> , round(( greatest(score_1*decode(occurs_last_flag_1, 1, 10.0, 0)
> /greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 -
> c_name_first_wc END , c_name_last_wc)
> , score_1*decode(occurs_first_flag_1, 1, 4.0, 0)
> /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 -
> c_name_last_wc END, c_name_first_wc)) +
> greatest(score_2*decode(occurs_last_flag_2, 1, 10.0, 0)
> /greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 -
> c_name_first_wc END , c_name_last_wc)
> , score_2*decode(occurs_first_flag_2, 1, 4.0, 0)
> /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 -
> c_name_last_wc END, c_name_first_wc)) +
> greatest(score_3*decode(occurs_last_flag_3, 1, 10.0, 0)
> /greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 -
> c_name_first_wc END , c_name_last_wc)
> , score_3*decode(occurs_first_flag_3, 1, 4.0, 0)
> /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 -
> c_name_last_wc END, c_name_first_wc)) )/GREATEST(DECODE(c_name_last, NULL,
> 0, 10.0) + DECODE(c_name_first, NULL, 0, 4.0), 1)) as score
> FROM (
> SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS
DOMAIN_INDEX_NO_SORT
> */
> name_id , max(decode( token_id, 1, score , 0))
> score_1
> , max(decode( token_id, 1, occurs_last_flag, 0)) occurs_last_flag_1
> , max(decode( token_id, 1, decode(occurs_last_flag, 1, 0,
> occurs_first_flag), 0)) occurs_first_flag_1
> , max(decode( token_id, 2, score , 0)) score_2
> , max(decode( token_id, 2, occurs_last_flag, 0)) occurs_last_flag_2
> , max(decode( token_id, 2, decode(occurs_last_flag, 2, 0,
> occurs_first_flag), 0)) occurs_first_flag_2
> , max(decode( token_id, 3, score , 0)) score_3
> , max(decode( token_id, 3, occurs_last_flag, 0)) occurs_last_flag_3
> , max(decode( token_id, 3, decode(occurs_last_flag, 3, 0,
> occurs_first_flag), 0)) occurs_first_flag_3
> FROM (
> SELECT term_id
> , max(score) as score
> , max(token_id) as token_id
> FROM ( select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS
> DOMAIN_INDEX_NO_SORT */
> term_id
> , (100*scoreLevenshtein(TERM, 'OSAMA')) as score
> , 1 as token_id
> from T_NLM_TERM
> where contains(term, 'fuzzy({OSAMA},20,20,weight)', 1) > 0
> union all
> select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
> term_id
> , (100*scoreLevenshtein(TERM, 'BIEN')) as score
> , 2 as token_id
> from T_NLM_TERM
> where contains(term, 'fuzzy({BIEN},20,20,weight)', 2) > 0
> union all
> select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */
> term_id
> , (100*scoreLevenshtein(TERM, 'LADIN')) as score
> , 3 as token_id
> from T_NLM_TERM
> where contains(term, 'fuzzy({LADIN},20,20,weight)', 3) > 0
> ) t1
> GROUP BY term_id
> ) t1
> , T_NLM_SEARCHTERM_REL t2
> WHERE t1.term_id=t2.term_id
> GROUP BY name_id
> ) t1
> , (
> SELECT /*+ NOPARALLEL(t2) */ t2.*
> FROM T_NLM_SEARCH t2
>
> ) t2
> WHERE t1.name_id=t2.name_id
> ORDER BY 11 desc, 4 asc ) ft1
>
>
Received on Wed Apr 13 2005 - 10:44:19 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US