Re: Oracle Text Design Question

From: Giovanni Azua <bravegag_at_hotmail.com>
Date: Wed, 13 Apr 2005 17:01:26 +0200
Message-ID: <3c4r04F6kblndU1@individual.net>

Hi Kevin,

"Kevin Crosbie" <caoimhinocrosbai_at_yahoo.com> wrote
> Doing this now... also rebuilding the index before hand, just to be sure.
> I've taken an export into another schema so I can play with a subset of my
> data (2 Million rows).
>

Please note that the statistics at table level ARE NOT enough. This is because Oracle Text indexes generate lot of auxiliary tables which are queried MANY times under the hood before or even during your query execution, the explain plan of the originating query will not include this. I found out by getting exclusive access to my DB, monitoring all queries and running the Oracle TEXT query, this will evidence all the tiny queries which to me seems are also improved by computing statistics on them.

> I'll try this suggestion out next and let you know how it goes.
>
> It just seems strange to me, I guess I haven't really grasped the overall
> concept of how text indexes work.
> From what I've read in this doc:
> http://www.oracle.com/technology/products/text/x/FAQs/imt_Perf_Faq.html it
> seems that it is designed to work great for single table queries,
especially
> when restricted by rownum.
> What doesn't seem clear to me is how it can work efficiently when joining
to
> other tables or making a query more complex by adding more clauses.
>

I have an application which uses the most expensive features of Oracle text weighted fuzzy and wildcard with boolean logic, single query, many contain clauses, 7+ tables joined to and with the table indexed, tested over more than 300K rows and I get response time ALWAYS lower than a sec.

The major difference we have is that the column where I have the context index on is VARCHAR2 and not CLOB but I can also bet it is not the issue here ... I also had serious performance troubles with Oracle Text even using VARCHAR2 as indexed column.

e.g. 3 contain clauses here ...

select /*+ NOPARALLEL(ft1) */ to_char(ft1.entity_id) as id

, to_char(round(ft1.score)) as score
, ft1.bestmatch
, ft1.name_last    as lastname
, ft1.name_first   as firstname
, ft1.category
, ft1.entry_type   as entrytype
, to_char(ft1.country_id) as country_id
, ft1.ctry_name_english as country
, ft1.list_abbreviations as source
, to_char(ft1.list_readonly) as readonly

from (SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */

       t2.entity_id
     , t2.real_name_first as name_first
     , t2.real_name_last  as name_last
     , t2.bestmatch
     , t2.enty_category   as category

, t2.enty_country_id as country_id
, t2.enty_entry_type as entry_type

     , t2.ctry_name_english
     , t2.list_readonly
     , t2.list_abbreviations
     , round(( greatest(score_1*decode(occurs_last_flag_1, 1, 10.0, 0)

/greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 - c_name_first_wc END , c_name_last_wc)
, score_1*decode(occurs_first_flag_1, 1, 4.0, 0) /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 - c_name_last_wc END, c_name_first_wc)) + greatest(score_2*decode(occurs_last_flag_2, 1, 10.0, 0) /greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 - c_name_first_wc END , c_name_last_wc)
, score_2*decode(occurs_first_flag_2, 1, 4.0, 0) /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 - c_name_last_wc END, c_name_first_wc)) + greatest(score_3*decode(occurs_last_flag_3, 1, 10.0, 0) /greatest(1, CASE WHEN c_name_last_wc >= 3 THEN c_name_last_wc ELSE 3 - c_name_first_wc END , c_name_last_wc)
, score_3*decode(occurs_first_flag_3, 1, 4.0, 0) /greatest(1, CASE WHEN c_name_full_wc > 3 THEN c_name_first_wc ELSE 3 - c_name_last_wc END, c_name_first_wc)) )/GREATEST(DECODE(c_name_last, NULL, 0, 10.0) + DECODE(c_name_first, NULL, 0, 4.0), 1)) as score FROM (
SELECT /*+ NOPARALLEL(t1) NOPARALLEL(t2) FIRST_ROWS DOMAIN_INDEX_NO_SORT */

name_id , max(decode( token_id, 1, score , 0)) score_1
, max(decode( token_id, 1, occurs_last_flag, 0)) occurs_last_flag_1 , max(decode( token_id, 1, decode(occurs_last_flag, 1, 0, occurs_first_flag), 0)) occurs_first_flag_1

 , max(decode( token_id, 2, score           ,  0)) score_2
 , max(decode( token_id, 2, occurs_last_flag,  0)) occurs_last_flag_2
 , max(decode( token_id, 2, decode(occurs_last_flag, 2, 0,
occurs_first_flag), 0)) occurs_first_flag_2
 , max(decode( token_id, 3, score           ,  0)) score_3

, max(decode( token_id, 3, occurs_last_flag, 0)) occurs_last_flag_3 , max(decode( token_id, 3, decode(occurs_last_flag, 3, 0, occurs_first_flag), 0)) occurs_first_flag_3

          FROM (
                SELECT term_id
                     , max(score)    as score
                     , max(token_id) as token_id
                FROM ( select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS
DOMAIN_INDEX_NO_SORT */
        term_id
      , (100*scoreLevenshtein(TERM, 'OSAMA')) as score
      , 1 as token_id

from T_NLM_TERM
where contains(term, 'fuzzy({OSAMA},20,20,weight)', 1) > 0 union all
select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */

        term_id
      , (100*scoreLevenshtein(TERM, 'BIEN')) as score
      , 2 as token_id

from T_NLM_TERM
where contains(term, 'fuzzy({BIEN},20,20,weight)', 2) > 0 union all
select /*+ NOPARALLEL(T_NLM_TERM) FIRST_ROWS DOMAIN_INDEX_NO_SORT */

        term_id
      , (100*scoreLevenshtein(TERM, 'LADIN')) as score
      , 3 as token_id

from T_NLM_TERM
where contains(term, 'fuzzy({LADIN},20,20,weight)', 3) > 0 ) t1

                GROUP BY term_id
              ) t1
            , T_NLM_SEARCHTERM_REL t2
         WHERE t1.term_id=t2.term_id
         GROUP BY name_id
      ) t1
    , (
         SELECT /*+ NOPARALLEL(t2) */ t2.*
         FROM   T_NLM_SEARCH t2

      ) t2

WHERE t1.name_id=t2.name_id
ORDER BY 11 desc, 4 asc ) ft1 Received on Wed Apr 13 2005 - 10:01:26 CDT