Creating Oracle Text Indexes

From: Jeremy <jeremy0505_at_gmail.com>
Date: Wed, 4 Mar 2009 23:20:38 -0000
Message-ID: <MPG.24191d206a824e229896eb_at_News.Individual.NET>



Hi this is on 10gR2 - we have a table with around 240,000 documents
(word, pdf, html).

We have a text index created on it using the following settings
(generated by executing the procedure):


create or replace procedure wd_rebuild_cv_text_index_blob

    (p_tablespace in varchar2 default 'WD_I') is

begin

  begin
    execute immediate 'begin ctx_ddl.drop_preference
(''wd_cand_doc_lob_store''); end; ';

  exception
    when others then
      null;
  end;

  begin
    execute immediate 'drop index wd_cand_doc_lob_text';   exception
    when others then
      null;
  end;

  execute immediate 'begin ctx_ddl.create_preference
(''wd_cand_doc_lob_store'', ''BASIC_STORAGE''); end; ';

  execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''I_TABLE_CLAUSE'',

                        ''tablespace '||p_tablespace||' storage (initial 
500m next 10m maxextents 200)''); end; ';   execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''K_TABLE_CLAUSE'',
                        ''tablespace '||p_tablespace||' storage (initial 
1K)''); end; ';
  execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''R_TABLE_CLAUSE'',
                        ''tablespace '||p_tablespace||' storage (initial 
1K)''); end;';
  execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''N_TABLE_CLAUSE'',
                        ''tablespace '||p_tablespace||' storage (initial 
1K)''); end;';
  execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''I_INDEX_CLAUSE'',
                        ''tablespace '||p_tablespace||' storage (initial 
100m next 10m maxextents 500)''); end; ';   execute immediate 'create index wd_cand_doc_lob_text
                     on wd_cand_doc_lob (doc_blob)
                     indextype is ctxsys.context
                     parameters ( ''TRANSACTIONAL SYNC(EVERY "SYSDATE+
15/1440") storage wd_cand_doc_lob_store'' )'; exception
  when others then
    dbms_output.put_line('error');
    dbms_output.put_line(sqlerrm(sqlcode)); end;
/
sho err

We have run

SELECT AVG(COUNT(*))
FROM DR$wd_web_pages_imt$I
GROUP BY TOKEN_TEXT HAVING COUNT(*) > 1;   The documentation says anything beyond 10 would be an indication that you should refresh the index. On the database the score is a staggering 31.8.

Using ctx_report.index_stats we see:

indexed documents:                                            234,324
allocated docids:                                             238,845
$I rows:                                                   16,635,689

total size of $I data:                        371,884,061 (354.66 MB)

$I rows:                                                   16,635,689
estimated $I rows if optimal:                               1,976,853
estimated row fragmentation:                                     88 %

garbage docids:                                                 4,521
estimated garbage size:                           5,842,571 (5.57 MB)


now this does not look good.

Index query performance is extremely poor.

Could anyone suggest some "best practise" for how we might be able to set the various preferences to create the indexes with minimal fragmentation?

The server on which we are running this is running Linux RHEL4 with 4GB RAM.

-- 
jeremy
Received on Wed Mar 04 2009 - 17:20:38 CST

Original text of this message