Creating Oracle Text Indexes
Date: Wed, 4 Mar 2009 23:20:38 -0000
Message-ID: <MPG.24191d206a824e229896eb_at_News.Individual.NET>
Hi this is on 10gR2 - we have a table with around 240,000 documents
(word, pdf, html).
We have a text index created on it using the following settings
(generated by executing the procedure):
create or replace procedure wd_rebuild_cv_text_index_blob
(p_tablespace in varchar2 default 'WD_I') is
begin
begin
execute immediate 'begin ctx_ddl.drop_preference
(''wd_cand_doc_lob_store''); end; ';
exception
when others then
null;
end;
begin
execute immediate 'drop index wd_cand_doc_lob_text';
exception
when others then
null;
end;
execute immediate 'begin ctx_ddl.create_preference
(''wd_cand_doc_lob_store'', ''BASIC_STORAGE''); end; ';
execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''I_TABLE_CLAUSE'',
''tablespace '||p_tablespace||' storage (initial500m next 10m maxextents 200)''); end; '; execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''K_TABLE_CLAUSE'',
''tablespace '||p_tablespace||' storage (initial1K)''); end; ';
execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''R_TABLE_CLAUSE'',
''tablespace '||p_tablespace||' storage (initial1K)''); end;';
execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''N_TABLE_CLAUSE'',
''tablespace '||p_tablespace||' storage (initial1K)''); end;';
execute immediate 'begin ctx_ddl.set_attribute
(''wd_cand_doc_lob_store'', ''I_INDEX_CLAUSE'',
''tablespace '||p_tablespace||' storage (initial100m next 10m maxextents 500)''); end; '; execute immediate 'create index wd_cand_doc_lob_text
on wd_cand_doc_lob (doc_blob) indextype is ctxsys.context parameters ( ''TRANSACTIONAL SYNC(EVERY "SYSDATE+15/1440") storage wd_cand_doc_lob_store'' )'; exception
when others then
dbms_output.put_line('error');
dbms_output.put_line(sqlerrm(sqlcode)); end;
/
sho err
We have run
SELECT AVG(COUNT(*))
FROM DR$wd_web_pages_imt$I
GROUP BY TOKEN_TEXT HAVING COUNT(*) > 1;
The documentation says anything beyond 10 would be an indication that
you should refresh the index. On the database the score is a staggering
31.8.
Using ctx_report.index_stats we see:
indexed documents: 234,324 allocated docids: 238,845 $I rows: 16,635,689 total size of $I data: 371,884,061 (354.66 MB) $I rows: 16,635,689 estimated $I rows if optimal: 1,976,853 estimated row fragmentation: 88 % garbage docids: 4,521 estimated garbage size: 5,842,571 (5.57 MB)
now this does not look good.
Index query performance is extremely poor.
Could anyone suggest some "best practise" for how we might be able to set the various preferences to create the indexes with minimal fragmentation?
The server on which we are running this is running Linux RHEL4 with 4GB RAM.
-- jeremyReceived on Wed Mar 04 2009 - 17:20:38 CST