Re: Creating Oracle Text Indexes

From: Jeremy <jeremy0505_at_gmail.com>
Date: Mon, 9 Mar 2009 15:34:12 -0000
Message-ID: <MPG.241f47423721407e9896ec_at_News.Individual.NET>



In article <adff2a9a-8424-4dba-a8cc-
43323e82c21e_at_a12g2000yqm.googlegroups.com>, vladimir.zakharychev_at_gmail.com says...>

> First of all, the stats from SELECT AVG(...) seem to be for a
> different index - WD_WEB_PAGES_IMT.

sorry that was a typo - we definitely collected them for the correct index :)

> Not sure what you ran
> ctx_report.index_stats() against, but if you just built your index,
> you doubtly would end up with fragmentation and garbage percentage
> like cited. For example, there should be zero, or around zero, garbage
> docids for new index. I don't think the issue is with index
> parameters. At least your index parameters seem fine to me, except the
> part where you limit number of extents for $I table and its index
> (what if one day your index will grow beyond that size?) and specify
> initial extent sizes of 1K (Oracle will allocate more anyway,) but
> storage does not affect text index fragmentation anyway.

The index took 7 hours to recreate. Straight after (well 6 hours or so when there may have been maybe 20 or 30 docs added) the fragmentation is reported as:

total size of $I data:                          369,940,540 (352.80 MB)

$I rows:                                                      4,171,800
estimated $I rows if optimal:                                 1,977,258
estimated row fragmentation:                                       53 %

garbage docids:                                                       3
estimated garbage size:                                 3,921 (3.83 KB)



>
> Did you consider running ctx_ddl.optimize_index() at REBUILD or FULL
> level regularly (especially if your data is volatile and index
> fragmentation/garbage percentage grows quickly?) Rebuilding the index
> by dropping and creating it every so often makes it unavailable for
> the duration of "rebuild", which might not please your users, and
> consumes a lot of resources (mostly CPU, because Text needs to put
> every document being indexed through external filter program and then
> parse and process resulting XML); ctx_ddl.optimize_index
> (index_name,'REBUILD') is online and is much cheaper.
>

We are on Standard Edition and so online index DDL operations are not supported (at least to my knowledge). We can make the index unavailable overnight so that is not an issue.

On this rebuild we used the setting "memory 64m" in the parameters clause. I would be very interested to see any information which discusses what optimal values of the "memory" parameter might be?

-- 
jeremy
Received on Mon Mar 09 2009 - 10:34:12 CDT

Original text of this message