RE: Intermedia Performance Benchmarks anyone ?

From: Jack C. Applewhite <japplewhite_at_inetprofit.com>
Date: Thu, 04 Oct 2001 13:46:37 -0700
Message-ID: <F001.003A2832.20011004134703@fatcity.com>

Martin,

We use interMedia Text to index and query up to about 10-15 million CLOB documents (up to 5KB each). We're on 8.1.6.0.0 under Win2k - 2 550MHz CPUs, 2GB RAM, 18 36GB drives.

Because a domain index cannot be partitioned, we have the documents spread across 5 tables (on 6 drives). One is a 2 partition table (each partition on its own drive) containing the current two months of docs, the other 4 hold the 4 prior months' docs. We can query the entire 6 months of docs via a Union View on them - even Contains() queries work fine on this view.

When we add a new month's partition, the prior month's partition gets turned into a table (segment exchange). The interMedia Text indexes on the partitioned table and the new prior month are rebuilt.

Lately we've been getting about 3.5 million docs/month and the index rebuild takes about 7 hours - that's 7 hrs. for the index on the prior month and 7 more hours for the index on the partitioned table, which only contains one month of docs at that point.

Since we're adding docs every day, we sync the interMedia index every morning. Last night we added about 200,000 docs and it took about 3 hours for the index to resync. We don't use ctxsrv, but use CTX_DDL.Sync_Index.

When we get over about 4.5 million docs in a table, the resync really slows down. The in-memory part still happens at about 150 docs/sec, but when interMedia writes to disk it slows down a bunch. What took 3 hours today will take 10 hours in a couple of weeks.

That's why I plan on spreading the DR$<>$I segment across multiple drives by spreading the datafiles of its tablespace across those drives.

BTW, that brings up some performance points - be sure you cache the DR$<>$R segment (use CACHE not CACHE READS, due to bugs in Oracle):

Alter Table DR$<YourIndexName>$R Modify LOB (Data) (Cache) ;

Also ensure that your LOBs are out-of-line and stored in their own segment(s) on drive(s) separate from the "regular" data. Make sure that your I_TABLE_CLAUSE, R_TABLE_CLAUSE, and I_INDEX_CLAUSE all specify tablespaces on their own drives to spread the I/O out even further. We're getting 2GB more RAM on a new server, so I plan on caching the 900MB DR$<>$X segment, which is the index on the DR$<>$I token table.

I've learned a lot about how interMedia Text processes different kinds of queries by watching disk I/O on Win2k's Performance Monitor while I issue various "flavors". Our folks use lots of complex query terms with heavy use of the Stemmer. I've gotten them to switch from using tons of ORs to using the Equivalence operator and we're getting much better results using NEAR than simple ANDs. Performance is very good, with CONTAINS queries returning results in less than a second for terms that are rare in the docs, up to a minute for terms that are common in lots (e.g. hundreds of thousands) of docs.

If you're going to do synonym searches, you'd better start looking for a good thesaurus - the one Oracle ships is pretty limited. We've not found a good one for the technical lingo our docs contain, so we don't do ABOUT queries at this time.

Get familiar with CTX_Query.Explain, it will help you understand things like what the Stemmer *really* does and how complex queries are parsed.

Hope this helps.

Jack

Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
japplewhite_at_inetprofit.com
(512)327-9068

-----Original Message-----
Kendall
Sent: Thursday, October 04, 2001 10:00 AM To: Multiple recipients of list ORACLE-L

Hello all,

Although I have installed Intermedia as part of my general DBA duties before I have not experienced any particular requirements on throughput rate or indexing.

I need some information on being able to deal with large volumes of product data (e.g. 1 million products in a retail application) and be able to perform 'intelligent' searches against the metadata (things like typographical error matching, synonyms etc.) as well as the more usual parametric search (i.e. advanced search page with lots of metadata specific fields).

Indexing time and max throughput are also of interest.

Any data based on experience would be appreciated.

Thanks

Martin

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.com
-- 
Author: Jack C. Applewhite
  INET: japplewhite_at_inetprofit.com

Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L


(or the name of mailing list you want to be removed from).  You may

also send the HELP command for other information (like subscribing).

Received on Thu Oct 04 2001 - 15:46:37 CDT