From cspence@FuelSpot.com Thu, 04 Oct 2001 19:46:36 -0700
From: Christopher Spence
Date: Thu, 04 Oct 2001 19:46:36 -0700
Subject: RE: Intermedia Performance Benchmarks anyone ?
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain

I looked into SSD, they are like $15-17k per GB. Very expensive. And with
Oracle buffering, I would expect the performance gain wouldn't be huge.
There were some solutions that were like $2500 for 1 GB, but they were not
sharable between machines in a cluster.

-----Original Message-----
To: Multiple recipients of list ORACLE-L
Sent: 10/4/01 6:35 PM

*excellent* post. thanks.

Anyone out there put the indexes and tables on solid state disk? They have
ssd up to about 10G and higher, I hear....just curious, not trying to invoke
a global listserv discussion on how it "can't work" or "wouldn't be worth
it, especially on microsoft platforms", etc.

It would be neat to hear about an InterMedia indexing miracle. This really
neat tool just sounds WAAAAAAY too slow to scale at this point, which answers
a pet question of mine. (Something like "Why do services like 'Ask Jeeves'
suck so hard?")

In Love and Peas, etc.

-----Original Message-----
Sent: Thursday, October 04, 2001 5:47 PM
To: Multiple recipients of list ORACLE-L

Martin,

We use interMedia Text to index and query up to about 10-15 million CLOB
documents (up to 5KB each). We're on 8.1.6.0.0 under Win2k - two 550MHz
CPUs, 2GB RAM, eighteen 36GB drives.

Because a domain index cannot be partitioned, we have the documents spread
across 5 tables (on 6 drives). One is a 2-partition table (each partition on
its own drive) containing the current two months of docs; the other 4 hold
the 4 prior months' docs. We can query the entire 6 months of docs via a
Union View on them - even Contains() queries work fine on this view.

When we add a new month's partition, the prior month's partition gets turned
into a table (segment exchange).
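For readers who want to picture the layout described above, here is a rough sketch of a union view over the monthly tables and of a partition-to-table exchange. Every table, column, and partition name in it (docs_current, docs_2001_0N, doc_id, doc_date, doc_text, p_2001_08, all_docs) is hypothetical and does not come from the post. UNION ALL is an assumption too: the post only says "Union View".

```sql
-- Hypothetical schema: one partitioned table holding the current two
-- months, plus one plain table per prior month. Each table is assumed to
-- carry its own interMedia Text (CONTEXT) index on doc_text.

-- A single view over all six months of documents.
-- UNION ALL avoids the duplicate-elimination sort a plain UNION would need.
CREATE OR REPLACE VIEW all_docs AS
  SELECT doc_id, doc_date, doc_text FROM docs_current
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_2001_07
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_2001_06
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_2001_05
  UNION ALL
  SELECT doc_id, doc_date, doc_text FROM docs_2001_04;

-- A text query against the view; 'pump' is just a sample search term.
SELECT doc_id
  FROM all_docs
 WHERE CONTAINS(doc_text, 'pump') > 0;

-- Turning the older partition into a standalone table. docs_2001_08 is
-- assumed to be an empty table with the same column layout as docs_current.
ALTER TABLE docs_current
  EXCHANGE PARTITION p_2001_08 WITH TABLE docs_2001_08
  WITHOUT VALIDATION;
```

The exchange swaps segments rather than copying rows, which is presumably why it is practical for month-sized volumes; the text indexes still have to be dealt with separately.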
The interMedia Text indexes on the partitioned table and the new prior month
are rebuilt. Lately we've been getting about 3.5 million docs/month and the
index rebuild takes about 7 hours - that's 7 hrs. for the index on the prior
month and 7 more hours for the index on the partitioned table, which only
contains one month of docs at that point.

Since we're adding docs every day, we sync the interMedia index every
morning. Last night we added about 200,000 docs and it took about 3 hours
for the index to resync. We don't use ctxsrv, but use CTX_DDL.Sync_Index.
When we get over about 4.5 million docs in a table, the resync really slows
down. The in-memory part still happens at about 150 docs/sec, but when
interMedia writes to disk it slows down a bunch. What took 3 hours today
will take 10 hours in a couple of weeks. That's why I plan on spreading the
DR$<>$I segment across multiple drives by spreading the datafiles of its
tablespace across those drives.

BTW, that brings up some performance points - be sure you cache the DR$<>$R
segment (use CACHE not CACHE READS, due to bugs in Oracle):

    Alter Table DR$$R Modify LOB (Data) (Cache) ;

Also ensure that your LOBs are out-of-line and stored in their own
segment(s) on drive(s) separate from the "regular" data. Make sure that your
I_TABLE_CLAUSE, R_TABLE_CLAUSE, and I_INDEX_CLAUSE all specify tablespaces
on their own drives to spread the I/O out even further.

We're getting 2GB more RAM on a new server, so I plan on caching the 900MB
DR$<>$X segment, which is the index on the DR$<>$I token table.

I've learned a lot about how interMedia Text processes different kinds of
queries by watching disk I/O on Win2k's Performance Monitor while I issue
various "flavors". Our folks use lots of complex query terms with heavy use
of the Stemmer. I've gotten them to switch from using tons of ORs to using
the Equivalence operator and we're getting much better results using NEAR
than simple ANDs.
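The storage, sync, and query advice above might look roughly like this in practice. This is only a sketch: the preference name (docs_store), index name (docs_idx), table and column names, tablespace names, and search terms are all made up for illustration and are not from the post. interMedia Text names its internal objects DR$ followed by the index name and a suffix ($I, $R, $X), so the cached table below follows from the hypothetical index name.

```sql
-- Storage preference that places the token table ($I), the rowid table
-- ($R) and the token index ($X) in separate tablespaces. The tablespaces
-- ctx_i_ts, ctx_r_ts and ctx_x_ts are assumed to sit on different drives.
BEGIN
  ctx_ddl.create_preference('docs_store', 'BASIC_STORAGE');
  ctx_ddl.set_attribute('docs_store', 'I_TABLE_CLAUSE', 'tablespace ctx_i_ts');
  ctx_ddl.set_attribute('docs_store', 'R_TABLE_CLAUSE', 'tablespace ctx_r_ts');
  ctx_ddl.set_attribute('docs_store', 'I_INDEX_CLAUSE', 'tablespace ctx_x_ts');
END;
/

-- Build the text index using that storage preference.
CREATE INDEX docs_idx ON docs_current (doc_text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('storage docs_store');

-- Cache the $R LOB, as recommended above (CACHE, not CACHE READS).
ALTER TABLE dr$docs_idx$r MODIFY LOB (data) (CACHE);

-- Daily sync of pending inserts and updates, without running ctxsrv.
BEGIN
  ctx_ddl.sync_index('docs_idx');
END;
/

-- Query "flavors": OR (|) versus the equivalence operator (=),
-- and AND (&) versus NEAR (;).
SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'valve | valves') > 0;

SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'valve=valves leak') > 0;

SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'valve & leak') > 0;

SELECT doc_id FROM docs_current
 WHERE CONTAINS(doc_text, 'valve ; leak') > 0;
```

Note that the equivalence query treats "valve" and "valves" as interchangeable within the phrase, so it is not a drop-in replacement for every OR; which form performs better will depend on the data.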
Performance is very good, with CONTAINS queries returning results in less
than a second for terms that are rare in the docs, up to a minute for terms
that are common in lots (e.g. hundreds of thousands) of docs.

If you're going to do synonym searches, you'd better start looking for a
good thesaurus - the one Oracle ships is pretty limited. We've not found a
good one for the technical lingo our docs contain, so we don't do ABOUT
queries at this time.

Get familiar with CTX_Query.Explain, it will help you understand things like
what the Stemmer *really* does and how complex queries are parsed.

Hope this helps.

Jack

--------------------------------
Jack C. Applewhite
Database Administrator/Developer
OCP Oracle8 DBA
iNetProfit, Inc.
Austin, Texas
www.iNetProfit.com
japplewhite@inetprofit.com
(512)327-9068

-----Original Message-----
Kendall
Sent: Thursday, October 04, 2001 10:00 AM
To: Multiple recipients of list ORACLE-L

Hello all,

Although I have installed Intermedia as part of my general DBA duties
before, I have not experienced any particular requirements on throughput
rate or indexing.

I need some information on being able to deal with large volumes of product
data (e.g. 1 million products in a retail application) and be able to
perform 'intelligent' searches against the metadata (things like
typographical error matching, synonyms etc.) as well as the more usual
parametric search (i.e. advanced search page with lots of metadata specific
fields). Indexing time and max throughput are also of interest.

Any data based on experience would be appreciated.

Thanks
Martin

--
Please see the official ORACLE-L FAQ: http://www.orafaq.com
--
Author: Jack C. Applewhite
  INET: japplewhite@inetprofit.com

Fat City Network Services -- (858) 538-5051 FAX: (858) 538-5051
San Diego, California -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru@fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from). You may
also send the HELP command for other information (like subscribing).
--
Author: Mohan, Ross
  INET: MohanR@STARS-SMI.com
--
Author: Christopher Spence
  INET: cspence@FuelSpot.com