Re: index with very high LIO

From: Greg Rahn <greg_at_structureddata.org>
Date: Thu, 17 Nov 2011 21:32:45 -0800
Message-ID: <CAGXkmiu=+WGj_bOxFxottLWw+Y_f5C=7q=rqn0Svaj-E6RGrPg_at_mail.gmail.com>



So a few more comments:
- Even if you had perfect clustering factor, I still think a PX + FTS + HJ is a much better solution for this type/size of problem. Would you really want to get 7 million rows one rowid at a time? - 20MB/s is about 1/10 of what 1 single process (serial execution) can do on a modern processor -- for CPU light operations like count(*) with FTS, you should be ~200MB/s. Need to investigate why that is so low.
- On a modern system (say any 5400-5600 Xeon system) this query should not run 10 hours, I'd wager it should take less than 10 minutes using PX.
- Is BT1.LAST_MODIFIED_DATETIME the range partition key on this table (it probably should be if it's a frequent filter)? Daily partitions might work quite nicely here.
- Your query uses November dates of LAST_MODIFIED_DATETIME, yet your stats are from 06-aug-2011. I probably would not hurt to refresh those - they are 2 months old.

On Thu, Nov 17, 2011 at 12:06 PM, Jeremy Schneider <jeremy.schneider_at_ardentperf.com> wrote:
> Yeah, you're not the first one to point that out...  I'm doing some
> tests now.  FWIW it's 7 million out of over a billion rows that are all
> stuffed into a 400GB heap table with the worst possible clustering
> factor on the foreign key we're using to join.  Also I haven't been able
> to get more than 20MB/s total throughput with parallel full scans,
> although single-block reads go at a somewhat decent rate of 5ms.  But
> all the same, I'm checking out the hash join... likely it will turn out
> to be more efficient.
>
> I should have checked this out more in-depth awhile ago...  I got a
> little carried away with maths and mistakenly thought that it was
> totally unrealistic.  This is the problem with ruling things out based
> on "calculations".  There's little to verify your correctness.  It's
> much easier to find your mistakes when you rule things out with
> *testing* instead...  I should know this by now, for some reason I'm
> still making the occasional poor judgements in the heat of the moment.  :)
>
> -Jeremy
>
>
> On 11/17/2011 01:24 PM, Greg Rahn wrote:
>> Am I reading this right?  7 million rows joined using NESTED LOOPS
>> join?  Wouldn't a parallel hash join be better/faster?
>>
>> On Thu, Nov 17, 2011 at 3:34 AM, Jeremy Schneider
>> <jeremy.schneider_at_ardentperf.com> wrote:
>>> PS... 8k block size; segment of index in question is 100G.  Query is
>>> pulling 7 million rows from a join of two billion row tables...
>

-- 
Regards,
Greg Rahn
http://structureddata.org
--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 17 2011 - 23:32:45 CST

Original text of this message