Re: index with very high LIO

From: hrishy <>
Date: Sat, 19 Nov 2011 11:29:47 +0000 (GMT)
Message-ID: <>

Hi Greg,Jeremy
Slightly offtopic but i am curious to know how to measure that the throughput is 20Mb/sec or 30Mb/sec

 From: Jeremy Schneider <> To: Greg Rahn <> Cc: "" <> Sent: Saturday, 19 November 2011, 0:16
Subject: Re: index with very high LIO  

The reason for 20MB/s is the storage network.  They're using DNFS with a single (bonded-backup-mode) 1G network card.  (FYI, tcp window sizes are 16M.)  Last night I ran a parallel(64) query and I think that I was still only getting something around 30 MB/s throughput w/direct path... but it's tough to piece together 64 tracefiles to get an accurate total throughput measure.  (By another measure I saw 60 MB/s but I think that might have included some cache effect...)  At any rate the best theoretical possible on a one gig card is around 100MB/s.  I don't think it's possible to full scan two 400 GB tables over a 1GB network link in 10 minutes, is it?  Also, this is not a warehouse system... it's *heavily* operational, and the operational workload is running satisfactorily with 1G cards.  In fact this query is trying to extract a day's worth of data to get it INTO a warehouse.

I do think the slow link skews the usual points where it becomes beneficial to switch from index to hash methods.  I'd wager that on this system, a couple million rows were probably faster with the index. (Whereas generally a hash might win for this many rows.)  Recently this query has inched toward 10 million, and perhaps now a hash join is going to be better -- but it's still going to take many hours.

But I thinking that I've finally started to inch toward the real problem: rollbacks from other activity on the system -- in fact this could also explain why recently, the problem seemed to accellerate exponentially.  This is why I originally asked about the 13 gets/row on the index... and I think Jonathan's pointer got me moving in the right direction.  In fact, we would still have the same problem with a hash join - because it's still going to take several hours on this busy system with only a 1GB storage network.

I'm thinking that the real answer is to use shorter-running queries... something as simple as getting 2-4 hours of data at a time instead of 24 hours at once could in fact completely resolve this problem.  Which interestingly, might switch us back to nested loop joins again since we're going to be getting a lot less data at once.

Still in progress... so additional thoughts (from anyone on the list) are welcomed.


PS... we have also been discussing stats and partitioning internally... I just view the above issues as primary.

+1 312-725-9249

Jeremy Schneider

On 11/17/2011 11:32 PM, Greg Rahn wrote:

> So a few more comments:
> - Even if you had perfect clustering factor, I still think a PX + FTS
> + HJ is a much better solution for this type/size of problem.  Would
> you really want to get 7 million rows one rowid at a time?
> - 20MB/s is about 1/10 of what 1 single process (serial execution) can
> do on a modern processor -- for CPU light operations like count(*)
> with FTS, you should be ~200MB/s.  Need to investigate why that is so
> low.
> - On a modern system (say any 5400-5600 Xeon system) this query should
> not run 10 hours, I'd wager it should take less than 10 minutes using
> PX.
> - Is BT1.LAST_MODIFIED_DATETIME the range partition key on this table
> (it probably should be if it's a frequent filter)?  Daily partitions
> might work quite nicely here.
> - Your query uses November dates of  LAST_MODIFIED_DATETIME, yet your
> stats are from 06-aug-2011.  I probably would not hurt to refresh
> those - they are 2 months old.
> On Thu, Nov 17, 2011 at 12:06 PM, Jeremy Schneider
> <> wrote:
>> Yeah, you're not the first one to point that out...  I'm doing some >> tests now.  FWIW it's 7 million out of over a billion rows that are all >> stuffed into a 400GB heap table with the worst possible clustering >> factor on the foreign key we're using to join.  Also I haven't been able >> to get more than 20MB/s total throughput with parallel full scans, >> although single-block reads go at a somewhat decent rate of 5ms.  But >> all the same, I'm checking out the hash join... likely it will turn out >> to be more efficient. >> >> I should have checked this out more in-depth awhile ago...  I got a >> little carried away with maths and mistakenly thought that it was >> totally unrealistic.  This is the problem with ruling things out based >> on "calculations".  There's little to verify your correctness.  It's >> much easier to find your mistakes when you rule things out with >> *testing* instead...  I should know this by now, for some reason I'm >> still making the occasional poor judgements in the heat of the moment.  :) >> >> -Jeremy >> >> >> On 11/17/2011 01:24 PM, Greg Rahn wrote: >>> Am I reading this right?  7 million rows joined using NESTED LOOPS >>> join?  Wouldn't a parallel hash join be better/faster? >>> >>> On Thu, Nov 17, 2011 at 3:34 AM, Jeremy Schneider >>> <> wrote: >>>> PS... 8k block size; segment of index in question is 100G.  Query is >>>> pulling 7 million rows from a join of two billion row tables... > > -- --
Received on Sat Nov 19 2011 - 05:29:47 CST

Original text of this message