Re: index with very high LIO

From: Jeremy Schneider <jeremy.schneider@ardentperf.com>
Date: Mon, 21 Nov 2011 13:46:57 -0600
Message-ID: <CA+fnDAb32x-UuJOFJCmXynUpWq7cd4rg5dZk1aM=DtFmgbYVdg@mail.gmail.com>



Ok, here are a few useful figures. I think I'm finally doing my math right. For this customer:
  35 MB/s throughput (parallel direct path)
  6 ms response times
  400GB tables
time to full-scan the table:
  400GB x 1024 = 409600 MB
  409600 MB / 35 = 11703 sec
  11703 / 60 / 60 = 3.25 hrs
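
For anyone who wants to sanity-check the arithmetic, here's the same calculation as a quick Python sketch (the 400GB and 35 MB/s figures are just the numbers above, hard-coded):

  table_mb = 400 * 1024        # 400GB table, in MB
  scan_mb_per_sec = 35.0       # measured parallel direct path throughput

  scan_sec = table_mb / scan_mb_per_sec
  print(scan_sec, scan_sec / 3600)   # ~11703 sec, ~3.25 hrs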

how many rows in 3.25 hrs with range scan (about 1.2 blocks per row)?
  6 ms / 1000 = 0.006 sec/block
  1.2 blocks/row * 0.006 = 0.0072 sec/row
  11703 sec / 0.0072 = about 1.6 million rows in 3.25 hrs

how many rows in 3.25 hrs with index join (about 4 LIOs per row)?
  4 blocks/row * 0.006 = 0.024 sec/row
  11703 sec / 0.024 = about 500,000 rows in 3.25 hrs
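
Same sketch for the two indexed paths, plugging in the 6 ms single-block read time and the per-row block counts above:

  read_sec = 0.006      # 6 ms per single-block read
  window_sec = 11703.0  # the full-scan window computed above

  for path, blocks_per_row in (("range scan", 1.2), ("index join", 4.0)):
      rows = window_sec / (blocks_per_row * read_sec)
      print(path, round(rows))   # ~1.6 million and ~490,000 rows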

I guess there's also a case where you can hash join using the index, but I'm skipping that one for now...



However, there's another factor skewing things - it's only the rows *from disk* that count - and this customer has a HUGE buffer cache; we're seeing over 90% BCHR on some of these million-row index-based executions. The direct path FTS will never benefit from the buffer cache. So accounting for this cache effect, the breakpoints become 5 million rows for the second table and 16 million for the first. This lines up with the testing we've done. It seems the index path is still the most efficient choice for the first table. And quite conveniently, as soon as the CBO got the cardinality estimates right, it picked the index on the first table and the FTS on the second. :)
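
A rough sketch of that cache adjustment, assuming a 90% BCHR means only about 10% of the index-path LIOs turn into physical reads (while the direct path FTS gets no help from the cache):

  bchr = 0.90   # buffer cache hit ratio seen on the index-path executions

  for path, rows_if_all_from_disk in (("range scan (1st table)", 1.6e6),
                                      ("index join (2nd table)", 0.5e6)):
      # only the cache misses cost a physical read, so the breakpoint scales up
      print(path, int(rows_if_all_from_disk / (1 - bchr)))   # ~16 million and ~5 million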

Regarding their lack of hardware, it's worth pointing out that this is the first and only query I'm aware of that reads millions of rows on this system. Also, I believe the data volume has been growing over the past few months - so it's only recently that this solo query moved close to the 10 million row range. It's a *very* busy transactional system - so total throughput isn't a major point. And 6 ms is decent response time.

My initial 30 MB/s *was* a rough estimate from elapsed time and data sizes... I was only looking at trace files to verify the measurement... also, the first test ran in the middle of the night and I wasn't at the console. :) I reran some small tests today. I had trouble with iptraf; it seemed to give me numbers that were clearly wrong... but I did get byte counts from ifconfig and verified about 30-35 MB/s max throughput on the adapter during a parallel run.
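
For what it's worth, here's roughly the byte-counter approach as a little Python sketch instead of eyeballing ifconfig - it samples the RX counter in /proc/net/dev twice and diffs over the interval (the "eth0" interface name is only an example, and this only works on Linux):

  import time

  def rx_bytes(iface="eth0"):
      # /proc/net/dev lines look like "  eth0: <rx_bytes> <rx_packets> ..."
      with open("/proc/net/dev") as f:
          for line in f:
              name, sep, rest = line.strip().partition(":")
              if sep and name.strip() == iface:
                  return int(rest.split()[0])
      raise ValueError("interface %s not found" % iface)

  before = rx_bytes()
  time.sleep(10)
  delta = rx_bytes() - before
  print("%.1f MB/s" % (delta / 10.0 / 1024 / 1024))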

(hrishy - right, there were three ways to get the throughput estimates!)

-Jeremy

On Fri, Nov 18, 2011 at 6:12 PM, Greg Rahn <greg@structureddata.org> wrote:

> The big problem I see here is simply that the lack of hardware
> infrastructure is requiring what I refer to as "unnatural acts of
> engineering" in order to do this extract. While this system may not
> require 10GB/s of scan bandwidth, less than 100MB/s is really
> insufficient for any production system these days -- especially one
> that requires multi-hundred-GB scans. To put it in perspective, one
> can get more than 100MB/s from the SSD drive in a laptop.
>
> I'd think that partitioning would be a big help here to reduce the
> data required off disk (thinking txns & time series data), but to get
> there you'd have to rebuild two 400GB tables and their indexes over
> that slow storage connection.
>
> BTW, I'm not sure why you need trace files from PX servers to
> calculate IO bandwidth - just monitor the network tput or just get a
> rough number from elapsed time and the amount of data read from disk.
>
>

-- 
http://www.ardentperf.com
+1 312-725-9249

Jeremy Schneider
Chicago


--
http://www.freelists.org/webpage/oracle-l