Re: Disk Device Busy (%) - What exactly is this?

From: Karl Arao <karlarao_at_gmail.com>
Date: Mon, 21 Nov 2011 11:25:25 -0600
Message-ID: <CACNsJneV41Em8VXxDk6XSY5xVZ9PTk6BL8O9+n0RLUMF7vEmig_at_mail.gmail.com>



To add to this blog link: if you have collectl installed somewhere, there's a file called formatit.ph that contains all the formatting/formulas that collectl uses. There's a section where the device busy % is derived ($dskUtil):
[root@desktopserver ~]# locate formatit.ph
/usr/share/collectl/formatit.ph
[root@desktopserver ~]# less /usr/share/collectl/formatit.ph

....

      # we only need these if doing individual disk calculations
      if ($subsys=~/D/)
      {
        # if doing hires time, we need the interval duration and unfortunately at
        # this point in time $intSecs has not been set so we can't use it
        $microInterval=($fullTime-$lastSecs[$rawPFlag])*100 if $hiResFlag;

        $numIOs=$dskRead[$dskIndex]+$dskWrite[$dskIndex];
        $dskRqst[$dskIndex]=   $numIOs ? ($dskReadKB[$dskIndex]+$dskWriteKB[$dskIndex])/$numIOs : 0;
        $dskQueLen[$dskIndex]= $dskWeighted[$dskIndex]/$microInterval*$HZ/1000;
        $dskWait[$dskIndex]=   $numIOs ? ($dskReadTicks[$dskIndex]+$dskWriteTicks[$dskIndex])/$numIOs : 0;
        $dskSvcTime[$dskIndex]=$numIOs ? $dskTicks[$dskIndex]/$numIOs : 0;
        $dskUtil[$dskIndex]=   $dskTicks[$dskIndex]*10/$microInterval;
      }

....
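
To make the arithmetic in that excerpt easier to follow, here is a rough Python restatement. It is only a sketch under assumptions that the excerpt itself does not state: that the tick counters are per-interval deltas in milliseconds (as in /proc/diskstats), that $microInterval is the interval length in hundredths of a second, and that $HZ is 100. The function name, argument names, and sample numbers are made up for illustration and are not collectl's.

```python
def disk_metrics(reads, writes, read_kb, write_kb,
                 read_ticks, write_ticks, ticks, weighted,
                 interval_secs, hz=100):
    """Mirror the per-disk formulas from the formatit.ph excerpt.

    Assumption: every *ticks value and `weighted` is a per-interval
    delta in milliseconds; hz=100 is also an assumption.
    """
    micro_interval = interval_secs * 100  # interval in hundredths of a second
    num_ios = reads + writes
    return {
        # average request size in KB
        "rqst_kb": (read_kb + write_kb) / num_ios if num_ios else 0,
        # average queue length over the interval
        "que_len": weighted / micro_interval * hz / 1000,
        # average time per IO including queueing, in ms
        "wait_ms": (read_ticks + write_ticks) / num_ios if num_ios else 0,
        # average device busy time per IO, in ms
        "svc_ms": ticks / num_ios if num_ios else 0,
        # device busy %, i.e. busy-ms / interval-ms * 100
        "util_pct": ticks * 10 / micro_interval,
    }


# Made-up sample: 500 IOs in a 5 second interval, device busy for 2500 ms.
m = disk_metrics(reads=300, writes=200, read_kb=2400, write_kb=1600,
                 read_ticks=1500, write_ticks=2500, ticks=2500,
                 weighted=4000, interval_secs=5)
print(m["util_pct"])  # 50.0
```

With these numbers the device was busy for 2500 ms out of a 5000 ms interval, so $dskUtil works out to 50%: under the assumptions above, ticks*10/microInterval is simply busy time divided by elapsed time, expressed as a percentage.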

If you are troubleshooting "slow IO", you also need to consider and correlate the service times of the SAN, the Oracle datafiles, and the session IO. Of course you need to sample them in a consistent and fine-grained manner; I would use a 5-second interval for all 3 subsystems:

- SAN -> iostat -xnc 1 100000 | while read line; do echo "`date +%T`" "$line" ; done >> iostat_1.txt
- datafiles -> https://www.dropbox.com/s/jzcl5ydt29mvw69/PerformanceAndTroubleshooting/filestat.sql
- session -> @snapper ash=sql_id+sid+event+wait_class+module+service,stats 5 5 sid=<sid>
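
The point of the while/read loop in the iostat line is to stamp every sample with a wall-clock time so it can be lined up against the datafile and session samples later. A minimal Python sketch of the same idea follows; the function name is hypothetical and the sample lines are placeholders, not real iostat output.

```python
import time


def stamp_lines(lines, clock=None):
    """Yield each line prefixed with a HH:MM:SS timestamp (like `date +%T`)."""
    if clock is None:
        clock = lambda: time.strftime("%H:%M:%S")
    for line in lines:
        yield clock() + " " + line.rstrip("\n")


# Placeholder input; in a pipeline you would pass sys.stdin instead.
sample = ["first line of output\n", "second line of output\n"]
for stamped in stamp_lines(sample):
    print(stamped)
```

Passing the clock in as a parameter is just a convenience for testing; by default each line gets the time at which it was read.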

I had a recent scenario on Solaris M5000/9000 where the SAN (Symmetrix) and the datafiles were in the 10-60ms range, yet the Oracle sessions were doing slow IO with service times of around 900ms to 1sec. That issue turned out to be related to CPU scheduling (they had a really high load avg) and sessions spinning on vxfslocks (due to concurrent IO not being set). That is something you have to keep in mind when troubleshooting IO: what the session sees is the response time of the kernel mode calls down to the low-level components (not preempted) + the response time of the user mode calls (session IO not being serviced properly because of preemption brought on by scheduling/lock issues).
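
As a back-of-the-envelope check on that scenario, the gap between what the session sees and what the device reports can be worked out directly. The sketch below is a simplification that treats session latency as device service time plus "everything else" (scheduling, lock spinning, and so on); the function name is hypothetical, and the inputs are the ranges quoted above.

```python
def time_outside_device(session_ms, device_ms):
    """Return (ms, fraction) of session IO latency not explained by the device."""
    gap = max(session_ms - device_ms, 0)
    share = gap / session_ms if session_ms else 0.0
    return gap, share


# Ranges from the scenario: devices at 10-60 ms, sessions at 900-1000 ms.
smallest_gap = time_outside_device(900, 60)
largest_gap = time_outside_device(1000, 10)
print(smallest_gap[0], largest_gap[0])  # 840 990
```

Even in the most favorable pairing, more than 90% of the session's IO time is spent somewhere other than the SAN or the datafiles, which is why the device-level numbers alone looked healthy.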

Here's the sample distribution of that scenario http://karlarao.tiddlyspot.com/#%5B%5Bavg%20latency%20issue%5D%5D

-- 
Karl Arao
karlarao.wordpress.com
karlarao.tiddlyspot.com


--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 21 2011 - 11:25:25 CST
