Re: Disk Device Busy (%) - What exactly is this?
Date: Mon, 21 Nov 2011 11:25:25 -0600
Message-ID: <CACNsJneV41Em8VXxDk6XSY5xVZ9PTk6BL8O9+n0RLUMF7vEmig_at_mail.gmail.com>
To add to that blog link: if you have collectl installed somewhere, there's a file called formatit.ph that contains all the formatting and formulas collectl uses. One section of it derives the device busy % ($dskUtil):
[root@desktopserver ~]# locate formatit.ph
/usr/share/collectl/formatit.ph
[root@desktopserver ~]# less /usr/share/collectl/formatit.ph
....
# we only need these if doing individual disk calculations
if ($subsys=~/D/)
{
  # if doing hires time, we need the interval duration and unfortunately at
  # this point in time $intSecs has not been set so we can't use it
  $microInterval=($fullTime-$lastSecs[$rawPFlag])*100 if $hiResFlag;

  $numIOs=$dskRead[$dskIndex]+$dskWrite[$dskIndex];
  $dskRqst[$dskIndex]=$numIOs ? ($dskReadKB[$dskIndex]+$dskWriteKB[$dskIndex])/$numIOs : 0;
  $dskQueLen[$dskIndex]=$dskWeighted[$dskIndex]/$microInterval*$HZ/1000;
  $dskWait[$dskIndex]=$numIOs ? ($dskReadTicks[$dskIndex]+$dskWriteTicks[$dskIndex])/$numIOs : 0;
  $dskSvcTime[$dskIndex]=$numIOs ? $dskTicks[$dskIndex]/$numIOs : 0;
  $dskUtil[$dskIndex]=$dskTicks[$dskIndex]*10/$microInterval;
}
....
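To make the arithmetic concrete, here is a rough Python sketch of the formulas in the excerpt above. It is not collectl code. It assumes the tick counters are per-interval deltas in milliseconds (which is how /proc/diskstats reports them), that $microInterval is the interval in hundredths of a second, and that HZ is 100. The function name and the sample numbers are made up for illustration.

```python
def disk_metrics(ticks_ms, read_ticks_ms, write_ticks_ms, weighted_ms,
                 reads, writes, read_kb, write_kb, interval_secs, hz=100):
    """Mirror the per-disk formulas from the excerpt for one sampling interval.

    Assumption: all *_ms arguments are deltas over the interval, in milliseconds.
    """
    micro_interval = interval_secs * 100  # hundredths of a second, like $microInterval
    num_ios = reads + writes
    return {
        # average request size in KB ($dskRqst)
        "rqst_kb": (read_kb + write_kb) / num_ios if num_ios else 0,
        # average queue length ($dskQueLen)
        "que_len": weighted_ms / micro_interval * hz / 1000,
        # average wait per IO in ms, queue time included ($dskWait)
        "wait_ms": (read_ticks_ms + write_ticks_ms) / num_ios if num_ios else 0,
        # average service time per IO in ms ($dskSvcTime)
        "svc_time_ms": ticks_ms / num_ios if num_ios else 0,
        # percent of the interval the device was busy ($dskUtil)
        "util_pct": ticks_ms * 10 / micro_interval,
    }


# Hypothetical 1-second sample: the device was busy for 500 ms of it.
m = disk_metrics(ticks_ms=500, read_ticks_ms=600, write_ticks_ms=400,
                 weighted_ms=1000, reads=60, writes=40,
                 read_kb=480, write_kb=320, interval_secs=1)
print(m["util_pct"])  # 50.0
```

Under those assumptions, busy % is simply the share of the interval during which the device had at least one IO in flight: 500 ms busy out of a 1000 ms interval gives 50%.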
If you are troubleshooting a "slow IO", you also need to consider and
correlate the service times of the SAN, the Oracle datafiles, and the
session IO. Of course you need to sample them in a consistent and
fine-grained manner; I would use a 5-second interval for all 3 subsystems:
- SAN -> iostat -xnc 1 100000 | while read line; do echo "`date +%T`" "$line" ; done >> iostat_1.txt
- datafiles -> https://www.dropbox.com/s/jzcl5ydt29mvw69/PerformanceAndTroubleshooting/filestat.sql
- session -> @snapper ash=sql_id+sid+event+wait_class+module+service,stats 5 5 sid=<sid>
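The iostat loop above stamps a line every second, while snapper reports every 5 seconds, so the two need to be lined up before comparing them. One possible way is to roll the 1-second samples up into 5-second buckets. The Python sketch below does that for (HH:MM:SS, value) pairs. The function names and the sample values are hypothetical, it does not parse real iostat output, and it ignores midnight rollover.

```python
from collections import defaultdict


def to_seconds(hms):
    """Convert an HH:MM:SS stamp (as printed by `date +%T`) to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def bucket_average(samples, bucket_secs=5):
    """Average (HH:MM:SS, value) samples per bucket, keyed by the bucket's start second."""
    buckets = defaultdict(list)
    for hms, value in samples:
        start = to_seconds(hms) // bucket_secs * bucket_secs
        buckets[start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


# Made-up service times in ms, one per timestamped sample.
samples = [("11:25:00", 10.0), ("11:25:01", 20.0), ("11:25:04", 30.0),
           ("11:25:05", 60.0), ("11:25:09", 40.0)]
averaged = bucket_average(samples)
print(averaged)  # {41100: 20.0, 41105: 50.0}
```

This is a plain average of the samples, not weighted by the number of IOs in each one, so treat it as a first approximation only.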
I had a recent scenario on Solaris M5000/9000 where the SAN (Symmetrix) and the datafiles were in the 10-60ms range, yet the Oracle sessions were doing slow IO with service times of around 900ms to 1sec. That issue turned out to be related to CPU scheduling (they had a really high load avg) and to sessions spinning on vxfslocks (due to concurrent IO not being set). So keep this in mind when troubleshooting IO: the response time a session sees is the response time of the kernel mode calls down to the low-level components (not preempted) plus the response time of the user mode calls (the session IO, which is not serviced properly when it gets preempted because of scheduling/lock issues).
Here's the sample distribution of that scenario: http://karlarao.tiddlyspot.com/#%5B%5Bavg%20latency%20issue%5D%5D
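As a back-of-the-envelope check on that scenario, the time not explained by the device is roughly the session-level service time minus the device-level service time. The Python sketch below just does that subtraction with the ranges quoted above. The function name is made up, and treating the two measurements as directly subtractable is a simplifying assumption.

```python
def unexplained_latency_ms(session_ms, device_ms):
    """Session wait time that the device-level service time does not account for."""
    return max(session_ms - device_ms, 0)


# Ranges from the scenario: device 10-60 ms, session 900-1000 ms.
low = unexplained_latency_ms(900, 60)    # fastest session vs slowest device
high = unexplained_latency_ms(1000, 10)  # slowest session vs fastest device
print(low, high)  # 840 990
```

So somewhere between 840ms and 990ms of each IO was spent above the storage layer, which is why the SAN numbers alone looked healthy.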
--
Karl Arao
karlarao.wordpress.com
karlarao.tiddlyspot.com
--
http://www.freelists.org/webpage/oracle-l
Received on Mon Nov 21 2011 - 11:25:25 CST