Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Locating a bad disk on Solaris w/Veritas

Re: Locating a bad disk on Solaris w/Veritas

From: Steve <ThisOne_at_Aint.valid>
Date: Tue, 26 Oct 2004 19:03:47 +1300
Message-ID: <clkpc3$uon$1@lust.ihug.co.nz>


Steve B wrote:
> Hi all,
>
> I posted a few weeks ago about some intermittent/inconsistent
> performance problems we were having for queries that didn't make sense
> compared to the reported i/o usage for these queries. I've since
> learned how to use the extended trace facility (10046/12), and I've
> learned more about the problem, but I've run up against another wall.
>
> What I've found is that our app is spending an inordinate amount of
> time fetching just one block from the disk. For example, in some slow
> running candidates, I've seen lines like this:
>
> PARSING IN CURSOR #4 len=1125 dep=0 uid=24 oct=3 lid=24
> tim=678997477414 hv=3589042826 ad='cceb3b00'
> SELECT /*+ ORDERED */sc.IdentifierId, sc.Coefficient,
> fact.ExposureAnalysisId, DataSet FROM ExposureToolModel etm ,
> ExposureTool et , ExposureAnalysis fact, ExposureAnalysisDat
> a ead , ScalarCoefficient sc WHERE fact.ExposureToolId = 183 AND
> fact.RecipeId = 28959 AND fact.MaskId = 23514 AND fact.LayerId = 4130
> AND fact.RouteId = 27327 AND fact.Operat
> ionId = 51086 AND fact.LotId IN (SELECT l.LotId FROM dual fact, Lot l
> WHERE l.DeviceId = 6332 AND l.TechnologyId = 4049 ) AND
> fact.ExposureToolId = et.ExposureToolId (+) AND e
> t.ExposureToolModelId = etm.ExposureToolModelId (+) AND
> etm.ExposureToolType= 'scanner' AND et.ExposureToolId = 183 AND
> etm.VendorMake= 'asml' AND etm.VendorModel= 'AT/750B' AN
> D fact.ExposeTime + 1 - 1 >= '2004.08.24 11:17:34' AND fact.ExposeTime
> + 1 - 1 <= '2004.10.23 11:17:34' AND fact.ExposeHash >= 1698 AND
> fact.ExposeHash <= 1758 AND fact.Exposur
> eAnalysisId = ead.ExposureAnalysisId AND fact.FileSpec= 'Feedback' AND
> fact.ScriptCategoryId = 100 AND fact.ProcessStatus = 'D' AND
> ead.DataSet IS NULL AND sc.DataSetId = ead.D
> ataSetId ORDER BY fact.ExposeTime desc, fact.ExposureAnalysisId desc
> END OF STMT
> PARSE #4:c=50000,e=46837,p=0,cr=9,cu=0,mis=1,r=0,dep=0,og=4,tim=678997477399
> WAIT #4: nam='SQL*Net message to client' ela= 4 p1=1413697536 p2=1
> p3=0
> WAIT #4: nam='SQL*Net message from client' ela= 1110 p1=1413697536
> p2=1 p3=0
> EXEC #4:c=0,e=1135,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=678997480425
> WAIT #4: nam='SQL*Net message to client' ela= 9 p1=1413697536 p2=1
> p3=0
> *** 2004-10-23 11:18:10.028
> ****** LOOK HERE *****
> WAIT #4: nam='db file sequential read' ela= 15714497 p1=4 p2=94824
> p3=1
> ****** LOOK HERE *****
> WAIT #4: nam='db file sequential read' ela= 9172 p1=3 p2=404516 p3=1
> WAIT #4: nam='db file sequential read' ela= 9519 p1=4 p2=198168 p3=1
> WAIT #4: nam='db file sequential read' ela= 8957 p1=3 p2=823622 p3=1
> WAIT #4: nam='db file sequential read' ela= 3159 p1=4 p2=857461 p3=1
> WAIT #4: nam='db file sequential read' ela= 7666 p1=3 p2=415264 p3=1
> WAIT #4: nam='db file sequential read' ela= 10128 p1=4 p2=849608 p3=1
> WAIT #4: nam='db file sequential read' ela= 1123429 p1=3 p2=1082960
> p3=1
> WAIT #4: nam='db file sequential read' ela= 7294 p1=3 p2=825174 p3=1
> WAIT #4: nam='db file sequential read' ela= 13880 p1=4 p2=857419 p3=1
> WAIT #4: nam='db file sequential read' ela= 9227 p1=3 p2=415334 p3=1
> WAIT #4: nam='db file sequential read' ela= 21533 p1=4 p2=853244 p3=1
> WAIT #4: nam='db file sequential read' ela= 7562 p1=3 p2=1084013 p3=1
> WAIT #4: nam='db file sequential read' ela= 9635 p1=3 p2=785154 p3=1
> WAIT #4: nam='db file sequential read' ela= 17597 p1=4 p2=198519 p3=1
> WAIT #4: nam='db file sequential read' ela= 10863 p1=4 p2=857465 p3=1
> WAIT #4: nam='db file sequential read' ela= 26288 p1=3 p2=415301 p3=1
> WAIT #4: nam='db file sequential read' ela= 21145 p1=4 p2=852599 p3=1
> WAIT #4: nam='db file sequential read' ela= 14026 p1=3 p2=1084347 p3=1
> WAIT #4: nam='db file sequential read' ela= 9997 p1=3 p2=1185112 p3=1
> FETCH #4:c=30000,e=17079350,p=20,cr=222,cu=0,mis=0,r=1,dep=0,og=4,tim=679014560511
> WAIT #4: nam='SQL*Net message from client' ela= 1838 p1=1413697536
> p2=1 p3=0
> WAIT #4: nam='SQL*Net message to client' ela= 3 p1=1413697536 p2=1
> p3=0
> FETCH #4:c=0,e=286,p=0,cr=0,cu=0,mis=0,r=32,dep=0,og=4,tim=679014572292
> WAIT #4: nam='SQL*Net message from client' ela= 8478 p1=1413697536
> p2=1 p3=0
> WAIT #4: nam='SQL*Net message to client' ela= 3 p1=1413697536 p2=1
> p3=0
> FETCH #4:c=0,e=279,p=0,cr=0,cu=0,mis=0,r=31,dep=0,og=4,tim=679014590654
> WAIT #4: nam='SQL*Net message from client' ela= 10568 p1=1413697536
> p2=1 p3=0
> WAIT #4: nam='SQL*Net message to client' ela= 3 p1=1413697536 p2=1
> p3=0
> FETCH #4:c=0,e=282,p=0,cr=0,cu=0,mis=0,r=32,dep=0,og=4,tim=679014601772
> WAIT #4: nam='SQL*Net message from client' ela= 10105 p1=1413697536
> p2=1 p3=0
> WAIT #4: nam='SQL*Net message to client' ela= 3 p1=1413697536 p2=1
> p3=0
> FETCH #4:c=0,e=150,p=0,cr=0,cu=0,mis=0,r=4,dep=0,og=4,tim=679014612292
> WAIT #4: nam='SQL*Net message from client' ela= 37155 p1=1413697536
> p2=1 p3=0
> XCTEND rlbk=0, rd_only=1
> STAT #4 id=1 cnt=100 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=222 r=20
> w=0 time=17079583 us)'
> STAT #4 id=2 cnt=100 pid=1 pos=1 obj=6633 op='TABLE ACCESS BY INDEX
> ROWID SCALARCOEFFICIENT (cr=222 r=20 w=0 time=17078134 us)'
> STAT #4 id=3 cnt=111 pid=2 pos=1 obj=0 op='NESTED LOOPS (cr=212 r=13
> w=0 time=15895139 us)'
> STAT #4 id=4 cnt=10 pid=3 pos=1 obj=0 op='NESTED LOOPS (cr=180 r=8
> w=0 time=15812944 us)'
> STAT #4 id=5 cnt=10 pid=4 pos=1 obj=0 op='HASH JOIN (cr=148 r=0 w=0
> time=15006 us)'
> STAT #4 id=6 cnt=10 pid=5 pos=1 obj=9661 op='TABLE ACCESS BY INDEX
> ROWID EXPOSUREANALYSIS (cr=117 r=0 w=0 time=7222 us)'
> STAT #4 id=7 cnt=73 pid=6 pos=1 obj=0 op='BITMAP CONVERSION TO ROWIDS
> (cr=44 r=0 w=0 time=5674 us)'
> STAT #4 id=8 cnt=1 pid=7 pos=1 obj=0 op='BITMAP AND (cr=44 r=0 w=0
> time=5542 us)'
> STAT #4 id=9 cnt=1 pid=8 pos=1 obj=30418 op='BITMAP INDEX SINGLE VALUE
> DN9EXPOSUREANALYSIS (cr=4 r=0 w=0 time=104 us)'
> STAT #4 id=11 cnt=24 pid=5 pos=2 obj=0 op='HASH JOIN (cr=31 r=0 w=0
> time=5220 us)'
> STAT #4 id=12 cnt=1 pid=11 pos=1 obj=6592 op='TABLE ACCESS BY INDEX
> ROWID EXPOSURETOOL (cr=2 r=0 w=0 time=52 us)'
> STAT #4 id=13 cnt=1 pid=12 pos=1 obj=7493 op='INDEX UNIQUE SCAN
> PKEXPOSURETOOL (cr=1 r=0 w=0 time=29 us)'
> STAT #4 id=14 cnt=24 pid=11 pos=2 obj=0 op='MERGE JOIN CARTESIAN
> (cr=29 r=0 w=0 time=1884 us)'
> STAT #4 id=15 cnt=24 pid=14 pos=1 obj=0 op='VIEW (cr=27 r=0 w=0
> time=1378 us)'
> STAT #4 id=16 cnt=24 pid=15 pos=1 obj=0 op='SORT UNIQUE (cr=27 r=0 w=0
> time=1334 us)'
> STAT #4 id=17 cnt=24 pid=16 pos=1 obj=0 op='MERGE JOIN CARTESIAN
> (cr=27 r=0 w=0 time=1099 us)'
> STAT #4 id=18 cnt=24 pid=17 pos=1 obj=6600 op='TABLE ACCESS BY INDEX
> ROWID LOT (cr=24 r=0 w=0 time=692 us)'
> STAT #4 id=19 cnt=24 pid=18 pos=1 obj=0 op='BITMAP CONVERSION TO
> ROWIDS (cr=4 r=0 w=0 time=320 us)'
> STAT #4 id=20 cnt=1 pid=19 pos=1 obj=0 op='BITMAP AND (cr=4 r=0 w=0
> time=283 us)'
> STAT #4 id=21 cnt=1 pid=20 pos=1 obj=0 op='BITMAP CONVERSION FROM
> ROWIDS (cr=2 r=0 w=0 time=125 us)'
> STAT #4 id=23 cnt=1 pid=20 pos=2 obj=0 op='BITMAP CONVERSION FROM
> ROWIDS (cr=2 r=0 w=0 time=120 us)'
> STAT #4 id=25 cnt=24 pid=17 pos=2 obj=0 op='BUFFER SORT (cr=3 r=0 w=0
> time=264 us)'
> STAT #4 id=26 cnt=1 pid=25 pos=1 obj=222 op='TABLE ACCESS FULL DUAL
> (cr=3 r=0 w=0 time=94 us)'
> STAT #4 id=27 cnt=24 pid=14 pos=2 obj=0 op='BUFFER SORT (cr=2 r=0 w=0
> time=305 us)'
> STAT #4 id=28 cnt=1 pid=27 pos=1 obj=6594 op='TABLE ACCESS BY INDEX
> ROWID EXPOSURETOOLMODEL (cr=2 r=0 w=0 time=37 us)'
> STAT #4 id=29 cnt=1 pid=28 pos=1 obj=7500 op='INDEX UNIQUE SCAN
> AKEXPOSURETOOLMODEL (cr=1 r=0 w=0 time=15 us)'
> STAT #4 id=30 cnt=10 pid=4 pos=2 obj=6584 op='TABLE ACCESS BY INDEX
> ROWID EXPOSUREANALYSISDATA (cr=32 r=8 w=0 time=15797744 us)'
> STAT #4 id=31 cnt=10 pid=30 pos=1 obj=7471 op='INDEX RANGE SCAN
> PKEXPOSUREANALYSISDATA (cr=22 r=4 w=0 time=15744043 us)'
> STAT #4 id=32 cnt=100 pid=3 pos=2 obj=7590 op='INDEX RANGE SCAN
> PKSCALARCOEFFICIENT (cr=32 r=5 w=0 time=81727 us)'
>
> Note that we're paying almost 16 seconds to fetch a single block from
> the disk.
>
> The problem is, I'm struggling to figure out why. At the time of this
> call, there is nothing else going on in our application that is
> accessing the db - certainly nothing that could block the system for
> over 10 seconds at a time. I need to find what it is that's standing
> in between Oracle and the bits on the platters of our disks. Is it a
> bad disk? Some Veritas weirdness? Some job that's starting up that
> we've overlooked?
>
> We're running Oracle 9.2.0.4.0 on Solaris 8 and our RAID is managed by
> Veritas. Is there some sort of low-level tools we can run that would
> determine if we have some bum disks or if there is suspicious access
> of the disks at certain times? I've used sar, but I don't know how to
> get it to give information that is substantially different from iostat
> (i.e. just aggregated statistics & averages).
>
> Thanks,
> -Steve

The only reason that I can think of is that the disk is spun down. If it were that b*ggered, there's be loads of errors in /var/adm/messages and /var/log/syslog. If this is the case, once you've got the disks spun up, then transfer times will be fine. try copying a big file back and forth.

Is the storage external? If so, are you using fibre cables, and, if so, are there any errors logged on the gbics - always a good point of failure.

There are things that your sysadmin should be able to answer ( and ask many more questions besides ). They should also be able to provide enough info that we have a chance of diagnosing the problem.

Steve Received on Tue Oct 26 2004 - 01:03:47 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US