RE: Timestamps in trace files (and other trace file oddities)

From: Wolfgang Breitling <breitliw_at_centrexcc.com>
Date: Thu, 24 Jul 2003 14:21:29 -0600
Message-Id: <26007.339481@fatcity.com>

20% idle = 80% busy? How many cpus?

If it is 20% idle that should lay to rest the theory that the process may wait for a cpu. But, if it is a 5-way join, maybe it is reading a set of blocks from file 65 and then collects the corresponding rows from the other 4 tables, which must be all pure logical IO since there are no further waits. But if you have a few nested NL joins, they can easily churn through millions of LIOs taking minutes of cpu. And if you have multiple cpus, only one of them would be 100% busy during that time, leaving some overall idle %. Since the corresponding rows of the other tables get heavily referenced, they stay virtually "pinned" in the buffer ( in quotes because pinned in the buffer pool context has a different meaning than the ususal connotation ) offering an explanation why the already read 17 blocks need to be re-read once the session is done with block 6041 and wants to move on to block 6042.

At 12:34 PM 7/24/2003 -0800, you wrote:
>Dan,
> No reason to stay in the corner (unless that's where the fridge
> is). I'm
>trying to decide where to look on the OS. When I looked at vmstat, there was
>no paging, and ~20% idle cpu. Maybe set up a cron to take snapshots of
>something(??) every few minutes? Support from the Sys Admin side is minimal,
>so I am kind of on my own. I also have more experience with Sun than AIX.
>
>Henry
>
>
>-----Original Message-----
>Daniel Fink
>Sent: Thursday, July 24, 2003 4:14 PM
>To: Multiple recipients of list ORACLE-L
>
>
>Henry,
> I completely agree, something did not feel right (hence my obvious
>reluctance). I did a very poor job of trying to explain that we cannot
>extrapolate system waits/activity by looking at database waits/activity. I
>should have learned my lesson (bad dba...go sit in the corner) from a long
>email discussion with another list member where I was trying to understand a
>session issue by examining a single processes trace file.
>
> Wolgang's answer really nails the issue. What I think it comes
> down to is
>that wait events are only valid for what Oracle is waiting for internally.
>It cannot see what is happening at the system level. Some of the system
>activity it can determine by the call it is making. For example, when a
>request is sent to disk, it knows that it is waiting for the data to be
>returned. However, it does not know what is consuming the wait time. It
>could be that the disk service time is high or that there are
>so many requests for the disk that they are being queued up. All the process
>knows is that the data has not been returned. As Wolfgang illustrated, the
>same thing happens at the CPU. I doubt that all 7 minutes were CPU, most
>likely there was a lot of intervening activity. All Oracle know was that it
>was not waiting on something to happen internally.
>
> I hope I've done better the second time around.
>
>Dan
>
>Henry Poras wrote:
> >
> > Dan,
> > I see what you are saying but something doesn't feel right (see,
>say, feel.
> > Should be some pun in here about not making sense). CPU time of 7 minutes
> > after a single read??? This is an ugly INSERT on a 5 table join, but
>that's
> > about it. Unfortunately I don't think there is a way to measure service
>time
> > at this granularity. Isn't it recorded in sesstat at the end of the
> > transactions? Guess I'll go and double check that.
> > Also, I remember an old paper by Craig Shallahamer that shows you
>don't get
> > a full multi_block_read_count if a block is already in buffer cache (e.g.
> > mbrc=8; p3=8,p3=8,p3=3,...The 3 is because the fourth block is already in
> > memory. The cached block is already reflected in the trace numbers.)
> > There is something very strange hear but I don't know what. Might
>be an
> > interesting case to track db/os relationships (running on AIX 4.3)
> >
> > Henry
> >
> > -----Original Message-----
> > Daniel Fink
> > Sent: Thursday, July 24, 2003 11:49 AM
> > To: Multiple recipients of list ORACLE-L
> >
> > Henry,
> > I'll make an attempt, but I am still learning a great deal about
>wait
> > events and trace files. Cary, Mogens, Anjo, Tim, Jonathan, Wolfgang,
>et.al.
> > are better authorities, so any corrections are very welcome.
> > The time between waits is the elapsed time. If we equate elapsed
>time to
> > response time, we must look at the two components of RT, wait time and
> > service time. If the time differential is 7 minutes, x minutes is service
> > time and y minutes is wait time. It is possible that only .01 seconds was
> > wait time, while the remaining 6:59:99 was service time (i.e. cpu)
> > I can't explain the scattered read issue. However, it is
>interesting to
> > note that the # of blocks to be read is offset by 1 in the second wait
> > event. Perhaps, the read request was issued, but the 65.6041 block was
> > already in memory, so the read request was reissued minus that block. I'm
> > not sure how the latch free enters into the picture, but I have a feeling
> > that this wait is integral in the sequence of events.
> >
> > Dan
> >
>
>--
>Please see the official ORACLE-L FAQ: http://www.orafaq.net
>--
>Author: Henry Poras
> INET: hporas_at_etal.uri.edu
>
>Fat City Network Services -- 858-538-5051 http://www.fatcity.com
>San Diego, California -- Mailing list and web hosting services
>---------------------------------------------------------------------
>To REMOVE yourself from this mailing list, send an E-Mail message
>to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
>the message BODY, include a line containing: UNSUB ORACLE-L
>(or the name of mailing list you want to be removed from). You may
>also send the HELP command for other information (like subscribing).
Received on Thu Jul 24 2003 - 15:21:29 CDT