RE: Timestamps in trace files (and other trace file oddities)

From: Cary Millsap <cary.millsap_at_hotsos.com>
Date: Thu, 24 Jul 2003 23:27:39 -0500
Message-Id: <26007.339503@fatcity.com>

I absolutely agree with Wolfgang. This pattern of behavior in the trace file is common for applications that do an excessive amount of LIO processing. (LIO = "logical I/O," or a fetch of a buffer's content from the database buffer cache.)

Henry, you don't show the database call whose line was eventually written to the trace file (unless you killed the job before it completed). As Wolfgang pointed out, a line emits to the trace data only after the event it represents completed. There's probably a FETCH line (if this was a SELECT) or an EXEC line (if it was an INSERT, UPDATE, DELETE, or MERGE) that came out eventually with a whopper of a cr or cu field value (or both).

If you were to query V$SESS_IO repeatedly while this trace file is being generated, you would probably see an enormous number of BLOCK_GETS or CONSISTENT_GETS racking up to the session you were tracing.

The verdict: Inefficient SQL. I can tell you this with 100% certainty. The only way out is to tune it. No init.ora parameter, and no amount of memory, CPU capacity, or faster I/O devices can fix this.

By the way... Even though the WAIT events you see in the trace file constitute a miniscule percentage of your response time, their very existence correlates strongly with the verdict I just proposed. The occasional 'db file scattered read' is an indication of the kernel periodically executing multiblock reads upon a database file descriptor. The fact that they're only occasional is a good indication that your inefficient SQL is revisiting the same blocks over and over again. That's great for your cache hit ratio, but horrible for performance, because every LIO typically consumes something on the order of 50 microseconds per call. (That's 50 seconds of execution time for every million LIOs your program executes.)

The 'latch free' wait indicates competition for an acquisition of latch #66 (p2) on your system, which--just guessing--is probably your 'cache buffers chains' latch. (Check your V$LATCHNAME to be sure.) An acquisition of one of these latches is required for every single LIO your system ever does. Acquiring this latch also burns CPU (see "Why you should focus on LIOs instead of PIOs" at www.hotsos.com/catalog for details). When concurrency is high, competition for these latches leads to "latch free" waits such as the one you see here. If you were to watch V$LATCH while this thing runs, you'll also notice that GETS for the latch with LATCH#=66 is probably spinning through the roof, just like your V$SESS_IO stuff was.

It's all a part of the same big, very common pattern.

Hope this helps...

Cary Millsap
Hotsos Enterprises, Ltd.
http://www.hotsos.com

Upcoming events:

- Hotsos Clinic 101 in Denver, Sydney
- Hotsos Symposium 2004, March 7-10 Dallas
- Visit www.hotsos.com for schedule details...

-----Original Message-----
From: ml-errors_at_fatcity.com [mailto:ml-errors_at_fatcity.com] On Behalf Of Wolfgang Breitling
Sent: Thursday, July 24, 2003 11:44 AM
To: Multiple recipients of list ORACLE-L Subject: RE: Timestamps in trace files (and other trace file oddities)

My interpretation would be as follows:

The wait entries are written whenever a wait ends, so at 15:40:59.149 the session has just been waiting .00 seconds for a scattered read of 18 blocks. At 15:46:06.340 it just had been waiting on a latch free event. For the almost seven minutes between, it had not been waiting on any of the established wait events. It had either been processing the blocks returned, or it could have been waiting in the OS scheduler queue waiting for a cpu to become available - or both intermittendly. After coming out of the latch free wait, it found that the blocks it had previously read had been flushed from the buffer (not surprising after 7 minutes) and needed to read them again.

At 07:04 AM 7/24/2003 -0800, you wrote:
>(Tried sending this yesterday. I'll try again)
>
>Dan,
>I was running a 10046 (level 12) trace on an awful piece of PeopleSoft SQL
>today and got some really odd results in my trace file (8.1.7).
>
>*** 2003-07-23 15:40:59.149
>WAIT #1: nam='db file scattered read' ela= 0 p1=65 p2=6041 p3=18
>*** 2003-07-23 15:46:06.340
>WAIT #1: nam='latch free' ela= 1 p1=813986232 p2=66 p3=0
>*** 2003-07-23 15:47:53.851
>WAIT #1: nam='db file scattered read' ela= 0 p1=65 p2=6042 p3=17
>
>Two things struck me (three if it takes me too long to write this and I get
>home late). First, the timestamps show an elapsed time of ~7 minutes, but
>the trace file has ela=1 (one onehundredth of a second). The 7 minutes is
>closer to reality. Huh???
>
>Secondly, the first scattered read reads 18 blocks starting at 6041. Why
>does the next scattered read start at block# 6042?
>
>Any ideas?

Wolfgang Breitling
Oracle7, 8, 8i, 9i OCP DBA
Centrex Consulting Corporation
http://www.centrexcc.com

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Wolfgang Breitling
  INET: breitliw_at_centrexcc.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may

Received on Thu Jul 24 2003 - 23:27:39 CDT