10046 or gettimeofday bug..?

From: Matt McClernon <mccmx_at_hotmail.com>
Date: Mon, 19 Jul 2010 23:05:28 +0000
Message-ID: <COL117-W292E3E382FA040E7C2A386B7BF0_at_phx.gbl>

RHEL 4 Update 8 x86_64 (Kernel 2.6.9-89)Oracle 11.1.0.7 EE

I'm seeing some strange behaviour in a 10046 trace file which appears to be a bug. �The CPU time is significantly greater than the Elapsed time for one of our key database transactions:
********************************************************************************begin epp_domain.domain_check(connection_id => :conid,session_id => :sesid,�cltrid => :cltri,domains => :doms,response => :res,chkdata => :chkd); end;
call � � count � � � cpu � �elapsed � � � disk � � �query � �current � � � �rows------- ------ �-------- ---------- ---------- ---------- ---------- �----------Parse � � � �0 � � �0.00 � � � 0.00 � � � � �0 � � � � �0 � � � � �0 � � � � � 0Execute � 1919 � � 11.66 � � � 7.30 � � � � �0 � � 107099 � � 213114 � � � �1919Fetch � � � �0 � � �0.00 � � � 0.00 � � � � �0 � � � � �0 � � � � �0 � � � � � 0------- ------ �-------- ---------- ---------- ---------- ---------- �----------total � � 1919 � � 11.66 � � � 7.30 � � � � �0 � � 107099 � � 213114 � � � �1919

Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id: 63

Elapsed times include waiting on following events:��Event waited on � � � � � � � � � � � � � � Times � Max. Wait �Total Waited��---------------------------------------- � Waited �---------- �------------��SQL*Net message to client � � � � � � � � � �1919 � � � �0.00 � � � � �0.00��log file sync � � � � � � � � � � � � � � � �1815 � � � �0.32 � � � � 10.91��SQL*Net message from client � � � � � � � � �1919 � � � �0.41 � � � � 21.25��library cache: mutex X � � � � � � � � � � � 1282 � � � �0.01 � � � � �0.24��latch free � � � � � � � � � � � � � � � � � � 39 � � � �0.00 � � � � �0.01��latch: row cache objects � � � � � � � � � � � 11 � � � �0.01 � � � � �0.04��latch: cache buffers chains � � � � � � � � � � 5 � � � �0.00 � � � � �0.00


********************************************************************************

I've had a read through Chapter 7 of Optimizing Oracle Performance to try to identify the cause of the missing time but none of the causes seem to fit this anomaly (at least not on this scale).

So I started digging into the raw 10046 and found some of the EXEC database calls for that transaction had elapsed times of zero despite a CPU time of about 10,000. �Here is a snippet of the raw trace file showing this behaviour: EXEC #9:c=10998,e=44097,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545724361EXEC #9:c=11999,e=30131,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545961372EXEC #9:c=10999,e=17654,p=3,cr=95,cu=184,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546021160EXEC #9:c=15997,e=22003,p=1,cr=99,cu=108,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546054541EXEC #9:c=9998,e=0,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546056298EXEC #9:c=11999,e=15062,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546103366EXEC #9:c=10998,e=15593,p=0,cr=89,cu=155,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546136343EXEC #9:c=10998,e=19331,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546209544EXEC #9:c=12998,e=22715,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546232848

This 'e=0' pattern occurs approximately every 20 executions of that transaction.If I check the 10046 trace I can see many hundreds of lines of recursive SQL (associated with an EXEC call that has e=0) all with the same "tim=" value. It's as if time stops briefly periodically.

So I searched for lines with e=0 where c<>0 and found EXEC calls for recursive SQL which also seemed to be missing time:

EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3207244213,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2502456972,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1940757184,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1906887687,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3034500768,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=328706489,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1515982702,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=5,cu=12,mis=0,r=1,dep=1,og=1,plh=1021020385,tim=1278652525586956

To give you an idea of the extent of this 'pause', there are 94 lines with the same tim= value in the 10046 file:

$ grep 1278652525586956 epp4MATT.trc |wc -l94

Is this a bug in gettimeofday..? �or am I missing something here..? �any help appreciated.

http://clk.atdmt.com/UKM/go/197222280/direct/01/ Do you have a story that started on Hotmail? Tell us now-- http://www.freelists.org/webpage/oracle-l Received on Mon Jul 19 2010 - 18:05:28 CDT

This message: [ Message body ]
Next message: Ravi Madabhushanam: "Re: Mutiple sessions for Managed Recovery Oracle10gr2"
Previous message: Matt McClernon: "RE: RAC h/w basic costs"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message