10046 or gettimeofday bug..?
Date: Mon, 19 Jul 2010 23:05:28 +0000
Message-ID: <COL117-W292E3E382FA040E7C2A386B7BF0_at_phx.gbl>
RHEL 4 Update 8 x86_64 (Kernel 2.6.9-89)Oracle 11.1.0.7 EE
I'm seeing some strange behaviour in a 10046 trace file which appears to be a bug. The CPU time is significantly greater than the Elapsed time for one of our key database transactions:
********************************************************************************begin epp_domain.domain_check(connection_id => :conid,session_id => :sesid, cltrid => :cltri,domains => :doms,response => :res,chkdata => :chkd); end;
call count cpu elapsed disk query current rows------- ------ -------- ---------- ---------- ---------- ---------- ----------Parse 0 0.00 0.00 0 0 0 0Execute 1919 11.66 7.30 0 107099 213114 1919Fetch 0 0.00 0.00 0 0 0 0------- ------ -------- ---------- ---------- ---------- ---------- ----------total 1919 11.66 7.30 0 107099 213114 1919
Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id: 63
Elapsed times include waiting on following events: Event waited on Times Max. Wait Total Waited ---------------------------------------- Waited ---------- ------------ SQL*Net message to client 1919 0.00 0.00 log file sync 1815 0.32 10.91 SQL*Net message from client 1919 0.41 21.25 library cache: mutex X 1282 0.01 0.24 latch free 39 0.00 0.01 latch: row cache objects 11 0.01 0.04 latch: cache buffers chains 5 0.00 0.00
********************************************************************************
I've had a read through Chapter 7 of Optimizing Oracle Performance to try to identify the cause of the missing time but none of the causes seem to fit this anomaly (at least not on this scale).
So I started digging into the raw 10046 and found some of the EXEC database calls for that transaction had elapsed times of zero despite a CPU time of about 10,000. Here is a snippet of the raw trace file showing this behaviour: EXEC #9:c=10998,e=44097,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545724361EXEC #9:c=11999,e=30131,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545961372EXEC #9:c=10999,e=17654,p=3,cr=95,cu=184,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546021160EXEC #9:c=15997,e=22003,p=1,cr=99,cu=108,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546054541EXEC #9:c=9998,e=0,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546056298EXEC #9:c=11999,e=15062,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546103366EXEC #9:c=10998,e=15593,p=0,cr=89,cu=155,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546136343EXEC #9:c=10998,e=19331,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546209544EXEC #9:c=12998,e=22715,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546232848
This 'e=0' pattern occurs approximately every 20 executions of that transaction.If I check the 10046 trace I can see many hundreds of lines of recursive SQL (associated with an EXEC call that has e=0) all with the same "tim=" value. It's as if time stops briefly periodically.
So I searched for lines with e=0 where c<>0 and found EXEC calls for recursive SQL which also seemed to be missing time:
EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3207244213,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2502456972,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1940757184,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1906887687,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3034500768,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=328706489,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1515982702,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=5,cu=12,mis=0,r=1,dep=1,og=1,plh=1021020385,tim=1278652525586956
To give you an idea of the extent of this 'pause', there are 94 lines with the same tim= value in the 10046 file:
$ grep 1278652525586956 epp4MATT.trc |wc -l94
Is this a bug in gettimeofday..? or am I missing something here..? any help appreciated.
http://clk.atdmt.com/UKM/go/197222280/direct/01/ Do you have a story that started on Hotmail? Tell us now-- http://www.freelists.org/webpage/oracle-l Received on Mon Jul 19 2010 - 18:05:28 CDT