10046 or gettimeofday bug..?

From: Matt McClernon <mccmx_at_hotmail.com>
Date: Mon, 19 Jul 2010 23:05:28 +0000
Message-ID: <COL117-W292E3E382FA040E7C2A386B7BF0_at_phx.gbl>


RHEL 4 Update 8 x86_64 (Kernel 2.6.9-89)Oracle 11.1.0.7 EE

I'm seeing some strange behaviour in a 10046 trace file which appears to be a bug.  The CPU time is significantly greater than the Elapsed time for one of our key database transactions:
********************************************************************************begin epp_domain.domain_check(connection_id => :conid,session_id => :sesid, cltrid => :cltri,domains => :doms,response => :res,chkdata => :chkd); end;
call     count       cpu    elapsed       disk      query    current        rows------- ------  -------- ---------- ---------- ---------- ----------  ----------Parse        0      0.00       0.00          0          0          0           0Execute   1919     11.66       7.30          0     107099     213114        1919Fetch        0      0.00       0.00          0          0          0           0------- ------  -------- ---------- ---------- ---------- ----------  ----------total     1919     11.66       7.30          0     107099     213114        1919

Misses in library cache during parse: 0Optimizer mode: ALL_ROWSParsing user id: 63

Elapsed times include waiting on following events:  Event waited on                             Times   Max. Wait  Total Waited  ----------------------------------------   Waited  ----------  ------------  SQL*Net message to client                    1919        0.00          0.00  log file sync                                1815        0.32         10.91  SQL*Net message from client                  1919        0.41         21.25  library cache: mutex X                       1282        0.01          0.24  latch free                                     39        0.00          0.01  latch: row cache objects                       11        0.01          0.04  latch: cache buffers chains                     5        0.00          0.00

********************************************************************************

I've had a read through Chapter 7 of Optimizing Oracle Performance to try to identify the cause of the missing time but none of the causes seem to fit this anomaly (at least not on this scale).

So I started digging into the raw 10046 and found some of the EXEC database calls for that transaction had elapsed times of zero despite a CPU time of about 10,000.  Here is a snippet of the raw trace file showing this behaviour: EXEC #9:c=10998,e=44097,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545724361EXEC #9:c=11999,e=30131,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652545961372EXEC #9:c=10999,e=17654,p=3,cr=95,cu=184,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546021160EXEC #9:c=15997,e=22003,p=1,cr=99,cu=108,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546054541EXEC #9:c=9998,e=0,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546056298EXEC #9:c=11999,e=15062,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546103366EXEC #9:c=10998,e=15593,p=0,cr=89,cu=155,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546136343EXEC #9:c=10998,e=19331,p=0,cr=92,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546209544EXEC #9:c=12998,e=22715,p=0,cr=89,cu=154,mis=0,r=1,dep=0,og=1,plh=0,tim=1278652546232848

This 'e=0' pattern occurs approximately every 20 executions of that transaction.If I check the 10046 trace I can see many hundreds of lines of recursive SQL (associated with an EXEC call that has e=0) all with the same "tim=" value. It's as if time stops briefly periodically.

So I searched for lines with e=0 where c<>0 and found EXEC calls for recursive SQL which also seemed to be missing time:

EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3207244213,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=2502456972,tim=1278652525564956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1940757184,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1906887687,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=3034500768,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=328706489,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1515982702,tim=1278652525586956EXEC #10:c=999,e=0,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,plh=1388734953,tim=1278652525586956EXEC #10:c=1000,e=0,p=0,cr=5,cu=12,mis=0,r=1,dep=1,og=1,plh=1021020385,tim=1278652525586956

To give you an idea of the extent of this 'pause', there are 94 lines with the same tim= value in the 10046 file:

$ grep 1278652525586956 epp4MATT.trc |wc -l94

Is this a bug in gettimeofday..?  or am I missing something here..?  any help appreciated.                                                



http://clk.atdmt.com/UKM/go/197222280/direct/01/ Do you have a story that started on Hotmail? Tell us now-- http://www.freelists.org/webpage/oracle-l Received on Mon Jul 19 2010 - 18:05:28 CDT

Original text of this message