Fw: Random Slowdowns on Hash Joins

From: Charlotte Hammond <charlottejanehammond_at_yahoo.com>
Date: Thu, 6 Nov 2008 14:24:28 -0800 (PST)
Message-ID: <505408.88889.qm@web50206.mail.re2.yahoo.com>


Hi Stéphane  & All,

Sorry - should have mentioned the waits - nothing really significant on I/O.   I've put the tkprof output for one of the main queries during a normal run and a slow run below (I hope the formatting isn't too bad).

As you can see the CPU time on the slow run FETCH is much larger. 

In the explain plan this time appears to be concentrated on the hash joins.  The raw trace files show lines like FETCH #10:c=80000,e=87368,p=1,cr=11,cu=0,mis=0,r=1,dep=0,og=1,tim=18141452357101 so it does seem to be genuine CPU time.

As mentioned the server is lightly loaded (but not completely idle) at the time. (The application also seems to do a lot of unnecessary parsing but that doesn't appear to affect the time much)

  • NORMAL RUN ***
SELECT query:
call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    11498      0.67       0.55          0          0          0           0
Execute  11498     15.40      15.25          0          0          0           0
Fetch    11497    134.96     133.06          2     174679          0       11497
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total    34493    151.03     148.87          2     174679          0       11497

Misses in library cache during parse: 1
Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS
Parsing user id: 353

Rows     Row Source Operation
-------  ---------------------------------------------------
      1  SORT AGGREGATE (cr=11 pr=2 pw=0 time=23437 us)
      4   HASH JOIN  (cr=11 pr=2 pw=0 time=23087 us)
      7    TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=161 us)
      7     INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=103 us)(object id 8072)
     13    VIEW  index$_join$_001 (cr=5 pr=2 pw=0 time=22373 us)
     13     HASH JOIN  (cr=5 pr=2 pw=0 time=22335 us)
     13      INLIST ITERATOR  (cr=2 pr=0 pw=0 time=103 us)
     13       INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=89 us)(object id 10243)
     80      INDEX FAST FULL SCAN BTB_PK (cr=3 pr=2 pw=0 time=17326 us)(object id 10244)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   11498        0.00          0.02
  db file sequential read                         2        0.01          0.01
  SQL*Net message from client                 11498        0.00          7.05
  latch: shared pool                              1        0.00          0.00

  • SLOW RUN ***   call     count       cpu    elapsed       disk      query    current        rows ------- ------  -------- ---------- ---------- ---------- ----------  ---------- Parse    11497      0.76       0.61          0          0          0           0 Execute  11497     15.87      15.37          0          0          0           0 Fetch    11496    726.43     709.89          1     174669          0       11496 ------- ------  -------- ---------- ---------- ---------- ----------  ---------- total    34490    743.06     725.89          1     174669          0       11496   Misses in library cache during parse: 1 Misses in library cache during execute: 1 Optimizer mode: ALL_ROWS Parsing user id: 287   Rows     Row Source Operation -------  ---------------------------------------------------       1  SORT AGGREGATE (cr=11 pr=1 pw=0 time=87366 us)       4   HASH JOIN  (cr=11 pr=1 pw=0 time=86957 us)       7    TABLE ACCESS BY INDEX ROWID ACCT_TRANS (cr=6 pr=0 pw=0 time=130 us)       7     INDEX RANGE SCAN ATB3_I (cr=3 pr=0 pw=0 time=83 us)(object id 8072)      13    VIEW  index$_join$_001 (cr=5 pr=1 pw=0 time=86298 us)      13     HASH JOIN  (cr=5 pr=1 pw=0 time=86244 us)      13      INLIST ITERATOR  (cr=2 pr=0 pw=0 time=140 us)      13       INDEX RANGE SCAN BTB2_I (cr=2 pr=0 pw=0 time=111 us)(objectid 10243)      80      INDEX FAST FULL SCAN BTB_PK (cr=3 pr=1 pw=0 time=15315 us)(object id 10244)

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                   11497        0.00          0.02
  db file sequential read                         1        0.01          0.01
  SQL*Net message from client                 11497        0.00          9.39
  latch: shared pool                              2        0.00          0.00
 

Thanks again!  
Charlotte

 

  • Original Message ---- From: Stephane Faroult <sfaroult_at_roughsea.com> To: charlottejanehammond_at_yahoo.com Cc: oracle-l_at_freelists.org Sent: Wednesday, November 5, 2008 10:32:41 PM Subject: Re: Random Slowdowns on Hash Joins

Charlotte,

Since you insist one the fact that the elapsed time is much higher, I presume that the CPU time is about the same. What do the trace file says about waits? That's probably where you'll find the difference. What kind of storage have you at the back? If it's a shared SAN, it's quite possible that some other application flushes the cache - same nominal number of physical I/Os for Oracle, but very different time since instead of hitting the SAN cache you really have to get the data from disk. It should show somewhere. If someone is randomly computing statistics in your back in another database on an other machine that shares the SAN storage, it can happen ...

HTH Stéphane Faroult

Charlotte Hammond wrote:

> Hi All!  We're running a batch job on test. After each run everything is reset using an RMAN restore. So each time it runs with exactly the same data, execution plans etc.
>  However sometimes (occassionally) its about 2-3 times as slow as normal. Comparing a slow trace file with a normal trace file I find that execution plans are the same (as expected) and the biggest difference is that HASH JOINs all have (say) e=80000 instead of (say) e=20000.  Hash Joins are small but very frequent in this job, so this does add up to the time difference observed. But why is this slow down happening?
>  The "server" is a Sun Solaris LDOM on a T2 processor with 32 CMTs masquerading as 32 virtual processors. This may not be relevant but that's where my suspicion is at the moment. The load is fairly consistent during each test - a few (maybe 6 or 7) application connections doing minor bits of work concurrently but the overall load is low. The batch job is single threaded (no parallel queries etc.)  Any ideas or suggestions for further diagnostics? (This is Oracle 10.2.0.3 EE on Solaris 10)
> (BTW - it's frustrating trying to get it to be slow when you want it, it just sort of happens from time to time unpredictably)
>  Many thanks!
> Charlotte 
> 
>      --
> http://www.freelists.org/webpage/oracle-l
> 
> 
> 
>  


--
http://www.freelists.org/webpage/oracle-l
Received on Thu Nov 06 2008 - 16:24:28 CST

Original text of this message