RE: hash join waits on cpu 100% time

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Sun, 4 Jan 2015 12:12:37 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D928266A76_at_EXMBX01.thus.corp>




You might as well capture the full SQL Monitoring output for the completed query.

Then I'd check the definition of the view V_SA_tab_personal_ADD  to see what it's doing with the "with subquery".  

I think most of the work relates purely to the number and size of the tablescans of the resultant GTT - but the CPU used still looks a little high (even if it does mange to scan 2 billiong rows in the first 1,000 seconds) so I want to know how it reduces those rows down to one per scan - as indicated by the Rows (Actual) in lines 19 and 18.  I'd want the execution plan from v$sql_plan so that I could check the predicate section to see how any predicates were applied.

If I were to start running the query again I would use Tanel's code to capture a snapshot of latch activity over a few seconds to see if there were any significant latch gets over and above those related to the buffer cache.


Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle

________________________________________
From: GG [grzegorzof_at_interia.pl]
Sent: 04 January 2015 10:48
To: Jonathan Lewis; oracle-l_at_freelists.org
Subject: Re: hash join waits on cpu 100% time

W dniu 2015-01-04 o 11:30, Jonathan Lewis pisze:
> Checking the description of the bug it seems to say that it's the creation of the internal GTT that is the threat. In the context of the bug - it's Siebel (which typically means large numbers of columns) and a very large number of (possibly small ?) queries. This would make the recursive SQL a significant fraction of the total work done, and for concurrent queries introduce a degree of latch contention.
>
> In your case you are talking about the execution of a single big query; moreover, if the problem were recursive SQL then the Tanel snapshot would show a high value for STAT execute count. Finally, if you check the monitoring execution plan you can see (lines 12/13) that the Load as Select starts 8 seconds into the query and completes within one second - that's the point at which the bug would apply.
>
> There is one oddity that I would like  an explanation for - how come line 18 started at +4 seconds, when line 12 didn't start until +8 seconds ?  The only (non-buggy) thing I can think of is that there's a run-time dependency that doesn't show up in the execution plan - even then that's only showing (at worst) only a 4 second anomaly, not a very slow query.
Got Your point Jonathan . Query finished after 10h , I'm out of office
right now so cant provide any more details .
Just wondering ,how can we narrow down the issue, should I go for
detailed lvl 12 10046 trace or/and better 10053 ?
Seems like Oracle instrumentation lacks of granularity if Your query
'hangs on' CPU .
Maybe I'll try with few short stack dumps using oradebug when query is
running .

Regards
GG


--
http://www.freelists.org/webpage/oracle-l Received on Sun Jan 04 2015 - 13:12:37 CET

Original text of this message