Re: Approach to identify root cause of CPU spike by using ASH report

From: Chris Taylor <christopherdtaylor1994_at_gmail.com>
Date: Sun, 18 Mar 2018 11:17:16 -0500
Message-ID: <CAP79kiSJ_r5SDQX22AoqEFyq7thWtBOU78dJD=oH13CyfRTsJw_at_mail.gmail.com>



Nirav,

You may want to use FORCE_MATCHING_SIGNATURE and MIN(SQL_ID) as sample_sql_id to group all the similar sql_ids together (as they will often share the same FORCE_MATCHING_SIGNATURE but have different SQL_IDs).

Here's something I use (or a version of it). This will group the top SQLs by sample_hour - you can tweak that or totally remove it if its too fine grained.

  SELECT /*+ parallel(4) */

         TRUNC ( sh.sample_time, 'HH' ) AS sample_hour
        ,MIN ( sh.sql_id )            AS sample_sql_id
        ,sh.force_matching_signature
        ,sh.top_level_sql_id
        ,SUM ( sh.tm_delta_cpu_time ) AS sum_cpu_time
    FROM dba_hist_active_sess_history sh
   WHERE sh.sample_time >= TRUNC ( SYSDATE ) + 2 / 24
     AND sh.sample_time <= TRUNC ( SYSDATE ) + 4 / 24
     AND sh.session_state = 'ON CPU'
     AND sh.force_matching_signature != 0
GROUP BY TRUNC ( sh.sample_time, 'HH' )
        ,sh.force_matching_signature
        ,sh.top_level_sql_id

ORDER BY sum_cpu_time DESC
--fetch first 5 rows only
/

Chris

On Sun, Mar 18, 2018 at 10:52 AM, Nirav A Shah <shivam71_at_hotmail.com> wrote:

> Hi Experts,
>
>
> I have a question on what is the right approach to identify root cause of
> CPU spike from ASH report and will be thankful for review of the following
> , which is the approach that I have been using – and now I am getting some
> question/doubt on it:
>
> For the scope of this question, I am assuming that it is a specific sql
> that is the cause of the issue. Please consider this to be an assumption
> that is true for this analysis, that given that as a fact that it is indeed
> a sql that causing the CPU spike, how it should be identified from the ASH
> report.
>
> Also assume that we have required licence neeeded for AWR/ASH.
>
>
> We are on Oracle 11.2.0.4 on Linux. We have alerts setup for CPU and
> monitor for CPU spike when it goes above 70%. –When this happens I do the
> following:
>
> 1) Identify which database node had the spike (we have 4 node RAC). For
> that node, take a 10 minute ASH report e.g. if spike is at 5.04 pm then I
> take an ASH report from 5.00 pm to 5.10 pm.
>
> 2) Look at what is the top most one/two sql id in the “Top SQL with Top
> Events” section and consider this as the root cause of the CPU spike.
>
>
> Now I think that a better approach could have been:
>
> 1) Look at DBA_HIST_ACTIVE_SESS_HISTORY for that 10 minute period
> (basically the time frame of spike) and check which sql id was ‘ON CPU’
> during that time. 2) Among all such sql id that were ‘ON CPU’ one that
> present for highest number of times is the one that is the culprit.
>
>
> Again I am not 100% sure if this is the right approach but this is what I
> think as of now.
>
>
>
> With Regards,
>
> Nirav
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Mar 18 2018 - 17:17:16 CET

Original text of this message