Re: Approach to identify root cause of CPU spike by using ASH report
Date: Sun, 18 Mar 2018 11:17:16 -0500
Message-ID: <CAP79kiSJ_r5SDQX22AoqEFyq7thWtBOU78dJD=oH13CyfRTsJw_at_mail.gmail.com>
Nirav,
You may want to use FORCE_MATCHING_SIGNATURE and MIN(SQL_ID) as sample_sql_id to group all the similar sql_ids together (as they will often share the same FORCE_MATCHING_SIGNATURE but have different SQL_IDs).
Here's something I use (or a version of it). This will group the top SQLs by sample_hour - you can tweak that or totally remove it if its too fine grained.
SELECT /*+ parallel(4) */
TRUNC ( sh.sample_time, 'HH' ) AS sample_hour ,MIN ( sh.sql_id ) AS sample_sql_id ,sh.force_matching_signature ,sh.top_level_sql_id ,SUM ( sh.tm_delta_cpu_time ) AS sum_cpu_timeFROM dba_hist_active_sess_history sh
WHERE sh.sample_time >= TRUNC ( SYSDATE ) + 2 / 24 AND sh.sample_time <= TRUNC ( SYSDATE ) + 4 / 24 AND sh.session_state = 'ON CPU' AND sh.force_matching_signature != 0 GROUP BY TRUNC ( sh.sample_time, 'HH' ) ,sh.force_matching_signature ,sh.top_level_sql_id
ORDER BY sum_cpu_time DESC
--fetch first 5 rows only
/
Chris
On Sun, Mar 18, 2018 at 10:52 AM, Nirav A Shah <shivam71_at_hotmail.com> wrote:
> Hi Experts,
>
>
> I have a question on what is the right approach to identify root cause of
> CPU spike from ASH report and will be thankful for review of the following
> , which is the approach that I have been using – and now I am getting some
> question/doubt on it:
>
> For the scope of this question, I am assuming that it is a specific sql
> that is the cause of the issue. Please consider this to be an assumption
> that is true for this analysis, that given that as a fact that it is indeed
> a sql that causing the CPU spike, how it should be identified from the ASH
> report.
>
> Also assume that we have required licence neeeded for AWR/ASH.
>
>
> We are on Oracle 11.2.0.4 on Linux. We have alerts setup for CPU and
> monitor for CPU spike when it goes above 70%. –When this happens I do the
> following:
>
> 1) Identify which database node had the spike (we have 4 node RAC). For
> that node, take a 10 minute ASH report e.g. if spike is at 5.04 pm then I
> take an ASH report from 5.00 pm to 5.10 pm.
>
> 2) Look at what is the top most one/two sql id in the “Top SQL with Top
> Events” section and consider this as the root cause of the CPU spike.
>
>
> Now I think that a better approach could have been:
>
> 1) Look at DBA_HIST_ACTIVE_SESS_HISTORY for that 10 minute period
> (basically the time frame of spike) and check which sql id was ‘ON CPU’
> during that time. 2) Among all such sql id that were ‘ON CPU’ one that
> present for highest number of times is the one that is the culprit.
>
>
> Again I am not 100% sure if this is the right approach but this is what I
> think as of now.
>
>
>
> With Regards,
>
> Nirav
>
>
-- http://www.freelists.org/webpage/oracle-lReceived on Sun Mar 18 2018 - 17:17:16 CET