RE: Missing SQL in DBA_HIST_SQLSTAT

From: Herring Dave - dherri <Dave.Herring_at_acxiom.com>
Date: Tue, 2 Nov 2010 11:04:24 -0500
Message-ID: <7ED53A68952D3B4C9540B4EFA5C76E36087C2FD8_at_CWYMSX04.Corp.Acxiom.net>



SELECT topnsql
  FROM sys.wrm$_wr_control;

           TOPNSQL


        2000000000

As an example of what I'm talking about:

SELECT inst_id, plan_hash_value, first_load_time, MAX(last_active_time)   FROM cgv$sql
 WHERE sql_id = '8qfp7bf2tcw57'
 GROUP BY inst_id, plan_hash_value, first_load_time  ORDER BY inst_id;

I PLAN_HASH_VALUE FIRST_LOAD_TIME MAX(LAST_ACTIVE_TI

- ------------------ ------------------- ------------------
1         2586770207 2010-10-10/15:09:48 02-NOV-10 11:39:49
2         2586770207 2010-10-10/15:19:15 02-NOV-10 08:24:58

SELECT sq.instance_number, sq.plan_hash_value, MAX(end_interval_time)   FROM dba_hist_sqlstat sq, dba_hist_snapshot s  WHERE sq.sql_id = '8qfp7bf2tcw57'

   AND (    sq.dbid = s.dbid
        AND sq.instance_number = s.instance_number
        AND sq.snap_id = s.snap_id) 

 GROUP BY sq.instance_number, sq.plan_hash_value  ORDER BY 1, 2;    INSTANCE_NUMBER PLAN_HASH_VALUE MAX(END_INTERVAL_TIME)
------------------ ------------------ ----------------------------------------
                 1          690739501 22-OCT-10 04.00.17.953 AM
                 1         2586770207 22-OCT-10 04.00.17.953 AM
                 2          690739501 03-SEP-10 04.30.42.593 PM
                 2         2586770207 03-SEP-10 04.30.42.593 PM

Dave Herring  | DBA
Acxiom Global Technology Solutions   

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

-----Original Message-----

From: Petr Novak [mailto:Petr.Novak_at_trivadis.com] Sent: Tuesday, November 02, 2010 10:32 AM To: Herring Dave - dherri; Teehan, Mark; ORACLE-L Subject: AW: Missing SQL in DBA_HIST_SQLSTAT

Hallo Dave,

in Metalink ist 10.2.0.4 Bug 8484269 , fixed in 11.2 . Topnsql must be in range (30,50000), otherwise you get ORA-13530. The question is how TOPNSQL setting works under 11.2. Where do you find out, that default is 2 billion ? It does not make sense trying to catch so much statements (if you would have such v$SQL). Maybe such big value is ignored/corrected internally.

Best Regards,
Petr



Von: Herring Dave - dherri [Dave.Herring_at_acxiom.com] Gesendet: Dienstag, 2. November 2010 16:14 Bis: Petr Novak; Teehan, Mark; ORACLE-L
Betreff: RE: Missing SQL in DBA_HIST_SQLSTAT

True, smaller statements may not be as important, but per my original post, why would they be skipped if the default for TOPNSQL is 2 billion?

I got into this research because I was looking into SQL statements suffering from bind variable peeking and skewed values. I wanted to check AWR history to see the frequency of different plans chosen for some statements and the average resource impact per SQL_ID + PLAN_HASH_VALUE. In reviewing the results I saw that some days certain plans weren't chosen at all. Those were for recent entries and it just so happened that those entries were still in the shared pool, even after a few days (because they're executed multiple times per day and haven't been invalidated). Because of this, AWR won't help because it's inaccurate, which sent me down this whole path.

Dave Herring | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

-----Original Message-----

From: Petr Novak [mailto:Petr.Novak_at_trivadis.com] Sent: Tuesday, November 02, 2010 10:01 AM To: Herring Dave - dherri; Teehan, Mark; ORACLE-L Subject: AW: Missing SQL in DBA_HIST_SQLSTAT

Hallo Dave,

I executed your query and checked one of the returned SQL_IDs.

SQL> select FIRST_LOAD_TIME,EXECUTIONS,LAST_LOAD_TIME,LAST_ACTIVE_TIME ,PARSING_SCHEMA_NAME,CHILD_NUMBER,PLAN_HASH_VALUE,OPTIMIZER_ENV_HASH_VALUE from v$sql where sql_id='cjkmnxsvny74t';

FIRST_LOAD_TIME EXECUTIONS LAST_LOAD_TIME LAST_ACTIVE_TIME PARSING_SCHEMA_NAME CHILD_NUMBER PLAN_HASH_VALUE OPTIMIZER_ENV_HASH_VALUE
------------------- -------------- ------------------- ------------------- ------------------------------ -------------- --------------- ------------------------

2010-09-01/05:00:29             45 2010-11-02/14:55:50 02.11.2010 14:55:49 TMU                                         0      1721219251               2180547739
2010-09-01/05:00:29              7 2010-11-01/05:00:48 01.11.2010 05:07:01 TMU                                         1      1721219251               2180547739

AWR reports only one interval for this query:

End Plan Hash Optim Env CPU Elapsed IOTime AppTime Cluster Concurr Execs Loa PaC Ela/Ex Buffer/Ex DReads/Ex Rows/Ex Prf
----------- ------------- ----------- ---------- ---------- --------- -------- -------- -------- -------- ---- ------ ----------- ------------- ----------- ----------- ---
25.10 11:00 1721219251 2180547739 .01 .05 .0 .0 .0 .0 3 0 3 .02 20.0 14.0 3.0

I thing it is natural, that AWR does not capture every statement executed in the interval,even if the statement was not aged out from V$SQL. Such statements are very 'small' , concerning resource consumption. I didnt try to play with TOPNSQL, but you could check the maximum count of captured statements for snapshot and instance, if TOPNSQL has any impact. It is not important how many statements are captured, but how big percentage of DB load (reads,gets,writes,CPU) you can explain with captured statements.

Best Regards,
Petr



Von: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org]&quot; im Auftrag von &quot;Herring Dave - dherri [Dave.Herring_at_acxiom.com] Gesendet: Dienstag, 2. November 2010 14:37 Bis: Teehan, Mark; ORACLE-L
Betreff: RE: Missing SQL in DBA_HIST_SQLSTAT

Mark,

Thanks for the detailed response!

I can understand issues where SQL is aged out/invalidated inbetween AWR snapshots, causing statements to be missed. But I'm finding SQL statements having changes in executions ACROSS snapshots, which is where I'm confused.

Here's one of the SQL statements I'm using to try and determine the extent of this problem. It'll list statements whose last active time is greater than then date of the last AWR snapshot + snapshot interval, meaning it should have been captured but hasn't:

SELECT sql_id, TO_CHAR(awr_last_active_dt, 'DD-MON-YY HH24:MI:SS') chr_awr_last_active_dt

     , TO_CHAR(last_snapshot_dt, 'DD-MON-YY HH24:MI:SS') chr_last_snapshot_dt
     , shared_pool_last_active_dt
  FROM (SELECT MAX(ss.end_interval_time) awr_last_active_dt
             , sq.sql_id, snap_interval, ss2.last_snapshot_dt, s.shared_pool_last_active_dt
          FROM dba_hist_sqlstat sq
             , dba_hist_snapshot ss
               /* Get the date of the most recent snapshot */
             , (SELECT MAX(begin_interval_time) last_snapshot_dt
                  FROM dba_hist_snapshot) ss2
               /* Get the date of LAST_ACTIVE_TIME for every SQL stmt in the shared pool */
             , (SELECT sql_id, plan_hash_value
                     , MAX(last_active_time) shared_pool_last_active_dt
                  FROM v$sql
                 GROUP BY sql_id, plan_hash_value) s
               /* Get the SNAP_INTERVAL, to help determine the offset for the most recent
                  possible LAST_ACTIVE_TIME */
             , (SELECT snap_interval FROM dba_hist_wr_control)
         WHERE (    sq.snap_id = ss.snap_id
                AND sq.instance_number = ss.instance_number
                AND sq.dbid = ss.dbid)
           AND (    sq.sql_id = s.sql_id
                AND sq.plan_hash_value = s.plan_hash_value)
         GROUP BY sq.sql_id, snap_interval, ss2.last_snapshot_dt, s.shared_pool_last_active_dt)
 WHERE LEAST(shared_pool_last_active_dt, last_snapshot_dt) - snap_interval > awr_last_active_dt  ORDER BY LEAST(shared_pool_last_active_dt, last_snapshot_dt) - snap_interval -

          awr_last_active_dt DESC;

Dave Herring | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com

-----Original Message-----

From: Teehan, Mark [mailto:mark.teehan_at_credit-suisse.com] Sent: Tuesday, November 02, 2010 12:53 AM To: Herring Dave - dherri; ORACLE-L
Subject: RE: Missing SQL in DBA_HIST_SQLSTAT

Dave,
I've noticed that the quality of AWR data collection varies a lot depending on the application profile.

I am not an expert on database internals ("Tanel!!") but this is what I have worked out; my problems in this area are due to flushing of the sql area. As you can see the metrics in gv$sqlstat; then it may not be relevant.

A database hosting an application that executes most of its workload during a batch window seems to have lower quality of AWR data collection during its busiest period. One major reason for this is aging of single-execution statements. The SQL Area sizes automatically based on the workload of the entire day, so when a large burst of statements is run (= overnight batch cycle), the SQL area will not resize up immediately (if at all), and single-execution statements will be aged out first before the next AWR snap occurs. I've seen this happen a lot - the statement metrics are no longer visible in gv$sqlstat at snap-time so it never goes into AWR.

Aging of metrics for statements is only a problem if the database needs to flush the sql area: it can be flushed by other activity. Single-execution statements appear to be chosen first, irrespective of how long they ran for - AFAIK no attempt is made to classify which single-execution statements may still have important metrics.

A flush offender is a stats gather on objects with a lot of column statistics: it generates a large number of recursive statements for each column (select min(), max() etc) in a short span, which can flush a lot of metrics from the sql area. The AWR partition rolling job can also generate a lot of statements - this generally runs once a day.

I've raised an SR to find out if there is any way to keep statements in the SQL area until they have been snapped into AWR, but support told me to go away, its supposed to be like that. It is not possible to directly set the size of the sql area to improve metric retention. Statement pinning doesn't work either: the text is kept, but the metrics (gv$sqlstat) are flushed! I hit a brick wall on this. It seems a design flaw to me - the sql area flush process should assess the metrics for each statement to see if it is eligible for AWR retention. Not trivial I know, but a lot depends on AWR quality.

Interesting that you have seen this on a 4-node RAC - the worst offender I have is a 3-node RAC - is there any chance AWR is not picking up metrics across all instances?

HTH
Mark Teehan
Singapore

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Herring Dave - dherri Sent: 26 October 2010 06:08
To: ORACLE-L
Subject: Missing SQL in DBA_HIST_SQLSTAT

Has anyone checked into the accuracy and reliability of AWR SQL stats? From checks I've done it appears that a lot of SQL statements are not getting captured in AWR snapshots.

Here's our config: 4-node RAC, Oracle 10.2.0.2, RH 4.x, 30-minute AWR snapshots.

We've left TOPNSQL as "DEFAULT", which in our case should be enough:

SELECT topnsql
  FROM sys.wrm$_wr_control;

           TOPNSQL


        2000000000

This means that the first 2 BILLION SQL statements per category will be selected (seems rather excessive!). Yet I ran a few checks where I compared (GV$SQL.LAST_ACTIVE_TIME - snapshot interval) to the MAX(end_interval_time) for snapshots (all by SQL_ID) and found over 1600 statements that are executed yet no updates are captured by AWR.

As a double check, I reviewed data in DBA_HIST_ACTIVE_SESS_HISTORY and found that 1 of the 1600 statements from above is executed multiple times per day, yet the last DBA_HIST_SQLSTAT snapshot it's in is from 4 months ago.

Has anyone else seen this type of thing and/or done detailed checking on the accuracy of AWR SQL data?

BTW, I tried tracing the snapshot activity and can't get past what's done to populate internal objects X$KEWRSQLIDTAB and X$KEWRSQLCRIT.

Dave Herring | DBA
Acxiom Global Technology Solutions

630-944-4762 office | 630-430-5988 cell | 630-944-4989 fax 1501 Opus Pl | Downers Grove, IL, 60515 | U.S.A. | www.acxiom.com Service Desk: 888-243-4566, https://servicedesk.acxiom.com, GSCA_at_DNB.com



The information contained in this communication is confidential, is intended only for the use of the recipient named above, and may be legally privileged.

If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution or copying of this communication is strictly prohibited.

If you have received this communication in error, please resend this communication to the sender and delete the original message or any copy of it from your computer system.

Thank You.


--
http://www.freelists.org/webpage/oracle-l



===============================================================================
Please access the attached hyperlink for an important electronic communications disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===============================================================================

--
http://www.freelists.org/webpage/oracle-l


--
http://www.freelists.org/webpage/oracle-l
Received on Tue Nov 02 2010 - 11:04:24 CDT

Original text of this message