Dominic Brooks

Subscribe to Dominic Brooks feed Dominic Brooks
Dominic Brooks on Oracle Performance, Tuning, Data Quality & Sensible Design ... (Now with added Sets Appeal)
Updated: 7 hours 16 min ago

Outer Join with OR and Lateral View Decorrelation

Wed, 2016-07-06 11:33

Use of ANSI SQL is a personal thing.

Historically I have not been a fan apart from where it makes things easier/possible.

This reticence was mainly due to optimizer bugs and limitations in the earlier days.

Recently I have been using it much more because I find that the developers I interact with prefer it / understand it better.

You might/should be aware that Oracle will rewrite ANSI SQL to an Oracle syntax representation, this transformation being listed in the optimizer trace file.

You might/should also be aware that Oracle outer join syntax does not allow OR or IN:

drop table t1;
drop table t2;

create table t1
as
select floor((rownum+1)/2) col1
,      case when mod(rownum,2) = 0 then 1 else 2 end col2
,      10 col3
from   dual
connect by rownum <= 20;

create table t2
as
select rownum col1
,      case when mod(rownum,2) = 0 then 2 else 1 end col3
from   dual
connect by rownum <= 10;

select *
from   t1
,      t2
where  t1.col1 = t2.col1 (+) 
and  ((t1.col2 = 1
and    t2.col3 (+) > t1.col3)
or    (t1.col2 = 2
and    t2.col3 (+) < t1.col3));

ORA-01719: outer join operator (+) not allowed in operand of OR or IN

ANSI SQL remedies this:

alter session tracefile_identifier = 'domlg1';
alter session set events 'trace[rdbms.SQL_Optimizer.*]';
select *
from   t1
left join t2
on    t1.col1 = t2.col1
and ((t1.col2 = 1
and   t2.col3 > t1.col3)
or   (t1.col2 = 2
and   t2.col3 < t1.col3));

alter session set events 'trace off';

But it comes at a price.

Note the execution plan:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   1 |  NESTED LOOPS OUTER |      |    20 |  1300 |    42   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T1   |    20 |   780 |     2   (0)| 00:00:01 |
|   3 |   VIEW              |      |     1 |    26 |     2   (0)| 00:00:01 |
|*  4 |    TABLE ACCESS FULL| T2   |     1 |    26 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND
              "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3"<"T1"."COL3"))   

Now, maybe you will have better luck than me but no matter what I try I cannot change the NESTED LOOPS OUTER operation.
So, if that lateral view involves some full table scans or other significant operations, they might be very expense on the outer operation of a nested loop.

The reason is in the optimizer trace.

Query after View Removal
******* UNPARSED QUERY IS ********
SELECT "T1."COL1" "COL1", "T1."COL2" "COL2", "T1."COL3" "COL3", "VW_LAT_AE9E49E8"."ITEM_1_0" "COL1", "VW_LAT_AE9E49E8"."ITEM_2_1" "COL3" FROM "DOM"."T1" "T1", LATERAL( (SELECT "T2"."COL1" "ITEM_1_0", "T2"."COL3" "ITEM_2_1" FROM "DOM"."T2" "T2" WHERE "T1"."COL1"="T2"."COL1" AND ("T1"."COL2"=1 AND "T2"."COL3">"T1"."COL3" OR "T1"."COL2"=2 AND "T2"."COL3" < "T1"."COL3"))) (+) "VW_LAT_AE9E49E8"
DCL:Checking validity of lateral view decorrelation SEL$BCD4421C (#1)
DCL: Bypassed: view has non-well-formed predicate
DCL: Failed decorrelation validity for lateral view block SEL$BCD4421C (#1)

The OR prevents the decorrelation which seems to mean that we’re stuck with a NESTED LOOP for now.

Further Reading on ANSI:
Oracle Optimizer Blog
Jonathan Lewis on ANSI Outer
Jonathan Lewis on ANSI


INSERT ALL caveat

Thu, 2016-05-26 11:33

Why you might want to think twice about using INSERT ALL.

One of those things I knew and then forgot.

So, let’s say you’ve got three tables or a partitioned table or something like that.

Let’s use regional tables for simplicity.

drop table t1_r1;
drop table t1_r2;
drop table t1_r3;

create table t1_r1
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R1')));

create table t1_r2
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R2')));

create table t1_r3
(col1 varchar2(2) not null
,col2 number not null
,check( col1 in ('R3')));

insert into t1_r1 values ('R1',1);
insert into t1_r2 values ('R2',1);
insert into t1_r3 values ('R3',1);

commit;

And you want a routine that will insert into one of those tables depending on region.

And you’re a simple fellow, so you go with an IF statement:

create or replace procedure p1 (
  col1 in varchar2, 
  col2 in number
)
as
begin
  if col1 = 'R1'
  then
      insert into t1_r1 values(col1,col2);
  elsif col1 = 'R2'
  then
      insert into t1_r3 values(col1,col2);
  else 
      insert into t1_r3 values(col1,col2);
  end if;
end p1;
/

Procedure P1 compiled

And then in the same session you run this uncommitted:

exec p1('R1',2);

PL/SQL procedure successfully completed.

And then in another session you decide to truncate table T1_R3:

truncate table t1_r3;

Table T1_R3 truncated.

No problem.
None was expected.

However…

Let’s say that we decide to tidy up that procedure and get rid of some of the repetition by using an INSERT ALL statement.
I will use a standalone sql statement just to demonstrate a further minor aspect rather than using a procedure with a bound parameter.

insert all
when col1 = 'R1' then into t1_r1
when col1 = 'R2' then into t1_r2
when col1 = 'R3' then into t1_r3
select 'R1' col1,2 col2
from dual;

1 row inserted.

Let’s revisit the truncate:

truncate table t1_r3;

SQL Error: ORA-00054: resource busy and acquire with NOWAIT specified or timeout expired
00054. 00000 -  "resource busy and acquire with NOWAIT specified or timeout expired"
*Cause:    Interested resource is busy.
*Action:   Retry if necessary or increase timeout.

TM share locks from the INSERT ALL on all three possible targets prevent the TRUNCATE.

So, a simple/simplisitic illustration of why you might want to think twice about whether INSERT ALL is the best feature for your use case, based on a real life problem.


Side-effect of ALTER SESSION SET CURRENT_SCHEMA

Mon, 2016-04-18 11:11

For most of our database set-ups we use a different TEMP space for application users than for end-user/support/developer/reporting usage.
The intention is to minimise the potential impact of a rogue ad-hoc query on the main applicaiton.

However, turns out this is ineffective IF you use:

ALTER SESSION SET CURRENT_SCHEMA = main_app_schema;

This is documented but I wasn’t previously aware.
No surprise that this learning opportunity was presented as a direct result of a rogue query on a prod DB blowing out the main TEMP space.

Deduction quickly suggested that the above must be the case and it was quickly proven by a test case and also supported by the documentation .

the database uses the temporary tablespace of the specified schema for sorts, joins, and storage of temporary database objects

Obviously… not using ALTER SESSION and using fully qualified object names prevents this “vulnerability”… but that’s not good enough unfortunately.


Did it really fix it 2: Plan flip, unprofileable, baseline OFE round 2 and “stability”

Mon, 2015-11-16 12:31

Snappy title, huh?

Aka: Why a sql plan baseline may be no guarantee of stability.

The other day, a problematic plan flip…

Tangent 1:
Cue much discussion about plan flips etc.
My thoughts on stability are that the priority for most application owners is stability and predictability but that does not tally with the defaul CBO behaviour and potentially you have to turn off a helluva lot to even get close.

Tangent 2:
I have commented before about the common sensitive of many databases to concurrent direct path reads (or rather the sensitivity is on the underlying SAN).

Here is one such illustration caused by this particular plan flip.

See that our single block read times go out from a “normal” range of 5 – 20ms to an average of 50+ ms. At this point anything doing physical IO starts to notice.
(Needless to say, I’m not a fan of these “normal” IO times – I think they’re awful but the SLA of the tier 2 (non flash) SAN storage is 20ms so we don’t alert until we breach that.)
sbr

Source:

select snap_id, begin_time, end_time, round(average,2)
from   dba_hist_sysmetric_summary
where  metric_name     = 'Average Synchronous Single-Block Read Latency'
order by snap_id;

Back to reality
Anyway, standard immediate-term fix for a plan flip is to hunt down a better plan from AWR and fix it with a profile or a baseline.

Not a problem to find one for this SQL as it nearly always executed with the same decent plan.
(AWR history is much longer than this but it gives the picture)

select to_char(trunc(sn.end_interval_time),'DD-MON-YYYY') dt
,      st.sql_id
,      st.plan_hash_value
,      sum(rows_processed_delta) rws
,      sum(executions_delta)     execs
,      round(sum(elapsed_time_delta)/1000/1000)   elp
,      round(sum(elapsed_time_delta)/1000/1000/nvl(nullif(sum(executions_delta),0),1),2)   elpe
,      round(sum(iowait_delta)/1000/1000)         io
,      round(sum(cpu_time_delta)/1000/1000)       cpu
,      sum(buffer_gets_delta)    gets
,      round(sum(disk_reads_delta))         disk_reads
from   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
where  st.snap_id            = sn.snap_id
and    sn.instance_number = st.instance_number
and    st.sql_id             IN ('6xrx006fmqbq2')
group by trunc(sn.end_interval_time), st.sql_id, st.plan_hash_value
order by trunc(sn.end_interval_time) desc, elp desc; 
DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Problem plan is obvious, right?

Started during the day on the 9th, continued overnight until eventually noticed.

PHV 3871506300 does a FTS and gets direct path reads which causes our IO stress.

So, as mentioned, no problem, hint it with a sql profile via COE_XFR_SQL_PROFILE.sql (see Metalink note 215187.1)

But it didn’t work.

Not properly.

See top line.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

We didn’t get “good” plan PHV 3803735407, we avoided “nasty” plan PHV 3871506300 but we got not-quite-so-nasty-but-still-not-nice PHV 3353364157.

The SQL Profile was not able to reproduce the desired PHV.

But at least PHV was not doing the problematic direct path reads!

I took the hints from PHV 3803735407 via

select * from table(dbms_xplan.display_awr('6xrx006fmqbq2',3803735407,format=>'+OUTLINE'));

I double checked they matched the hints in the SQL Profile – they did.

I tried using the hints manually in a SQL statement and the result was consistent with the SQL Profile – PHV 3353364157.

At this point, the DBA suggested using a SQL Plan Baseline.

I resisted but eventually gave in.

Why would it make any difference?

When a plan cannot be reproduced then there are good reasons why a baseline is much, much safer than a profile.

A baseline must reproduce the desired PHV or it will be rejected completely.

A profile has no knowledge of PHV so it is possible that under certain circumstances the hints in a profile may be only partially applied.

Profile was dropped and baseline was created.

This was the immediate aftermath (top line)… Success!

And V$SQL was showing that the baseline was being used.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

I maintained that this didn’t make sense but the ticket was closed so I couldn’t get an SPM trace.

At this point a copy of the statement without profile and without baseline and without the manual fullset of hints was returning the original PHV 3803735407.

I went away that night and on the train I had a thought: “what if it wasn’t round 1 of the baseline which worked but round 2”.

Reminder of the baseline reproduction system

Round 2 is OFE only.

Doesn’t matter whether it is round 1 or round 2, if it’s reproduced it’s reproduced.

But if it was under OFE only then that was no guarantee of stability.

Then the following night, late night call, the baseline was no longer reproducing PHV 3803735407 but was back to the big problem PHV 3871506300!

Although I didn’t have an SPM trace, I’m adamant it verified my theory about it being round 2 only.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465 

Note that at times of change, no database structures were being changed.
Stats will have been gathered, sure – that was the agent of change no doubt.
But no index maintenance was happening, nothing was going unusable, no partition maintenance was happening.
Nothing was changing which should mean that a particular plan was impossible.

This means that the set of hints for this particular plan was not deterministic or rather the hintset was surely incomplete in some way which mean that the optimizer was free to do certain transformations which meant that the hintset was then invalid? Not 100% sure, still to be looked at in more detail…

At this point we dropped the baseline and went with a cut-down sql profile which applied only two hints – to politely request index scans for the two problematic FTS in each of PHV 3871506300 and 3353364157.

And this worked and has been ok since (so far) but it will not be the longer term solution.

DT          SQL_ID        PLAN_HASH_VALUE        RWS      EXECS        ELP       ELPE         IO        CPU       GETS DISK_READS
----------- ------------- --------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
16-NOV-2015 6xrx006fmqbq2      3803735407         80       6887        348        .05         11        310   44574494         86 
13-NOV-2015 6xrx006fmqbq2      3803735407        114      13054        842        .06         19        713  117260543        482 
12-NOV-2015 6xrx006fmqbq2      3803735407        585      28074       1854        .07         12       1823  321890748       1983 
11-NOV-2015 6xrx006fmqbq2      3871506300          0        471      80513     170.94      77936       2543  193096137  188992511 
11-NOV-2015 6xrx006fmqbq2      3803735407        212       9933        690        .07         14        586   97253038        154 
10-NOV-2015 6xrx006fmqbq2      3871506300          5       1815     236211     130.14     226340       9993  757853211  745588219 
10-NOV-2015 6xrx006fmqbq2      3353364157         20       4485     114084      25.44          4     111246 2907305774        216 
10-NOV-2015 6xrx006fmqbq2      3803735407          4       1866        120        .06          1        116   17401768         30 
09-NOV-2015 6xrx006fmqbq2      3871506300          1       1232     156412     126.96     149660       6846  507571690  499305166 
09-NOV-2015 6xrx006fmqbq2      3803735407         53       6835        743        .11          3        723  190488567         41 
07-NOV-2015 6xrx006fmqbq2      3803735407         31       3079        230        .07          3        218   37385043         56 
06-NOV-2015 6xrx006fmqbq2      3803735407        166      18931       1200        .06         11       1128  180142678        484 
05-NOV-2015 6xrx006fmqbq2      3803735407        305       9904        553        .06          5        508   75239139         93 
04-NOV-2015 6xrx006fmqbq2      3803735407        160      18900       1089        .06          9       1027  150523728        204 
03-NOV-2015 6xrx006fmqbq2      3803735407        226      12048        859        .07         58        794  154111239        424 
02-NOV-2015 6xrx006fmqbq2      3803735407       1081      13327       1276         .1         66       1200  278129234        465

Never before seen an execution plan which had quite this problem interacting with SPM.

A test case to get to the root of the problem is still on the TODO list.

Something to do with a statement involving a WITH subquery and NO_MERGE
e.g.

WITH x AS (SELECT /*+ no_merge...)
SELECT 
FROM ...
WHERE EXISTS (SELECT
              FROM   x
              ...
              WHERE...);

Did it really fix it 1: OFE & nasty FIRST_ROWS plan on 11.2.0.4

Mon, 2015-11-16 10:50

Plan degradations on upgrade are normal.

This one’s no different.

On further investigation, turned out application was setting optimizer_mode = first_rows somewhere.

First point about this is that first_rows really shouldn’t be used either as a hint or as an optimizer_mode.

What does FIRST_ROWS mean?
From 11g doco:

The optimizer uses a mix of costs and heuristics to find a best plan for fast delivery of the first few rows.

If any sort of FIRST_ROWS type intervention is required, then it might be better to be using a more modern, fully-costed FIRST_ROWS_n (where n = 1 | 10 | 100 | 1000).
If it’s even required.

The 12c documentation is clearer about this:

FIRST_ROWS is available for backward compatibility and plan stability; use FIRST_ROWS_n instead.

But I don’t think we should attach too much credence to that statement not being in the 11g documentation.

Here’s the problem plan.

Pre upgrade, snippet of the plan looked as per below.
Please note position of T1.

-------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   1 |  SORT ORDER BY                              |                             |    34 | 13668 |  3003   (1)| 00:00:22 |       |       |
|   2 |   NESTED LOOPS OUTER                        |                             |    34 | 13668 |  3002   (1)| 00:00:22 |       |       |
|   3 |    NESTED LOOPS OUTER                       |                             |    34 | 12648 |  2968   (1)| 00:00:22 |       |       |
|   4 |     NESTED LOOPS OUTER                      |                             |    34 | 12274 |  2935   (1)| 00:00:22 |       |       |
|   5 |      NESTED LOOPS OUTER                     |                             |    34 | 11900 |  2867   (1)| 00:00:21 |       |       |
|   6 |       NESTED LOOPS OUTER                    |                             |    34 | 11628 |  2833   (1)| 00:00:21 |       |       |
|   7 |        NESTED LOOPS OUTER                   |                             |    34 | 10404 |  2799   (1)| 00:00:21 |       |       |
|   8 |         NESTED LOOPS OUTER                  |                             |    34 |  8228 |  2765   (1)| 00:00:20 |       |       |
|   9 |          NESTED LOOPS                       |                             |    34 |  6732 |  2731   (1)| 00:00:20 |       |       |
|  10 |           NESTED LOOPS                      |                             |    69 | 12282 |  2524   (1)| 00:00:19 |       |       |
|  11 |            NESTED LOOPS                     |                             |    69 |  9591 |  2386   (1)| 00:00:18 |       |       |
|* 12 |             TABLE ACCESS FULL               | T1                          |  1658 |   213K|   725   (3)| 00:00:06 |       |       |
|* 13 |             TABLE ACCESS BY INDEX ROWID     | T2                          |     1 |     7 |     1   (0)| 00:00:01 |       |       |
|* 14 |              INDEX UNIQUE SCAN              | T2_U1                       |     1 |       |     0   (0)| 00:00:01 |       |       |
|  15 |            TABLE ACCESS BY INDEX ROWID      | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 16 |             INDEX UNIQUE SCAN               | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|  17 |           TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |     1 |    20 |     3   (0)| 00:00:01 | ROWID | ROWID |
|* 18 |            INDEX RANGE SCAN                 | T4_N1                       |     1 |       |     2   (0)| 00:00:01 |       |       |

Post upgrade, plan has degraded to:

--------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name                        | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS OUTER                          |                             |    69 | 27738 |   508M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS OUTER                         |                             |    69 | 26979 |   508M  (3)|999:59:59 |       |       |
|   3 |    NESTED LOOPS OUTER                        |                             |    69 | 26220 |   508M  (3)|999:59:59 |       |       |
|   4 |     NESTED LOOPS OUTER                       |                             |    69 | 23184 |   508M  (3)|999:59:59 |       |       |
|   5 |      NESTED LOOPS OUTER                      |                             |    69 | 22632 |   508M  (3)|999:59:59 |       |       |
|   6 |       NESTED LOOPS OUTER                     |                             |    69 | 20562 |   508M  (3)|999:59:59 |       |       |
|   7 |        NESTED LOOPS OUTER                    |                             |    69 | 16146 |   508M  (3)|999:59:59 |       |       |
|   8 |         NESTED LOOPS                         |                             |    69 | 13662 |   508M  (3)|999:59:59 |       |       |
|   9 |          NESTED LOOPS                        |                             |  1675 |   312K|   508M  (3)|999:59:59 |       |       |
|  10 |           NESTED LOOPS                       |                             |   703K|    39M|  3476K  (1)| 06:54:55 |       |       |
|* 11 |            TABLE ACCESS BY GLOBAL INDEX ROWID| T4                          |   703K|    13M|  2068K  (1)| 04:06:54 | ROWID | ROWID |
|  12 |             INDEX FULL SCAN                  | T4_N2                       |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|  13 |            TABLE ACCESS BY INDEX ROWID       | T3                          |     1 |    39 |     2   (0)| 00:00:01 |       |       |
|* 14 |             INDEX UNIQUE SCAN                | T3_U1                       |     1 |       |     1   (0)| 00:00:01 |       |       |
|* 15 |           TABLE ACCESS FULL                  | T1                          |     1 |   132 |   718   (3)| 00:00:06 |       |       |

In the post-upgrade plan, the most pertinent part was step 15.

A FULL TABLE SCAN as the inner rowsource of a NESTED LOOP, in other words for every row in the driving rowsource do a full table scan.”

Standard reaction to such a degradation is to test the SQL with OPTIMIZER_FEATURES_ENABLE (OFE).

With OFE set either via OPT_PARAM hint or via ALTER SESSION then the SQL reverted to pre-upgrade plan.

This is good!

But… how much confidence can we have that this actually fixed the problem?

It would be great if we could find a bug which was definitively responsible for this change in execution plan.

OFE is an umbrella but it doesn’t turn off ALL fixes/optimizer features in a release.

Many of those fixes covered by the OFE umbrella are represented in V$SYSTEM_FIX_CONTROL

So, I went looking for anything which sounded related and went back to default 11.2.0.4 OFE and starting turning off the BUGNOs one-by-one – no difference, still a bad plan

Then still at OFE 11.2.0.4, I turned off all the BUGNOs…. no difference, still a bad plan.

At this point, my confidence in OFE as a fix for this problem was starting to waiver.

So, I tried to reduce the problem SQL to it’s smallest form representing the problem.

alter session set optimizer_mode = first_rows;
alter session set optimizer_features_enable = '11.2.0.4';

 explain plan for
 SELECT *
 FROM   T1
 ,      T3
 ,      T4
 WHERE  T1.COL1             = T3.COL1
 AND    T3.COL2             = T4.COL2
 AND    T1.DATE_COL        '+OUTLINE'));

This distilled statement reproduced the issue at the heart of the original statement:

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name    | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   1 |  NESTED LOOPS                        |         |  6029 |  4533K|  1612M  (3)|999:59:59 |       |       |
|   2 |   NESTED LOOPS                       |         |  2237K|   795M|  6551K  (1)| 13:01:53 |       |       |
|   3 |    TABLE ACCESS BY GLOBAL INDEX ROWID| T4      |  2240K|   188M|  2068K  (1)| 04:06:53 | ROWID | ROWID |
|   4 |     INDEX FULL SCAN                  | T4_N2   |  2240K|       |  7943   (1)| 00:00:57 |       |       |
|   5 |    TABLE ACCESS BY INDEX ROWID       | T3      |     1 |   285 |     2   (0)| 00:00:01 |       |       |
|*  6 |     INDEX UNIQUE SCAN                | T3_U1   |     1 |       |     1   (0)| 00:00:01 |       |       |
|*  7 |   TABLE ACCESS FULL                  | T1      |     1 |   397 |   718   (3)| 00:00:06 |       |       |
----------------------------------------------------------------------------------------------------------------

Is this test case a 100% indication that OFE wasn’t applicable to the original problem?
Not sure.
But at this point I had littleconfidence that OFE provided any sort of reliable fix to the original SQL statement.

I don’t know what the internal heuristics are/were for FIRST_ROWS.

But that is a stupid place for a full table scan regardless of the number of rows expected to return.

If we remove one of the T1 predicates above and the various row estimates go up then the FTS moves to a driving position.

I could/should get a 10053 trace but FIRST_ROWS shouldn’t be in use in the first place.

Solution – hunt down where the app code is doing the FIRST_ROWS thing and exterminate.

All the other possible OPTIMIZER_MODE modes gave a plan for the original SQL which started like this:

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                   | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                            |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   1 |  SORT ORDER BY                              |      |    35 | 14105 |  1386   (2)| 00:00:10 |       |       |
|   2 |   NESTED LOOPS OUTER                        |      |    35 | 14105 |  1385   (2)| 00:00:10 |       |       |
|*  3 |    HASH JOIN OUTER                          |      |    35 | 12565 |  1350   (2)| 00:00:10 |       |       |
|*  4 |     HASH JOIN OUTER                         |      |    35 | 11305 |  1315   (2)| 00:00:10 |       |       |
|   5 |      NESTED LOOPS OUTER                     |      |    35 |  9065 |  1280   (2)| 00:00:10 |       |       |
|   6 |       NESTED LOOPS OUTER                    |      |    35 |  8015 |  1245   (2)| 00:00:09 |       |       |
|   7 |        NESTED LOOPS OUTER                   |      |    35 |  7735 |  1210   (2)| 00:00:09 |       |       |
|   8 |         NESTED LOOPS OUTER                  |      |    35 |  7350 |  1176   (2)| 00:00:09 |       |       |
|   9 |          NESTED LOOPS                       |      |    35 |  6965 |  1106   (2)| 00:00:08 |       |       |
|  10 |           NESTED LOOPS                      |      |    69 | 12351 |   899   (3)| 00:00:07 |       |       |
|* 11 |            HASH JOIN                        |      |    69 |  9660 |   761   (3)| 00:00:06 |       |       |
|* 12 |             TABLE ACCESS FULL               | T2   |   237 |  1659 |    36   (6)| 00:00:01 |       |       |
|* 13 |             TABLE ACCESS FULL               | T1   |  1671 |   217K|   725   (3)| 00:00:06 |       |       |

So…. when optimizer upgrade problems strike, you should always try OFE.

But it’s also worth further investigation to confirm that OFE really is a reliable fix for the problem.


SQL Tuning: Thinking in Sets / How and When to be Bushy

Thu, 2015-11-05 11:48

Below is a SQL statement from a performance problem I was looking at the other day.

This is a real-world bit of SQL which has slightly simplified and sanitised but, I hope, without losing the real-worldliness of it and the points driving this article.

You don’t really need to be familiar with the data or table structures (I wasn’t) as this is a commentary on SQL structure and why sometimes a rewrite is the best option.

SELECT bd.trade_id
,      bdp.portfolio_id
,      bd.deal_id 
,      bd.book_id
,      pd.deal_status   prev_deal_status
FROM   deals            bd
,      portfolios       bdp
,      deals            pd
,      portfolios       pdp
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
-- some join columns
AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id;

There is no particular problem with how the SQL statement is written per se.

It is written in what seems to be a standard developer way.

Call it the “lay everything on the table” approach.

This is a common developer attitude:

“Let’s just write a flat SQL structure and let Oracle figure the best way out.”

Hmmm… Let’s look at why this can be a problem.

First, what is the essential requirement of the SQL?

Compare information (deal status) that we had yesterday for a subset of deals/trades

Something like that anyway…

So … What is the problem?

The Optimizer tends to rewrite and transform any SQL we give it anyway and tries to flatten it out.

The SQL above is already flat so isn’t that a good thing? Shouldn’t there be less work for the optimizer to do?

No, not necessarily. Flat SQL immediately restricts our permutations.

The problem comes with how Oracle can take this flat SQL and join the relevant row sources to efficiently get to the relevant data.

Driving Rowsource

Let’s assume that we should drive from today’s deal statuses (where we actually drive from will depend on what the optimizer estimates / costs).

SELECT ...
FROM   deals            bd
,      portfolio        bdp
...
-- today's data
WHERE  bd.business_date         = :t_date
AND    bd.src_bus_date          < :t_date
AND    bd.type                  = 'Trade'
AND    bdp.ref_portfolio_id     = bd.book_id
....
Where do we go from here?

We want to join from today’s deals to yesterdays deals.

But the data for the two sets of deals data is established via the two table join (DEALS & PORTFOLIOS).

thinking_in_Sets_venn

We want to join on TRADE_ID which comes from the two DEALS tables and PORTFOLIO_ID which comes from the two PORTFOLIOS tables.

SELECT ...
FROM   ...
,      deals            pd
,      portfolios       pdp
WHERE  ...
-- yesterday's data
AND    pd.business_date         = :y_date
AND    pd.type                  = 'Trade'
AND    pdp.ref_portfolio_id     = pd.book_id
...

And joined to via:

AND    bd.trade_id              = pd.trade_id
AND    bdp.portfolio_id         = pdp.portfolio_id

So from our starting point of today’s business deals, we can either go to PD or to PDP, but not to both at the same time.

Hang on? What do you mean not to both at the same time?

For any multi-table join involving more than two tables, the Optimizer evaluates the different join tree permutations.

Left-Deep Tree

Oracle has a tendency to choose what is called a left-deep tree.

If you think about a join between two rowsources (left and right), a left-deep is one where the second child (the right input) is always a table.

NESTED LOOPS are always left-deep.

HASH JOINS can be left-deep or right-deep (normally left-deep as already mentioned)

Zigzags are also possible, a mixture of left-deep and right-deep.

Below is an image of a left-based tree based on the four table join above.

left_tree

Here is an execution plan which that left-deep tree might represent:

---------------------------------------------------------------
| Id  | Operation                               | Name        |
---------------------------------------------------------------
|   0 | SELECT STATEMENT                      |               |
|   1 |  NESTED LOOPS                         |               |
|   2 |   NESTED LOOPS                        |               |
|   3 |    NESTED LOOPS                       |               |
|   4 |     NESTED LOOPS                      |               |
|*  5 |      TABLE ACCESS BY ROWID            | DEALS         |
|*  6 |       INDEX RANGE SCAN                | DEALS_IDX01   |
|   7 |      TABLE ACCESS BY INDEX ROWID      | PORTFOLIOS    |
|*  8 |       INDEX UNIQUE SCAN               | PK_PORTFOLIOS |
|*  9 |     TABLE ACCESS BY INDEX ROWID       | DEALS         |
|* 10 |      INDEX RANGE SCAN                 | DEALS_IDX01   |
|* 11 |    INDEX UNIQUE SCAN                  | PK_PORTFOLIOS |
|* 12 |   TABLE ACCESS BY INDEX ROWID         | PORTFOLIOS    |
---------------------------------------------------------------

Predicate Information (identified by operation id): 
--------------------------------------------------- 
 
   5 - filter("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date) 
   6 - access("BD"."BUSINESS_DATE"=:t_date) 
   8 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
   9 - filter(("BD"."TYPE"='Trade' AND "BD"."TRADE_ID"="PD"."TRADE_ID")) 
  10 - access("PD"."BUSINESS_DATE"=:y_date) 
  11 - access("PD"."BOOK_ID"="PDP"."REF_PORTFOLIO_ID") 
  12 - filter("BDP"."PORTFOLIO_ID"="PDP"."PORTFOLIO_ID")

Right-Deep Tree

A right-deep tree is one where the first child, the left input, is a table.

Illustration not specific to the SQL above:

right_tree

Bushy Tree

For this particular SQL, this is more what we are looking for:

bushy_tree

The essence of the problem is that we cannot get what is called bushy join, not with the original flat SQL.

The Optimizer cannot do this by default. And this isn’t an approach that we can get at by hinting (nor would we want to if we could, of course!).

Rewrite Required

To get this bushy plan, we need to rewrite our SQL to be more explicit around the set-based approach required.

WITH subq_curr_deal AS
     (SELECT /*+ no_merge */
             bd.trade_id
      ,      bd.deal_id
      ,      bd.book_id
      ,      bdp.portfolio_id
      FROM   deals      bd
      ,      portfolios bdp
      WHERE  bd.business_date         = :t_date
      AND    bd.src_bus_date          < :t_date
      AND    bd.type                  = 'Trade'
      AND    bdp.ref_portfolio_id     = bd.book_id)
,    subq_prev_deal AS
     (SELECT /*+ no_merge */
             pd.trade_id 
      ,      pd.deal_status
      ,      pdp.portfolio_id
      FROM   deals      pd
      ,      portfolios pdp
      WHERE  pd.business_date         = :y_date
      AND    pd.type                  = 'Trade'
      AND    pdp.ref_portfolio_id     = pd.book_id)
SELECT cd.trade_id
,      cd.portfolio_id
,      cd.deal_id
,      cd.book_id 
,      pd.deal_status prev_deal_status
FROM   subq_curr_deal cd
,      subq_prev_deal pd
WHERE  cd.trade_id             = pd.trade_id
AND    cd.portfolio_id         = pd.portfolio_id;
How exactly does the rewrite help?

By writing the SQL deliberately with this structure, by using WITH to create subqueries in conjunction with no_merge, we are deliberately forcing the bushy join.

This is an example execution plan that this bushy tree might represent.

----------------------------------------------------------
| Id  | Operation                        | Name          |
----------------------------------------------------------
|   0 | SELECT STATEMENT                 |               |
|*  1 |  HASH JOIN                       |               |
|   2 |   VIEW                           |               |
|   3 |    NESTED LOOPS                  |               |
|   4 |     NESTED LOOPS                 |               |
|*  5 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|*  6 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|*  7 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|   8 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
|   9 |   VIEW                           |               |
|  10 |    NESTED LOOPS                  |               |
|  11 |     NESTED LOOPS                 |               |
|* 12 |      TABLE ACCESS BY INDEX ROWID | DEALS         |
|* 13 |       INDEX RANGE SCAN           | DEALS_IDX01   |
|* 14 |      INDEX UNIQUE SCAN           | PK_PORTFOLIOS |
|  15 |     TABLE ACCESS BY INDEX ROWID  | PORTFOLIOS    |
----------------------------------------------------------
 
Predicate Information (identified by operation id): 
--------------------------------------------------- 
                                 
   1 - access("CD"."TRADE_ID"="PD"."TRADE_ID" AND "CD"."PORTFOLIO_ID"="PD"."PORTFOLIO_ID")
   5 - filter(("BD"."TYPE"='Trade' AND "BD"."SRC_BUS_DATE"<:t_date)) 
   6 - access("BD"."BUSINESS_DATE"=:t_date ) 
   7 - access("BD"."BOOK_ID"="BDP"."REF_PORTFOLIO_ID") 
  12 - filter("PD"."TYPE"='Trade') 
  13 - access("PD"."BUSINESS_DATE"=:y_date) 
  14 - access("PDP"."REF_PORTFOLIO_ID"="PD"."BOOK_ID")
Is this a recommendation to go use WITH everywhere?

No.

What about the no_merge hint?

No.
The no_merge hint is a tricky one. This is not necessarily a recommendation but its usage here prevents the Optimizer from flattening. I often find it goes hand-in-hand with this sort of deliberately structured SQL for that reason, and similar goes for push_pred.

Do developers need to know about left deep, right deep and bushy?

No, not at all.

Takeaways?

It helps to think in sets and about what sets of data you are joining and recognise when SQL should be deliberately structured.

Further Reading

http://www.oaktable.net/content/right-deep-left-deep-and-bushy-joins
https://tonyhasler.wordpress.com/2008/12/27/bushy-joins/
https://www.toadworld.com/platforms/oracle/b/weblog/archive/2014/06/16/hitchhikers-guide-to-explain-plan-8
https://jonathanlewis.wordpress.com/2007/01/24/left-deep-trees/
https://antognini.ch/top/


What AWR isn’t telling you

Fri, 2015-10-16 14:08

It is well-known that AWR, and Statspack before, take snapshots of V$ views (or rather the underlying objects) to produce the data in AWR.

It is also well-known that, when considering sql and its statistics in the shared pool, if something big hitting happens but the big-hitter is no longer in the shared pool by the time of the snapshot, then it can’t be recorded in your AWR picture of activity.

But like many things supposedly well-known, it can still come back and remind you that you’ve forgotten or overlooked this behaviour.

Here is a little illustration which happened earlier this week.

This week I’ve been looking closely at activity which happens on a particular DB between 16:00 and 17:00 because the IO subsystem is showing signs of stress and the timings for ‘Average Synchronous Single-Block Read Latency’ (V$SYSMETRIC_SUMMARY / DBA_HIST_SYSMETRIC_SUMMARY) have pushed above our amber warning levels.

Don’t get me started but our amber level for a single block read is 20ms!
Way too high for my liking for such sustained average but that is apparently the SLA from the SAN.

Why do we have such an alert?
The purpose is two-fold.

  1. It can sometimes be a warning that some of our IO-sensitive batch work might show some performance degradations, threatening SLAs
  2. In the past, it has been an effective warning that a sql statement executed concurrently in multiple threads from the applications has switched to a "bad" execution plan whose impact is system-wide (normally heavy concurrent direct path reads).

So… I was hovering around this system during the relevant timeframe looking at what was going on.

And I observed some heavy hitting queries running in parallel originating from another system (i.e. not my application).

There were two executions of one sql id (0cu2sm062jutc) executing concurrently with one single execution of a another sql statement.

The top-level RTSM execution metrics for these three executions were all very similar to this:

Executing for about 10 minutes, in parallel doing a lot of physical direct path reads (38GB) and accumulating over 1 hour in parallel processing time.

 Status              :  DONE (ALL ROWS)          
 Instance ID         :  2                        
 SQL ID              :  0cu2sm062jutc            
 SQL Execution ID    :  33554432                 
 Execution Started   :  10/13/2015 16:03:35      
 First Refresh Time  :  10/13/2015 16:03:37      
 Last Refresh Time   :  10/13/2015 16:13:19      
 Duration            :  584s                     
 Module/Action       :  JDBC Thin Client/-       
 Program             :  JDBC Thin Client         
 Fetch Calls         :  257                      

Global Stats
===================================================================================================================================
| Elapsed |   Cpu   |    IO    | Application | Concurrency | Cluster  |  Other   | Fetch | Buffer | Read  | Read  | Write | Write |
| Time(s) | Time(s) | Waits(s) |  Waits(s)   |  Waits(s)   | Waits(s) | Waits(s) | Calls |  Gets  | Reqs  | Bytes | Reqs  | Bytes |
===================================================================================================================================
|    4041 |     252 |     3782 |        0.06 |        1.75 |     0.97 |     3.95 |   257 |     6M | 75051 |  38GB |  1070 | 125MB |
===================================================================================================================================

And then whilst I was fiddling with various queries against V$SQL the queries in question disappeared from V$SQL.

No big surprise of course, stuff is liable to be released from the library cache whenever it sees fit.

But then the remembrance penny hit home and I was left thinking that it was no wonder that I couldn’t see this regularly when looking retrospectively at this timeframe via AWR.

But this cumulative exection time above is larger than the number 3 SQL statement in AWR by elapsed time … so I’m not making this up. It’s significant.

This got me thinking – how can we try to identify heavy-hitting SQL which is being missed by AWR – ASH surely? – and what can we do to help – COLORED SQL?

Here is my first attempt at such an ASH query, trying to find queries which have taken longer than 5 minutes in a single execution in a single snapshot period but which aren’t in AWR.

SELECT * 
FROM   (SELECT h.dbid
        ,      h.instance_number
        ,      h.snap_id
        ,      h.sql_id
        ,      h.sql_exec_id
        ,      h.sql_exec_start
        ,      h.qc_instance_id||'::'||h.qc_session_id||'::'||h.qc_session_serial# px_details
        ,      COUNT(*)
        ,      COUNT(DISTINCT h.session_id)
        --,      MIN(h.snap_id) min_snap
        --,      MAX(h.snap_id) max_snap
        ,      MIN(h.sample_time)
        ,      MAX(h.sample_time)
        ,    ((EXTRACT(DAY    FROM (MAX(h.sample_time) - MIN(h.sample_time)))*86400)
            + (EXTRACT(HOUR   FROM (MAX(h.sample_time) - MIN(h.sample_time)))*3600)
            + (EXTRACT(MINUTE FROM (MAX(h.sample_time) - MIN(h.sample_time)))*60)
            + (EXTRACT(SECOND FROM (MAX(h.sample_time) - MIN(h.sample_time))))) duration
        FROM   dba_hist_active_sess_history h
        ,      dba_hist_snapshot            s
        WHERE  h.dbid            = s.dbid
        AND    h.snap_id         = s.snap_id
        AND    h.instance_number = s.instance_number
        AND    sql_id           IS NOT NULL
        AND    sql_exec_id      IS NOT NULL
        AND    NOT EXISTS (SELECT 1
                           FROM   dba_hist_sqlstat st
                           WHERE  st.dbid            = h.dbid
                           AND    st.snap_id         = h.snap_id --BETWEEN xxx.min_snap AND xxx.max_snap
                           AND    st.instance_number = h.instance_number
                           AND    st.sql_id          = h.sql_id)
        GROUP BY 
               h.dbid
        --
        ,      h.snap_id
        --
        ,      h.instance_number
        ,      h.sql_id
        ,      h.sql_exec_id
        ,      h.sql_exec_start
        ,      h.qc_instance_id
        ,      h.qc_session_id
        ,      h.qc_session_serial#) xxx
WHERE  duration > 600
ORDER BY snap_id DESC, duration DESC;

The main problem with this query is that DURATION does not tell us how long the query has been active in the database.

We could have a query which has been active all that time but was doing relatively little in the DB.

I had a quick look (again) at TM_DELTA_TIME and TM_DELTA_DB_TIME but they are still unreliable – in some of the rows it looks ok, in others not.

Let’s forget that and pretend that the SQL above works ok.

Perhaps we can add a limit WHERE the COUNT(*) > threshold because otherwise this is currently giving me 100s of statements in my production database.

That gives me a token 166 statements over the past 30 days.

And that’s a problem because my intention was to use this to identify SQL statements to add to AWR via ADD_COLORED_SQL.
And that has a limit of 100 statements.

Ah… but no, the other penny has just dropped whilst writing this…

If I add color this SQL, that will make no difference. That just means AWR will record that SQL if it’s there.

But if it’s not in the shared pool at snapshot time, it’s not in the shared pool so it’s not in AWR.

Duh!

It’s almost as if we need a process which if a sql statement has breached a certain threshold then it won’t let it be flushed until MMON has written it to AWR.

No, that’s not right. That would present other problems.

I could shorten the AWR snapshot interval but I’m not convinced.

I think this is an oft-overlooked yet significant shortcoming.

So, for the moment at least, I’m left with what I think is a problem and I can’t think of a good solution… can you?

Or are you thinking mountain vs molehill?


Quick Cross-Period AWR Comparison

Thu, 2015-10-15 04:43

Here’s a query which I find useful in order to have a very quick comparison across AWR snapshots of the high level time model statistics.
The numbers should match those in the associated section in the AWR report.

If you feel compulsed, obsessively, with tuning then you may see some blips here and there which then encourage you to dive into the AWR detail for that snapshot.

Or quite often I get in in the morning and there might be an email about slow overnight processes and generic “database has been slow” enquiries and before I start hunting around to prove or disprove the allegations, this can be a useful frame.

WITH subq_snaps AS
(SELECT dbid                dbid
 ,      instance_number     inst
 ,      snap_id             e_snap
 ,      lag(snap_id) over (partition by instance_number, startup_time order by snap_id) b_snap
 ,      TO_CHAR(begin_interval_time,'D') b_day
 ,      TO_CHAR(begin_interval_time,'DD-MON-YYYY HH24:MI') b_time
 ,      TO_CHAR(end_interval_time,'HH24:MI')   e_time
 ,    ((extract(day    from (end_interval_time - begin_interval_time))*86400)
     + (extract(hour   from (end_interval_time - begin_interval_time))*3600)
     + (extract(minute from (end_interval_time - begin_interval_time))*60)
     + (extract(second from (end_interval_time - begin_interval_time)))) duration
 FROM   dba_hist_snapshot)
SELECT ss.inst
,      ss.b_snap
,      ss.e_snap
,      ss.b_time
,      ss.e_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END)/1000000/60,2),'999999990.99')                                  db_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END)/(ss.duration*1000000),1),'999999990.99')        aas
,      (SELECT round(average,2)
        FROM   dba_hist_sysmetric_summary sm
        WHERE  sm.dbid            = ss.dbid
        AND    sm.snap_id         = ss.e_snap
        AND    sm.instance_number = ss.inst
        AND    sm.metric_name     = 'Average Synchronous Single-Block Read Latency'
        AND    sm.group_id        = 2)                                                                                                                   assbl
,      (SELECT round(average,2)
FROM   dba_hist_sysmetric_summary sm
WHERE  sm.dbid            = ss.dbid
AND    sm.snap_id         = ss.e_snap
AND    sm.instance_number = ss.inst
AND    sm.metric_name     = 'Host CPU Utilization (%)'
AND    sm.group_id        = 2)                                                                                                                   cpu_util
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'DB CPU' THEN em.value - bm.value END)/1000000,2),'999999990.99')                                      db_cpu
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'sql execute elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                    sql_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'PL/SQL execution elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')               plsql_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'parse time elapsed' THEN em.value - bm.value END)/1000000,2),'999999990.00')                          parse_time
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'failed parse elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                   failed_parse
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'hard parse (sharing criteria) elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')  hard_parse_sharing
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'RMAN cpu time (backup/restore)' THEN em.value - bm.value END)/1000000,2),'999999990.99')              rman_cpu
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'connection management call elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')     connection_mgmt
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'sequence load elapsed time' THEN em.value - bm.value END)/1000000,2),'999999990.99')                  sequence_load
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'DB CPU' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           db_cpu_perc
,      TO_CHAR(ROUND(100*MAX(CASE WHEN bm.stat_name = 'sql execute elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           sql_time_perc
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'PL/SQL execution elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           plsql_time_perc
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'parse time elapsed' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           parse_time_perc
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'failed parse elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           failed_parse_perc
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'hard parse (sharing criteria) elapsed time' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           hard_parse_sharing_perc
,      TO_CHAR(ROUND(MAX(CASE WHEN bm.stat_name = 'RMAN cpu time (backup/restore)' THEN em.value - bm.value END)
           / NULLIF(MAX(CASE WHEN bm.stat_name = 'DB time' THEN em.value - bm.value END),0),2),'999999990.99')                                           rman_cpu_perc
FROM  subq_snaps              ss
,     dba_hist_sys_time_model em
,     dba_hist_sys_time_model bm
WHERE bm.dbid                   = ss.dbid
AND   bm.snap_id                = ss.b_snap
AND   bm.instance_number        = ss.inst
AND   em.dbid                   = ss.dbid
AND   em.snap_id                = ss.e_snap
AND   em.instance_number        = ss.inst
AND   bm.stat_id                = em.stat_id
GROUP BY
       ss.dbid
,      ss.inst
,      ss.b_day
,      ss.b_snap
,      ss.e_snap
,      ss.b_time
,      ss.e_time
,      ss.duration
--HAVING b_day NOT IN (6,7)
--AND    inst = 2
--AND b_snap = 18673
--AND    e_time = '17:00'
ORDER BY b_snap DESC;

Also, similar query for comparing the different IO metrics.
Again should match numbers in AWR.

WITH subq_snaps AS
(SELECT dbid                dbid
 ,      instance_number     inst
 ,      snap_id             e_snap
 ,      lag(snap_id) over (partition by instance_number, startup_time order by snap_id) b_snap
 ,      TO_CHAR(begin_interval_time,'D') b_day
 ,      TO_CHAR(begin_interval_time,'DD-MON-YYYY HH24:MI') b_time
 ,      TO_CHAR(end_interval_time,'HH24:MI')   e_time
 ,    ((extract(day    from (end_interval_time - begin_interval_time))*86400)
     + (extract(hour   from (end_interval_time - begin_interval_time))*3600)
     + (extract(minute from (end_interval_time - begin_interval_time))*60)
     + (extract(second from (end_interval_time - begin_interval_time)))) duration
 FROM   dba_hist_snapshot)
,    io_stats AS
(SELECT ss.*
 ,      bv.event_name
 ,      ev.time_waited_micro_fg - bv.time_waited_micro_fg time_waited_micro
 ,      ev.total_waits_fg       - bv.total_waits_fg       waits
 FROM   subq_snaps            ss
 ,      dba_hist_system_event bv
 ,      dba_hist_system_event ev
 WHERE  bv.dbid                   = ss.dbid
 AND    bv.snap_id                = ss.b_snap
 AND    bv.instance_number        = ss.inst
 AND    bv.event_name            IN ('db file sequential read','direct path read','direct path read temp','db file scattered read','db file parallel read')
 AND    ev.dbid                   = ss.dbid
 AND    ev.snap_id                = ss.e_snap
 AND    ev.instance_number        = ss.inst
 AND    ev.event_id               = bv.event_id)
SELECT io.dbid
,      io.inst
,      io.b_snap
,      io.e_snap
,      io.b_time
,      io.e_time
,      (SELECT ROUND(average,2)
        FROM   dba_hist_sysmetric_summary sm
        WHERE  sm.dbid            = io.dbid
        AND    sm.snap_id         = io.e_snap
        AND    sm.instance_number = io.inst
        AND    sm.metric_name     = 'Average Synchronous Single-Block Read Latency'
        AND    sm.group_id        = 2) assbl
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END) END) single_waits
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END) END) multi_waits
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END) END) prefch_wait
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits  END) END) direct_waits
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END) END)  temp_waits
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END/duration) END) iops_single
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND(CASE WHEN time_waited_micro/1000/1000 &lt; 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) single_secs_total
,      MAX(CASE WHEN event_name = 'db file sequential read' THEN ROUND((time_waited_micro/1000)/NULLif(waits,0)) END) single_avg
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END/duration) END) iops_multi
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND(CASE WHEN time_waited_micro/1000/1000 &lt; 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) multi_secs_total
,      MAX(CASE WHEN event_name = 'db file scattered read'  THEN ROUND((time_waited_micro/1000)/NULLif(waits,0)) END) multi_avg
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END/duration) END) iops_prefch
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND(CASE WHEN time_waited_micro/1000/1000 &lt; 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) prefch_secs_total
,      MAX(CASE WHEN event_name = 'db file parallel read'   THEN ROUND((time_waited_micro/1000)/NULLif(waits,0)) END) prefch_avg
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits  END/duration) END) iops_direct
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND(CASE WHEN time_waited_micro/1000/1000 &lt; 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) direct_secs_total
,      MAX(CASE WHEN event_name = 'direct path read'        THEN ROUND((time_waited_micro/1000)/NULLif(waits,0)) END) direct_avg
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN waits &lt; 0 THEN NULL ELSE waits END/duration) END) iops_temp
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND(CASE WHEN time_waited_micro/1000/1000 &lt; 0 THEN NULL ELSE time_waited_micro/1000/1000 END) END) temp_secs_total
,      MAX(CASE WHEN event_name = 'direct path read temp'   THEN ROUND((time_waited_micro/1000)/NULLif(waits,0)) END) temp_avg
FROM   io_stats io
GROUP BY
       io.dbid
,      io.inst
,      io.b_day
,      io.b_snap
,      io.e_snap
,      io.b_time
,      io.e_time
,      io.duration
HAVING b_day NOT IN (6,7)
AND    inst = 2
--AND b_snap = 18673
--AND    e_time = '17:00'
ORDER BY b_snap DESC;

Finally, one of the queries I used the most for quick checks – comparing top SQL in AWR across periods:

SELECT x.*, (SELECT sql_text from dba_hist_sqltext t where t.sql_id = x.sql_id and rownum = 1) txt
FROM (
SELECT sn.snap_id
,      TO_CHAR(sn.end_interval_time,'DD-MON-YYYY HH24:MI') dt
,      st.sql_id
,      st.instance_number
,      st.parsing_schema_name
,      st.plan_hash_value
,      SUM(st.fetches_delta) fch
,      SUM(rows_processed_delta) rws
,      SUM(executions_delta)     execs
,      ROUND(SUM(elapsed_time_delta)/1000/1000)   elp
,      ROUND(SUM(elapsed_time_delta)/1000/1000/NVL(NULLIF(SUM(executions_delta),0),1),2)   elpe
,      ROUND(SUM(cpu_time_delta)/1000/1000)       cpu
,      SUM(buffer_gets_delta)    gets
,      ROUND(SUM(iowait_delta)/1000/1000)         io
,      ROUND(SUM(clwait_delta)/1000/1000)         cl
,      ROUND(SUM(ccwait_delta)/1000/1000)         cc
,      ROUND(SUM(apwait_delta)/1000/1000)         ap
,      ROUND(SUM(plsexec_time_delta)/1000/1000)   pl
,      ROUND(SUM(disk_reads_delta))         disk_reads
,      ROUND(SUM(direct_writes_delta))        direct_writes
,      ROW_NUMBER() over (PARTITION BY sn.dbid, sn.snap_id, st.instance_number
                          ORDER BY SUM(elapsed_time_delta) desc) rn
FROM   dba_hist_snapshot sn
,      dba_hist_sqlstat  st
WHERE  st.dbid            = sn.dbid
AND    st.snap_id         = sn.snap_id
AND    sn.instance_number = st.instance_number
GROUP BY
       sn.dbid
,      sn.snap_id
,      sn.end_interval_time
,      st.sql_id
,      st.instance_number
,      st.parsing_schema_name
,      st.plan_hash_value
) x
WHERE rn &lt;= 5
ORDER by snap_id DESC, instance_number, rn;

Let me know if you spot any errors or anomolies or obvious improvements / additions.


SQL Plan Management Choices

Tue, 2015-08-11 04:02

My thoughts on SQL plan management decision points: SPM SQL Patches are also available, primarily to avoid a specific problem not to enforce a particular plan, and are not covered in the above flowchart.


Epoch

Fri, 2015-06-26 06:15

Note to self because it’s just one of those date/timezone-related topics which just doesn’t seem to stick…

Epoch/Unix time – See https://en.wikipedia.org/wiki/Unix_time

Unix time (also known as POSIX time or erroneously as Epoch time) is a system for describing instants in time, defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970,

Firstly when converting from Oracle date or timestamp – we need to work from UTC not local time.

select systimestamp
,      ((extract( day    from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
      + (extract( hour   from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
      + (extract( minute from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
      + (round(extract( second from systimestamp - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
        now_epoch
from dual;

SYSTIMESTAMP                         NOW_EPOCH
----------------------------------- -------------
26-JUN-15 11.57.09.634813000 +01:00 1435319830000 

If we plug that epoch into somewhere like http://www.epochconverter.com/ , we can see it’s wrong – it’s 1 hour ahead of where it should be:

Assuming that this timestamp is in milliseconds:
GMT: Fri, 26 Jun 2015 11:57:10 GMT
Your time zone: 26 June 2015 12:57:10 GMT+1:00 DST

That’s because we need to work in UTC using SYS_EXTRACT_UTC, i.e.

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select *
from   now;

NOW_TS                              NOW_UTC             NOW_EPOCH
----------------------------------- ------------------ -------------
26-JUN-15 12.03.35.231688000 +01:00 26-JUN-15 11.03.35 1435316626000 

Better!

This came about because there is a table storing epoch/unix time format which is originating from Java code and the developer said that the conversion was losing 1 hour and speculated that the DB might be “unsafe” when dealing with epoch time.

So, let’s convert this number back to a date or a timestamp and crush that notion.

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select now_ts
,      now_utc
,      now_epoch
,      TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND') 
       epoch_back_to_ts
from   now;

NOW_TS                              NOW_UTC             NOW_EPOCH    EPOCH_BACK_TO_TS 
----------------------------------- ------------------ ------------- ------------------
26-JUN-15 12.09.45.671605000 +01:00 26-JUN-15 11.09.45 1435316986000 26-JUN-15 11.09.46 

Our conversion back is still in UTC so we need to convert, and there are numerous ways that we might want to convert that back:

with now as
(select systimestamp                  now_ts
 ,      sys_extract_utc(systimestamp) now_utc
 ,      ((extract( day    from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*24*60*60)
       + (extract( hour   from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60*60)
       + (extract( minute from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY'))*60)
       + (round(extract( second from sys_extract_utc(systimestamp) - TO_TIMESTAMP('01/01/1970', 'MM/DD/YYYY')))))*1000
         now_epoch
 from dual)
select now_ts
,      now_utc
,      now_epoch
,      TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND') 
       epoch_back_to_utc
       ,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT TIME ZONE 'Europe/London' AS TIMESTAMP) 
       back_to_name
,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT LOCAL AS TIMESTAMP) 
       back_to_local
,      CAST(
       FROM_TZ(TO_TIMESTAMP('01-01-1970','DD-MM-YYYY') + NUMTODSINTERVAL(now_epoch/1000,'SECOND'),'UTC')
            AT TIME ZONE DBTIMEZONE AS TIMESTAMP) 
       back_to_dblocal
FROM   now;

NOW_TS                              NOW_UTC             NOW_EPOCH     EPOCH_BACK_TO_UTC  BACK_TO_NAME       BACK_TO_LOCAL      BACK_TO_DBLOCAL  
----------------------------------- ------------------ -------------- ------------------ ------------------ ------------------ ------------------
26-JUN-15 12.12.23.936868000 +01:00 26-JUN-15 11.12.23 1435317144000  26-JUN-15 11.12.24 26-JUN-15 12.12.24 26-JUN-15 12.12.24 26-JUN-15 12.12.24