Jonathan Lewis

Subscribe to Jonathan Lewis feed Jonathan Lewis
Just another Oracle weblog
Updated: 6 hours 7 min ago

Undo Understood

Fri, 2017-04-21 04:45

It’s hard to understand all the ramifications of Oracle’s undo handling, and it’s not hard to find cases where the resulting effects are very confusing. In a recent post on the OTN database forum resulted in one response insisting that the OP was obviously updating a table with frequent commits from one session while querying it from another thereby generating a large number of undo reads in the querying session.

It’s a possible cause of the symptoms that had been described – although not the only possible cause, especially since the symptoms hadn’t been described completely. It’s actually possible to see this type of activity when there are no updates and no outstanding commits taking place at all on the target table. Unfortunately it’s quite hard to demonstrate this with a quick, simple, script in recent versions of Oracle unless you do some insanely stupid things to make the problem appear – but I know how to do “insanely stupid” in Oracle, so here we go; first some data creation:

rem
rem     Script:         undo_rec_apply_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t2(v1 varchar2(100));
insert into t2 values(rpad('x',100));
commit;

create table t1
nologging
pctfree 99 pctused 1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 8e4 -- > comment to bypass WordPress formatting issue
;

alter table t1 add constraint t1_pk primary key(id)
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

The t2 table is there as a target for a large of updates from a session other than the one demonstrating the problem. The t1 table has been defined and populated in a way that puts one row into each of 80,000 blocks (though, with ASSM and my specific tablespace definition of uniform 1MB extents, the total space is about 80,400 blocks). I’ve got a primary key declaration that allows me to pick single rows/blocks from the table if I want to.

At this point I’m going to do a lot of updates to the main table using a very inefficient strategy to emulate the type of thing that can happen on a very large table with lots of random updates and many indexes to maintain:


begin
        for i in 1..800 loop
                update t1 set v1 = upper(v1) where id = 100 * i;
                execute immediate 'alter system switch logfile';
                execute immediate 'alter system flush buffer_cache';
                commit;
                dbms_lock.sleep(0.01);
        end loop;
end;
/

set transaction read only;

I’m updating every 100th row/block in the table with single row commits, but before each commit I’m switching log files and flushing the buffer cache.

This is NOT an experiment to try on a production system, or even a development system if there are lots of busy developers or testers around – and if you’re running your dev/test in archivelog mode (which, for some of your systems you should be) you’re going to end up with a lot of archived redo logs. I have to do this switch to ensure that the updated blocks are unpinned so that they will be written to disc and flushed from the cache by the flush buffer cache. (This extreme approach would not have been necessary in earlier versions of Oracle, but the clever developers at Oracle Corp. keep adding “damage limitation” touches to the code that I have to work around to create small tests.) Because the block has been flushed from memory before the commit the session will record a “commit cleanout failures: block lost” on each commit. By the time this loop has run to completion there will be 800 blocks from the table on disc needing a “delayed block cleanout”.

Despite the extreme brute force I use in this loop, there is a further very important detail that has to be set before this test will work (at least in 11.2.0.4, which is what I’ve used in my test runs). I had to start the database with the hidden parameter _db_cache_pre_warm set to false. If I don’t have the database started with this feature disabled Oracle would notice that the buffer cache had a lot of empty space and would “pre-warm” the cache by loading a few thousand blocks from t1 as I updated one row – with the side effect that the update from the previous cycle of the loop would be cleaned out on the current cycle of the loop. If you do run this experiment, remember to reset the parameter and restart the instance when you’ve finished.

I’ve finished this chunk of code with a call to “set transaction read only” – this emulates the start of a long-running query: it captures a point in time (through the current SCN) and any queries that run in the session from now on have to be read-consistent with that point in time. After doing this I need to use a second session to do a bit of hard work – in my case the following:

execute snap_rollstats.start_snap

begin
        for i in 1..10000 loop
                update t2 set v1 = upper(v1);
                update t2 set v1 = lower(v1);
                commit;
        end loop;
end;
/

execute snap_rollstats.end_snap

The calls to the snap_rollstats package simply read v$rollstat and give me a report of the changes in the undo segment statistics over the period of the loop. I’ve executed 10,000 transactions in the interval, which was sufficient on my system to use each undo segment header at least 1,000 times and (since there are 34 transaction table slots in each undo segment header) overwrite each transaction table slot about 30 times. You can infer from these comments that I have only 10 undo segments active at the time, your system may have many more (check the number of rows in v$rollstat) so you may want to scale up that 10,000 loop count accordingly.

At this point, then, the only thing I’ve done since the start of my “long running query” is to update another table from another session. What happens when I do a simple count() from t1 that requires a full tablescan ?

alter system flush buffer_cache;

execute snap_filestat.start_snap
execute snap_my_stats.start_snap

select count(v1) from t1;

execute snap_my_stats.end_snap
execute snap_filestat.end_snap

I’ve flushed the buffer cache to get rid of any buffered undo blocks – again an unreasonable thing to do in production but a valid way of emulating the aging out of undo blocks that would take place in a production system – and surrounded my count() with a couple of packaged call to report the session stats and file I/O stats due to my query. (If you’re sharing your database then the file I/O stats will be affected by the activity of other users, of course, but in my case I had a private database.)

Here are the file stats:


--------------
Datafile Stats
--------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max
File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------    ---      ------      ------   ---------    ---
    1          17          17       1.000        .065          17        .065           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_system_938s4mr3_.dbf
    3         665         665       1.000        .020         665        .020           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         631      80,002     126.786        .000           2        .045         629        .000      6           0           0        .000     17
/u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__cz1w7tz1_.dbf

As expected I’ve done a number of multiblock reads of my data tablespace for a total of roughly 80,000 blocks read. What you may not have expected is that I’ve done 665 single block reads of the undo tablespace.

What have I been doing with all those undo blocks ? Check the session stats:


Session stats
-------------
Name                                                                     Value
----                                                                     -----
transaction tables consistent reads - undo records applied              10,014
transaction tables consistent read rollbacks                                10

We’ve been reading undo blocks so that we can create read-consistent copies of the 10 undo segment headers that were active in my instance. We haven’t (and you’ll have to trust me on this, I can’t show you the stats that aren’t there!) reported any “data blocks consistent reads – undo records applied”.

If you want to see a detailed explanation of what has happened you’ll need to read Oracle Core (UK source), chapter 3 (and possibly chapter 2 to warm yourself up for the topic). In outline the following type of thing happens:

  • Oracle gets to the first block updated in t1 and sees that there’s an ITL (interested transaction list) entry that hasn’t been marked as committed (we flushed the block from memory before the commit cleanout could take place so the relevant transaction is, apparently, still running and the row is still marked as locked).
  • Let’s say the ITL entry says the transaction was for undo segment 34, transaction table slot 11, sequence 999. Oracle reads the undo segment header block for undo segment 34 and checks transaction table slot 11, which is now at sequence 1032. Oracle can infer from this that the transaction that updated the table has committed – but can’t yet know whether it committed before or after the start of our “long running query”.
  • Somehow Oracle has to get slot 11 back to sequence 999 so that it can check the commit SCN recorded in the slot at that sequence number. This is where we see “undo records applied” to make the “transaction table read consistent”. It can do this because the undo segment header has a “transaction control” section in it that records some details of the most recent transaction started in that segment. When a transaction starts it updates this information, but saves the old version of the transaction control and the previous version of its transaction table slot in its first undo record, consequently Oracle can clone the undo segment header block, identify the most recent transaction, find its first undo record and apply it to unwind the transaction table information. As it does so it has also wound the transaction control section backwards one step, so it can use that (older) version to go back another step … and so on, until it takes the cloned undo segment header so far back that it takes our transaction table slot back to sequence 999 – and the job is done, we can now check the actual commit SCN.  (Or, if we’re unlucky, we might receive an ORA-01555 before we get there)

So – no changes to the t1 table during the query, but lots of undo records read because OTHER tables have been changing.

Footnote:

In my example the tablescan used direct path reads – so the blocks that went through delayed block cleanout were in private memory, which means they weren’t in the buffer cache and didn’t get written out to disc. When I flushed the buffer cache (again to emulate aging out of undo blocks etc.) and repeated the tablescan Oracle had to go through all that work of creating read consistent transaction tables all over again.

Footnote 2:

Despite the grand claim implied by the title I hope everyone realises that this is just one little detail of how undo handling can interfere with their expectations of performance.

 

 


Character selectivity

Fri, 2017-04-14 06:40

A recent OTN posting asked how the optimizer dealt with “like” predicates for character types quoting the DDL and a query that I had published some time ago in a presentation I had done with Kyle Hailey. I thought that I had already given a detailed answer somewhere on my blog (or even in the presentation) but found that I couldn’t track down the necessary working, so here’s a repeat of the question and a full explanation of the working.

The query is very simple, and the optimizer’s arithmetic takes an “obvious” strategy in the arithmetic. Here’s the sample query, with the equiavalent query that we can use to do the calculation:


select * from t1 where alpha_06 like 'mm%';

select * from t1 where alpha_06 >= 'mm' and alpha_06 < 'mn';

Ignoring the possible pain of the EBCDIC character set and multi-byte national-language character sets with “strange” collation orders, it should be reasonably easy to see that ‘mn’ is the first string in alphabetical order that fails to match ‘mm%’. With that thought in mind we can apply the standard arithmetic for range-based predicates assuming, to stick with the easy example, that there are no histograms involved. For a range closed at one end and and open at the other the selectivity is:


( ( 'mn' - 'mm') / (high_value - low_value) ) + 1/num_distinct

The tricky bits, of course, are how you subtract ‘mm’ from ‘mn’ and how you use the values stored in the low_value and high_value columns of view user_tab_cols. So let’s generate the orginal data set and see where we go (running on 12c, and eliminating redundant bits from the original presentation):


rem
rem     Script:         selectivity_like_char.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2013
rem

execute dbms_random.seed(0)

create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        cast(dbms_random.string('l',6) as char(6))      alpha_06
from
        generator,
        generator
where
        rownum <= 1e6 -- > comment to avoid WordPress formatting issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

column low_value  format a32
column high_value format a32

select
        column_name,
        num_distinct,
        density,
        low_value,
        high_value
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

select min(alpha_06), max(alpha_06) from t1;

set autotrace traceonly explain

select
        *
from
        t1
where
        alpha_06 like 'mm%'
;

set autotrace off

It will probably take a couple of minutes to generate the data – it’s 1M random strings, lower-case, 6 characters fixed – and will take up about 12MB of space. Here are the results from the stats and min/max queries, with the execution plan for the query we are testing:


COLUMN_NAME          NUM_DISTINCT    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------- ------------ ---------- -------------------------- --------------------------
ALPHA_06                  1000000    .000001 616161616E72               7A7A7A78747A


MIN(AL MAX(AL
------ ------
aaaanr zzzxtz


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   157 |  1099 |   265  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   157 |  1099 |   265  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALPHA_06" LIKE 'mm%')


Given that there are power(26,6) = 308,915,776 different combinations available for lower-case strings of 6 charactgers it’s not too surprising that Oracle generated 1M different strings, nor is it particularly surprising that the lowest value string started with ‘aaa’ and the highest with ‘zzz’.

So how do we get 157 as the cardinality for the query or, to put it another way, how do we get 0.000157 as the selectivity of the predicate. We need to refer to a note I wrote a few years ago to help us on our way (with a little caveat due to a change that appeared in 11.2.0.4) – what number would Oracle use to represent ‘mm’ and the other three strings we need to work with ?

According to the rules supplied (and adjusted in later versions) we have to:

  1. pad the strings with ASCII nulls (zeros) up to 15 bytes
  2. treat the results as a hexadecimal number and convert to decimal
  3. round off the last 21 decimal digits

We can model this in SQL with a statement like:


SQL> column dec_value format 999,999,999,999,999,999,999,999,999,999,999,999
SQL> select round(to_number(utl_raw.cast_to_raw(rpad('aaaanr',15,chr(0))),'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),-21) dec_val from dual;

DEC_VAL
------------------------------------------------
505,627,904,294,763,000,000,000,000,000,000,000

1 row selected.

As an alternative, or possibly a cross-check, I created a table with a varchar2(6) column, inserted the four values I was interested in and created a histogram of 4 buckets on the column (there’s a suitable little demo at this URL) and got the following endpoint values:


ENDPOINT_NUMBER                                   ENDPOINT_VALUE
--------------- ------------------------------------------------
              1  505,627,904,294,763,000,000,000,000,000,000,000
              2  568,171,140,227,094,000,000,000,000,000,000,000
              3  568,191,422,636,698,000,000,000,000,000,000,000
              4  635,944,373,827,734,000,000,000,000,000,000,000

Once we’ve got these numbers we can slot them into the standard formula (not forgetting the 1/1,000,000 for the closed end of the predicate) – and to save typing I’m going to factor out 10^21 across the board in the division:

Selectivity = (568,191,422,636,698 – 568,171,140,227,094) / (635,944,373,827,734 – 505,627,904,294,763) + 1/1,000,000

Selectivity = 20,282,409,604 / 130,316,469,532,971 + 1/1,000,000

Selectivity = 0.00015564 + 0.000001 = 0.00015664

From which the cardinality = (selectivity * num_rows) = 156.64, which rounds up to 157. Q.E.D.


Ask Jonathan

Mon, 2017-04-10 14:01

Oracle Scene, the magazine of the UK Oracle User Group is piloting a new regular feature called Ask Jonathan,  a chance to get an answer to any question you may have about how the Oracle database engine works.

I’m aiming to answer two or three questions per issue over the course of the year, using a format similar to the one Tom Kyte used in Oracle Magazine:  so if you have a question about the mechanisms, strategies, or mathematics of how Oracle does its thing then attach it as a comment to this posting.

Ideally the questions will have to be quite short (no 20MB trace files, massive schema definitions, or convoluted and exotic setup requirements or it will be too long), and I’ll aim to write something like half a page of in response.  I may summarise the question, or pick out the most interesting feature if it’s a bit too long to publish and answer in its entirety.

 


Parallel First_rows()

Mon, 2017-04-10 09:53

A recent posting on OTN raised the question of whether or not the “parallel” hint and the “first_rows(n)” hint were mutually incompatible. This reminded me that from time to time other posters on OTN (copying information from various websites, perhaps) have claimed that “parallel doesn’t work with first rows” or, conversely, “first rows doesn’t work with parallel”. This is one of those funny little myths that is so old that the script I’ve got to demonstrate the misconception is dated 2003 with a first test version of 8.1.7.4.

Since I haven’t run the test on any version of Oracle newer than 9.2.0.4 I thought it was time to dust it down, modernise it slightly, and run it again. So here’s the bit that creates a sample data set:


create table t1 (
        id      number,
        v1      varchar2(10),
        padding varchar2(100),
        constraint      t_pk primary key(id) using index local
)
partition by range(id) (
        partition p1000 values less than (1000),
        partition p2000 values less than (2000),
        partition p3000 values less than (3000),
        partition p4000 values less than (4000),
        partition p5000 values less than (5000)
)
;

insert into t1
select
        rownum - 1,
        rpad(rownum-1,10),
        rpad('x',100)
from
        all_objects
where
        rownum <= 5000 -- > hint to avoid WordPress formatting issue
order by 
        dbms_random.value
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1', 
                method_opt       => 'for all columns size 1'
        );
end;
/

Now I’m going to run a simple query, hinted in 4 different ways:

  • no hints
  • parallel hint only: /*+ parallel */
  • first_rows(1) hint only: /*+ first_rows(1) */
  • parallel and first_rows(1): /*+ parallel first_rows(1) */

Here’s the version of the query that has both hints in place:


set serveroutput off
set linesize 156
set pagesize 60
set trimspool on

select
        /*+ parallel first_rows(1) */
        v1
from
        t1
where
        id between 1500 and 2000
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline'));

I’ve actually run the query and used the display_cursor() option to pull the plan from memory – in the original (8i) script I used autotrace and the old (deprecated, backwards compatibility only) first_rows hint. To do any other tests just clone and edit. Here are the 4 outputs from the call to display_cursor() – with a little cosmetic editing:


SQL_ID  63qnzam9b8m9g, child number 0
=====================================
select  /*+ */  v1 from  t1 where  id between 1500 and 2000

Plan hash value: 277861402

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |   502 |  7530 |    15   (0)| 00:00:01 |     2 |     3 |
|*  2 |   TABLE ACCESS FULL      | T1   |   502 |  7530 |    15   (0)| 00:00:01 |     2 |     3 |
-------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("ID"<=2000 AND "ID">=1500))


SQL_ID  ahary3u8q88mq, child number 1
=====================================
select  /*+ parallel */  v1 from  t1 where  id between 1500 and 2000

Plan hash value: 9959369

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |     8 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   502 |  7530 |     8   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   502 |  7530 |     8   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       |   502 |  7530 |     8   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      SHARED(2)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=2000 AND "ID">=1500))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2


SQL_ID  3m6mnk9b337dd, child number 0
=====================================
select  /*+ first_rows(1) */  v1 from  t1 where  id between 1500 and
2000

Plan hash value: 1044541683

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |      |       |       |     6 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR          |      |     4 |    60 |     6   (0)| 00:00:01 |     2 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| T1   |     4 |    60 |     6   (0)| 00:00:01 |     2 |     3 |
|*  3 |    INDEX RANGE SCAN                | T_PK |       |       |     2   (0)| 00:00:01 |     2 |     3 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      FIRST_ROWS(1)
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1500 AND "ID"<=2000) -- > needs edit to avoid WordPress formatting issue


SQL_ID  9asm7t1zbv4q8, child number 1
=====================================
select  /*+ parallel first_rows(1) */  v1 from  t1 where  id between
1500 and 2000

Plan hash value: 4229065483

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |       |       |     3 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                      |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                | :TQ10000 |     4 |    60 |     3   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION RANGE ITERATOR       |          |     4 |    60 |     3   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1       |     4 |    60 |     3   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
|*  5 |      INDEX RANGE SCAN                | T_PK     |       |       |     1   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      FIRST_ROWS(1)
      SHARED(2)
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("ID">=1500 AND "ID"<=2000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2

Critically we get four different execution plans from the four different strategies – so clearly the optimizer is perfectly happy to accept the parallel and first_rows() hints simultaneously. Note, particularly, how the first_rows(1) hint when combined with the parallel hint moved us from a parallel full tablescan to a parallel index range scan.

Whether or not it’s sensible to use the hint combination in this way is a matter for careful consideration, of course, but there could be circumstances where the combination really is the best way to get the starting row(s) from a query that otherwise has to return a large amount of data.


Block Names

Mon, 2017-04-03 07:04

There are a number of tiny details that I can never remember when I’m sketching out models to test ideas, and one of those is the PL/SQL block name. Virtually every piece of PL/SQL I write ends up with variables which have one of two prefixes in their names “M_” or “G_” (for memory or global, respectively) but I probably ought to be formal than that, so here’s an example of labelling blocks – specifically, labelling anonymous blocks from SQL*Plus using a trivial and silly bit of code:


rem
rem     Script:         plsql_block_names.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t1
nologging
as
select * from all_objects
;


begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(object_id) nologging;

--
--        Anonymous PL/SQL block with label, and a couple of 
--        uses of the label as the qualifier for variables
--

<<my_block>>
declare
        data_object_id  t1.data_object_id%type;
        object_id       t1.object_id%type;
begin
        select  data_object_id
        into    my_block.data_object_id
        from    t1
        where
                data_object_id is not null
        and     rownum = 1
        ;
        select  object_id
        into    my_block.object_id
        from    t1
        where
                data_object_id = my_block.data_object_id
        and     rownum = 1
        ;

        dbms_output.put_line('Object: ' || object_id || ' - ' || my_block.object_id);
        dbms_output.put_line('Data Object: ' || data_object_id || ' - ' || my_block.data_object_id);

end;
/

Object: 16 - 16
Data Object: 6 - 6

The important point, of course, is that with a qualified variable name you eliminate the risk of a problem that appears occasionally on the public fora where someone has lost track of duplicated variable names, or used (as I have above) a variable name that matches a column name, and doesn’t notice that a little further down the code Oracle has captured the “wrong” interpretation of the name.

You’ll note that the block name has to be surrounded by doubled angle brackets – and now that I’ve written about it I might actually remember that for a couple of weeks ;)

 

Footnote:

This isn’t the only use for labels, by the way, it’s just one that probably ought to be used more frequently in production code.

The other thing I can never remember is how to escape quote marks – so I invariably end up using the old “double the quotes” method when I want to quote quotes.


Adding Indexes

Thu, 2017-03-30 02:31

The following question came up on the OTN database forum recently:


We have below table with columns,

Table T1
Columns:
-----------
Col_1, Col_2, Col_3, Col_4, Col_5, Col_6, Col_7, Col_8, Col_9, Col_10, Col_11, Col_12, Col_13, Col_14, Col_15

on which below indexes are created.

XXTEST_Col_1    Col_1
XXTEST_Col_2    Col_2
XXTEST_Col_3    Col_3
XXTEST_Col_5    Col_5
XXTEST_Col_6    Col_6
XXTEST_Col_7    Col_7
XXTEST_Col_8    Col_8
XXTEST_Col_8    (Col_4, Col_10, Col_11)

I have requirement to update table T1 as below and it’s taking really long. [JPL: I’m assuming that the naming of the second xxtest_col_8 index is a trivial error introduced while the OP was concealing names.)

UPDATE T1
SET Col_3 = x_value,
    Col_6 = y_value
where Col_4='N'
and Col_3 IS NULL;

To improve performance, created below index and it improved the performance of the update.

 
New Index:  XXTEST_Col_4    (Col_4, Col_3)

My question is it good idea to create the above index with already existing indexes? Is there any other better way to handle this?

There is insufficient information to answer these questions properly but if I had been given this problem there are several things I’d want to check before I came up with any answers. (In the absence of complete information the default answers are “probably not”, “probably”).

What Happened

The first question, of course, is “why did the update take a really long time?”, and the second is “why did the performance improve after the index creation?”.

There are (generally) two execution paths for the update – full tablescan or index access through the index starting with col_4. Whichever path Oracle took would the update have been faster (and fast enough) if the alternative path had been taken ?  (I have to include the “generally” caveat since it’s just possible that with silly settings for a couple of parameters and statistics the optimizer could have been doing an amazingly stupid “index full scan” on any index that contained a column declared as not null.)

Then consider the fact that the update is changing two columns that are indexed – and one of those columns (col_3) is changing from null to a value, so it’s going to increase the size of the index. Is the slow performance due to the time spent maintaining the indexes, and is the improved performance due to the fact that there are no rows left to update because the first run of the test updated all of them ?

It does seem reasonably likely that Oracle would use the new index for the update and that the better precision would be the cause of the improved performance; but a side effect of the index creation is that Oracle would now have some statistics about the combination of (col_4, col_3) that wasn’t previously available. Perhaps gathering stats on the column group (col_4, col_3) would have been sufficient to result in a change in execution path that would have given the required performance without introducing the extra  maintenance costs.

What’s the requirement

Before deciding whether or not an index should be used to improve the performance of a statement it’s important to have some idea of the cost/benefit/risk analysis. This often boils down to “how much data should change, how often does it happen, what could possibly go wrong”.

As far as “what could possibly go wrong” – by adding the index (col_4, col_3) we introduce 4 groups of risk:

  • any SQL currently using the col_3 index might change plans
  • any SQL currently using the (col_4, col_10, col_11) index might change plans
  • any query that has a predicate “col_3 is null” might change plans because that predicate could be resolved in the index if col_4 had been declared not null,
  • any query with equality predicates on both col_3 and col_4 might change plans because the new index supplies some statistics about the combination

We don’t have enough information to evaluate the degree of risk – and we haven’t been given any figures about the volume of data that changes or the frequency with which the update runs, so we can’t really say much beyond raising the warning flags.

What does the data look like.

Despite my closing comment above, we have two little clues about the data. First is that one of the predicates is col_4 = ‘N’ – to me this always raises the suspicion that we’re looking at a column which is a Yes/No, or Yes/No/NULL column; secondly the OP has marked the following response as “Correct Answer”:

Depends on your data.
Check count of both queries

select count(*) from T1 where Col_4='N' and Col_3 IS NULL;
select count(*) from T1 where Col_4='N';

If there is a significant difference in count, and delete is executed frequently, it might be good to have

If we assume that this means (a) there is a significant difference in the counts and (b) the delete IS executed frequently then we we have some (though not really enough) useful information about what the data looks like and how it is handled. Perhaps over a short period of time a small volume of new data appears where col_3 is null and there is a scheduled mechanism that looks for that data and modifies it.

If that’s the case then an index that gives use high precision when we’re searching for a (very) small amount of data from a very large table would be very helpful, while an index that starts with the first of our search columns (col_4 is the leading column of an existing index) might actually be a liability if we do a large range scan, visit the table for lots of rows, and then discard them because they fail our second predicate.

However, if this IS the case then it’s not a good idea to create a large index that is constantly subject to change – especially if the update from NULL to something else results in updates that scatter the col_3 values randomly around the index. (The NULL rows will be well packed in the index, and may be in a small collection of recent table blocks; but the update will delete one index entry and we have no idea about the impact that inserting the replacement could have.) So, if we’re also allowed to modify the code, let’s create an index that is as small as possible, won’t be used accidentally by any other code, and minimises the workload when updated. Here’s the code to create the index and one option for gathering stats on the resulting hidden column, and the modified update code:


create index xxtext_fbi1 on t1(
        case when col_4='N' and col_3 is null then 0 end
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns (case when col_4=''N'' and col_3 is null then 0 end) size 1'
        );
end;
/

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        case when col_4='N' and col_3 is null then 0 end = 0
;

This index will contain entries for just those rows that we need to update, and when we update them the index entries will simply disappear. The index is the smallest that we could have, with the minimum workload imposed, and no risk of being hi-jacked by any other process.

As a variation on a theme, and depending on version, we could create a virtual column with this definition (making it invisible in 12c) and creating an index on the virtual column. For preference I think I’d use the FBI in 11g, and the invisible virtual column in 12c.


alter table t1
add (
        wanted_flag invisible
        generated always as (
                case when col_4='N' and col_3 is null then 0 end
        ) virtual
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns wanted_flag size 1'
        );
end;
/

create index xxtest_virt1 on t1(wanted_flag);

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        wanted_flag = 0
;

The thread raised a couple of other worthwhile points – one post pointed out that if they were going to create a new index they would want to make it as useful for as many other processes as possible: this, of course, is a different view on the cost/benefit/risk assessment, especially (I would say) if there is no option to create a very small, low-cost, index that is guaranteeably safe. The point also came up that there are two or three variations on how one could produce an expression that produced index entries only for the rows where col_4 = ‘N’ and col_3 is null; and a point about further possibilities if there were a few different values for col_4 that resulted in similar queries.


Index out of range

Mon, 2017-03-27 02:42

I’ve waxed lyrical in the past about creating suitable column group statistics whenever you drop an index because even when the optimizer doesn’t use an index in its execution path it might have used the number of distinct keys of the index (user_indexes.distinct_keys) in its estimates of cardinality.

I’ve also highlighted various warnings (here (with several follow-on URLs) and here) about when the optimizer declines to use column group statistics. One of those cases is when a predicate on one of the columns goes “out of  range” – i.e. is below the column low_value or above the column high_value. Last night it suddenly crossed my mind that if we drop an index and replace it with a column group we might see an example of inconsistent behaviour: what happens when the index exists but the predicate is out of range – would you find that dropping the index and replacing it with a column group would give you different cardinality estimates for out of range predicates ?

Here’s the demonstration of what happened when I created a simple test on 12.1.0.2:


rem
rem     Script:         index_v_colgrp.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        mod(rownum-1,100)               n1,
        mod(rownum-1,100)               n2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format problem
;

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set autotrace traceonly explain

I’ve created a table with 1M rows, where n1 and n2 are perfectly correlated – there are 100 distinct pairs of values (ranging from (0,0) to (99,99)). Now with autotrace enabled I’ll execute three queries – two with an index on the table of which one will be the baseline plan for predicates that are “in-range” and the other will take the predicates out of range, and the third after I’ve dropped the index and substituted a matching column group to see what I get for the “out of range” plan. The plans may produce different paths as the index disappears, of course, but what we’re only interested in the cardinality estimates in this experiment.

Here’s the code to run the three queries:


select  padding
from    t1
where
        n1 = 50
and     n2 = 50
;

select  padding
from    t1
where
        n1 = 110
and     n2 = 110
;

drop index t1_i1;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1, n2) size 1'
        );
end;
/

select  padding
from    t1
where
        n1 = 110
and     n2 = 110
;

And the three execution plans:


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1044K|  2142   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  1044K|  2142   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 AND "N2"=50)


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |    79 |  8453 |    83   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    79 |  8453 |    83   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |    79 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=110 AND "N2"=110)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    79 |  8453 |  2142   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    79 |  8453 |  2142   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=110 AND "N2"=110)

In summary:

  • With the index in place and the predicates in range the optimizer used user_indexes.distinct_keys to calculate cardinality.
  • With the index in place and the predicates (or just one of them, in fact) out of range the optimizer used the individual column selectivities with linear decay.
  • With a column group instead of an index the optimizer behaved exactly as it used to with the index in place.

So my concern that substituting column groups for indexes was unfounded – the optimizer was being silly (legal disclaimer: that’s just my opinion) with indexes, and the silly (ditto) behaviour with column groups hasn’t changed anything.

I’ll have to go back a couple of versions of Oracle to repeat these tests – maybe this behaviour with user_indexes.distinct_keys in place is relatively recent, but it’s another reason why execution plans may change suddenly and badly as time passes when “nothing changed”.

 


min/max Upgrade

Thu, 2017-03-23 03:53

A question came up on the OTN database forum a little while ago about a very simple query that was taking different execution paths on two databases with the same table and index definitions and similar data. In one database the plan used the “index full scan (min/max)” operation while the other database used a brute force “index fast full scan” operation.

In most circumstances the starting point to address a question like this is to check whether some configuration details, or some statistics, or the values used in the query are sufficiently different to result in a significant change in costs; and the first simple procedure you can follow is to hint each database to use the plan from the opposite database to see if this produces any clues about the difference – it’s a good idea when doing this test to use one of the more verbose formatting options for the call to dbms_xplan.

In this case, though, the OP discovered a note on MoS reporting exactly the problem he was seeing:

Doc ID 2144428.1: Optimizer Picking Wrong ‘INDEX FAST FULL SCAN’ Plan vs Correct ‘INDEX FULL SCAN (MIN/MAX)’

which referred to

Bug 22662807: OPTIMIZER PICKING INDEX FFS CAN INSTEAD OF MIN/MAX

Conveniently the document suggested a few workarounds:

  • alter session set optimizer_features_enable = ‘11.2.0.3’;
  • alter session set “_fix_control” = ‘13430622:off’;
  • delete object stats [Ed: so that dynamic sampling takes place … maybe a /*+ dynamic_sampling(alias level) */ hint would suffice].

Of the three options my preference would (at least in the short term) be the _fix_control one. Specifically, from the v$system_fix_control view, we can see that it addresses the problem very precisely with the description: “index min/max cardinality estimate fix for filter predicates”.

The example in the bug note showed a very simple statement (even more simple than the OP’s query which was only a single table query anyway), so I thought I’d build a model and run a few tests to see what was going on. Luckily, before I’d started work, one of the other members of the Oak Table network sent an email to the list asking if anyone knew how the optimizer was costing an example he’d constructed – and I’ve finally got around to looking at his example, and here’s the model and answer(s), starting with the data set:


rem
rem     Script:         test_min_max.sql
rem     Dated:          March 2017
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.2.0.3
rem

create table min_max_test nologging
as
with ids as (
        select /*+ Materialize */ rownum  id from dual connect by rownum <= 50000 -- > comment to protect formatting
),
line_nrs as (
        select /*+ Materialize */  rownum line_nr from dual connect by rownum <= 20 -- > comment to protect formatting
)
select
        id, line_nr ,rpad(' ', 800, '*') data
from
        line_nrs, ids
order by
        line_nr, id
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'min_max_test',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index mmt_ln_id on min_max_test (line_nr, id) nologging;
create index mmt_id    on min_max_test (id)          nologging;

The table has two critical columns: each id has 20 line_nr values associated with it, but the way the data was generated means that the line numbers for a given id are scattered across 20 separate table blocks.

There are two indexes – one on the id which will allow us to find all the rows for a given id as efficiently as possible, and one (slightly odd-looking in this context) that would allow us to find a specific row for a given line_nr and id very efficiently. Two things about these indexes – in a live application they should both be compressed on the first (only, in the case of index mmt_id) column, and secondly the necessity of the mmt_id index is questionable and it might be an index you could drop if you reversed the order of the columns in mmt_ln_id. The thing about these indexes, though, is that they allow us to demonstrate a problem. So let’s query the data – twice, hinting each index in turn:


set serveroutput off

select
        /*+ index(t(id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

select
        /*+ index(t(line_nr, id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

It’s fairly safe to make a prediction about the execution plan and cost of the first query – it’s likely to be a range scan that accesses a couple of branch blocks, a leaf block and 20 separate table blocks followed by a “sort aggregate” – with a cost of about 23.

It’s a little harder to make a prediction about the second query. The optimizer could infer that the min(line_nr) has to be close to the left hand section of the index, and could note that the number of rows in the table is the same as the product of the number of distinct values of the two separate columns, and it might note that the id column is evenly distributed (no histogram) across the data, so it might “guess” that it need only range scan all the entries for the first line_nr to find the appropriate id. So perhaps the optimizer will use the index min/max range scan with a cost that is roughly 2 branch blocks plus total leaf blocks / 20 (since there are 20 distinct values for line_nr); maybe it would divide the leaf block estimate by two because “on average” – i.e. for repeated random selections of value for id – it would have to scan half the leaf blocks. There were 2,618 leaf blocks in my index, so the cost should be close to either 133 or 68.

Here are the two plans – range scan first, min/max second:


select  /*+ index(t(id)) */  min(line_nr) from  min_max_test t where id = :b1
-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MIN_MAX_TEST |    20 |   160 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MMT_ID       |    20 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=:B1)


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    22 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |    22   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Spot on with the estimate for the simple range scan – but what did we do wrong with the estimate for the min/max scan ? You might notice in the first example the “table access by rowid batched” and realise that this is running on 12c. Here’s the plan if I get if I set the optimizer_features_enable back to 11.2.0.3 before running the second query again:


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   136 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |   136   (1)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |   136   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Using the 11.2.0.3 optimizer model the plan has a cost that’s very close to our prediction – we’ll see why there’s a slight difference in a moment. If we set the optimizer_features_enable to 11.2.0.4 the cost drops back to 22. So for our example 11.2.0.3 will use the simple “index range scan” and an upgrade to 11.2.0.4 (or higher) will switch to the “index full scan (min/max)”. If you look at the OTN posting the impact of the change in costing is exactly the other way around – 11.2.0.3 uses the min/max path, 11.2.0.4 uses the simple index range scan.

The techy bit

You really don’t need to know this – experimenting with the optimizer_features_enable (or _fix_control) will give you plans that show you all the numbers you need to see to check whether or not you’ve run into this particular problem – but if you’re interested here’s a little bit from the two 10053 trace files. We need only look at a few critical lines. From the 11.2.0.3 costing for the min/max scan:


Index Stats::
  Index: MMT_ID  Col#: 1
  LVLS: 2  #LB: 2202  #DK: 50000  LB/K: 1.00  DB/K: 20.00  CLUF: 1000000.00  NRW: 1000000.00
  Index: MMT_LN_ID  Col#: 2 1
  LVLS: 2  #LB: 2618  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 125000.00  NRW: 1000000.00

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 135.000000  resc_cpu: 961594
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 135.697679  Resp: 135.697679  Degree: 1

I was running 12.1.0.2 so there were a few extra bits and pieces that I’ve deleted (mostly about SQL Plan Directives and in-memory). Critically we can see that the stats collection has a small error for the ID column – 50,536 distinct values (NDV) instead of exactly 50,000. This seems to have given us a cost for the expected index range of: 2 (blevel) + ceiling(2618 (leaf blocks) * 50536 / 1000000) = 2 + ceil(132.3) = 135, to which we add a bit for the CPU and get to 136. (Q.E.D.)

Then we switch to costing for 11.2.0.4:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 21.787874  resc_cpu: 156872
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 22.324608  Resp: 22.324608  Degree: 1

We still have the small error in the number of distinct values for id, so the estimated number of rows that we need to access from the table for a given id (before “aggregating” to find its minimum line_nr) is 19.787874 (Computed: / Non Adjusted:) rather than exactly 20. Notice, then, that the cost of using the index is 19.787874 + 2 which looks suspiciously like adding the blevel to the number of table blocks to get a cost and forgetting that we might have to kiss a lot of frogs before we find the prince. Basically, in this example at least, it looks like the costing algorithm has NOTHING to do with the mechanics of what actually has to happen at run-time.

Footnote

This is only an initial probe into what’s going on with the min/max scan; there are plenty more patterns of data that would need to be tested before we could have any confidence that we had produced a generic model of how the optimizer does its calculations – the only thing to note so far is that there IS a big change as  you move from 11.2.0.3 to later versions: the case on OTN showed the min/max scan disappearing on the upgrade, the example above shows the min/max disappearing on the downgrade – either change could be bad news for parts of a production system.

There are a couple of related bugs that might also be worth reviewing.

  • Bug 11834402 : CBO CHOOSES A SLOW INDEX FULL SCAN OVER A MUCH FASTER INDEX RANGE SCAN
  • Bug 13430622 : INDEX SCAN IN VERY SLOW FOR ONE PREDICATE AND FAST FOR OTHERS

There is a note, though that this last bug was fixed in 12.1

Footnote 2

When experimenting, one idea to pursue as the models get more complex and you’re using indexes with more than two columns is to test whether the presence of carefully chosen column group statistics might make a difference to the optimizer’s estimates of cardinality (hence cost) of the min/max scan.

 


Deception

Tue, 2017-03-21 09:41

One of the difficulties with trouble-shooting is that’s it very easy to overlook, or forget to go hunting for, the little details that turn a puzzle into a simple problem. Here’s an example showing how you can read a bit of an AWR report and think you’ve found an unpleasant anomaly. I’ve created a little model and taken a couple of AWR snapshots a few seconds apart so the numbers involved are going to be very small, but all I’m trying to demonstrate is a principle. So here’s a few lines of one of the more popular sections of an AWR report:

SQL ordered by Gets                       DB/Inst: OR32/or32  Snaps: 1754-1755
-> Resources reported for PL/SQL code includes the resources used by all SQL
   statements called by the code.
-> %Total - Buffer Gets   as a percentage of Total Buffer Gets
-> %CPU   - CPU Time      as a percentage of Elapsed Time
-> %IO    - User I/O Time as a percentage of Elapsed Time
-> Total Buffer Gets:         351,545
-> Captured SQL account for   65.0% of Total

     Buffer                 Gets              Elapsed
      Gets   Executions   per Exec   %Total   Time (s)  %CPU   %IO    SQL Id
----------- ----------- ------------ ------ ---------- ----- ----- -------------
      8,094          20        404.7    2.3        0.0 114.1   2.3 017r1rur8atzv
Module: SQL*Plus
UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1

We have a simple update statement which, according to the hint/comment (that’s not a real hint, by the way) and guessing from column names, is doing an update by primary key; but it’s taking 400 buffer gets per execution!

It’s possible, but unlikely, that there are about 60 indexes on the table that all contain the n1 column; perhaps there’s a massive read-consistency effect going on thanks to some concurrent long-running DML on the table; or maybe there are a couple of very hot hotspots in the table that are being constantly modified by multiple sessions; or maybe the table is a FIFO (first-in, first-out) queueing table and something funny is happening with a massively sparse index.

Let’s just check, first of all, that the access path is the “update by PK” that the hint/comment suggests (cut-n-paste):


SQL> select * from table(dbms_xplan.display_cursor('017r1rur8atzv',null));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------
SQL_ID  017r1rur8atzv, child number 0
-------------------------------------
UPDATE /*+ by_pk */ T1 SET N1 = 0 WHERE ID = :B1

Plan hash value: 1764744892

----------------------------------------------------------------------------
| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | UPDATE STATEMENT   |       |       |       |     3 (100)|          |
|   1 |  UPDATE            | T1    |       |       |            |          |
|*  2 |   INDEX UNIQUE SCAN| T1_PK |     1 |    14 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("ID"=:B1)

The plan is exactly as expected – so where do we look next to find out what’s going on? I’m a great believer in trying to make sure I have as much relevant information as possible; but there’s always the compromise when collecting information that balances the benefit of the new information against the difficulty of gathering it – sometimes the information that would be really helpful is just too difficult, or time-consuming, to collect.

Fortunately, in this case, there’s a very quick easy way to enhance the information we’ve got so far. The rest of the AWR report – why not search for that SQL_ID in the rest of the report to see if that gives us a clue ? Unfortunately the value doesn’t appear anywhere else in the report. On the other hand there’s the AWR SQL report (?/rdbms/admin/awrsqrpt.sql – or the equivalent drill-down on the OEM screen), and here’s a key part of what it tells us for this statement:


Stat Name                                Statement   Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms)                                36            1.8     0.0
CPU Time (ms)                                    41            2.0     0.1
Executions                                       20            N/A     N/A
Buffer Gets                                   8,094          404.7     2.3
Disk Reads                                        1            0.1     0.0
Parse Calls                                      20            1.0     0.4
Rows                                          2,000          100.0     N/A
User I/O Wait Time (ms)                           1            N/A     N/A
Cluster Wait Time (ms)                            0            N/A     N/A
Application Wait Time (ms)                        0            N/A     N/A
Concurrency Wait Time (ms)                        0            N/A     N/A
Invalidations                                     0            N/A     N/A
Version Count                                     1            N/A     N/A
Sharable Mem(KB)                                 19            N/A     N/A
          -------------------------------------------------------------

Spot the anomaly?

We updated by primary key 20 times – and updated 2,000 rows!

Take another look at the SQL – it’s all in upper case (apart from the hint/comment) with a bind variable named B1 – that means it’s (probably) an example of SQL embedded in PL/SQL. Does that give us any clues ? Possibly, but even if it doesn’t we might be able to search dba_source for the PL/SQL code where that statement appears. And this is what it looks like in the source:

        forall i in 1..m_tab.count
                update  /*+ by_pk */ t1
                set     n1 = 0
                where   id = m_tab(i).id
        ;

It’s PL/SQL array processing – we register one execution of the SQL statement while processing the whole array, so if we can show that there are 100 rows in the array the figures we get from the AWR report now make sense. One of the commonest oversights I (used to) see in places like the Oracle newsgroup or listserver was people reporting the amount of work done but forgetting to consider the equally important “work done per row processed”. To me it’s also one of the irritating little defects with the AWR report – I’d like to see “rows processed” in various of the “SQL ordered by” sections of the report (not just the “SQL ordered by Executions” section), rather than having to fall back on the AWR SQL report.

Footnote:

If you want to recreate the model and tests, here’s the code:


rem
rem     Script:         forall_pk_confusion.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select  --+ materialize
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        2 * trunc(dbms_random.value(1e10,1e12))         n1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format problem
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1', 
                method_opt       => 'for all columns size 1'
        );
end;

alter table t1 add constraint t1_pk primary key(id);

declare

        cursor c1 is
        select  id
        from    t1
        where   mod(id,10000) = 1
        ;

        type c1_array is table of c1%rowtype index by binary_integer;
        m_tab c1_array;

begin

        open c1;

        fetch c1
        bulk collect
        into m_tab
        ;

        dbms_output.put_line('Fetched: ' || m_tab.count);

        close c1;

        forall i in 1..m_tab.count
                update  /*+ by_pk */ t1
                set     n1 = 0
                where   id = m_tab(i).id
        ;

        dbms_output.put_line('Updated: ' || sql%rowcount);

end;
/

select
        v.plan_table_output
from
        v$sql   sql,
        table(dbms_xplan.display_cursor(sql.sql_id, sql.child_number)) v
where
        sql_text like 'UPDATE%by_pk%'
;

select
        executions, rows_processed, disk_reads, buffer_gets
from    v$sql  
where   sql_id = '017r1rur8atzv'
;


Quiz Night

Thu, 2017-03-09 16:34

The following is a straight, continuous, untouched, cut-n-paste from an SQL*Plus session on 12.1.0.2. How come the update doesn’t execute in parallel – noting that parallel DML has been enabled and the tablescan to identify rows to be updated does execute in parallel ?


SQL> desc t1
 Name                                                                            Null?    Type
 ------------------------------------------------------------------------------- -------- ------------------------------------------------------
 OWNER                                                                           NOT NULL VARCHAR2(128)
 OBJECT_NAME                                                                     NOT NULL VARCHAR2(128)
 SUBOBJECT_NAME                                                                           VARCHAR2(128)
 OBJECT_ID                                                                       NOT NULL NUMBER
 DATA_OBJECT_ID                                                                           NUMBER
 OBJECT_TYPE                                                                              VARCHAR2(23)
 CREATED                                                                         NOT NULL DATE
 LAST_DDL_TIME                                                                   NOT NULL DATE
 TIMESTAMP                                                                                VARCHAR2(19)
 STATUS                                                                                   VARCHAR2(7)
 TEMPORARY                                                                                VARCHAR2(1)
 GENERATED                                                                                VARCHAR2(1)
 SECONDARY                                                                                VARCHAR2(1)
 NAMESPACE                                                                       NOT NULL NUMBER
 EDITION_NAME                                                                             VARCHAR2(128)
 SHARING                                                                                  VARCHAR2(13)
 EDITIONABLE                                                                              VARCHAR2(1)
 ORACLE_MAINTAINED                                                                        VARCHAR2(1)

SQL> select * from t1 minus select * from all_objects;

OWNER           OBJECT_NAME          SUBOBJECT_NAME          OBJECT_ID DATA_OBJECT_ID OBJECT_TYPE             CREATED   LAST_DDL_
--------------- -------------------- ---------------------- ---------- -------------- ----------------------- --------- ---------
TIMESTAMP           STATUS  T G S       NAMESPACE EDITION_NAME         SHARING       E O
------------------- ------- - - - --------------- -------------------- ------------- - -
TEST_USER       T1                                              159331         159331 TABLE                   09-MAR-17 09-MAR-17
2017-03-09:22:16:36 VALID   N N N               1                      NONE            N


1 row selected.

SQL> alter session force parallel dml;

Session altered.

SQL> set serveroutput off
SQL> update t1 set object_name = lower(object_name) where data_object_id is null;

78324 rows updated.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  b16abyv8p2790, child number 0
-------------------------------------
update t1 set object_name = lower(object_name) where data_object_id is
null

Plan hash value: 121765358

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |          |       |       |    26 (100)|          |        |      |            |
|   1 |  UPDATE               | T1       |       |       |            |          |        |      |            |
|   2 |   PX COORDINATOR      |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)| :TQ10000 | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   4 |     PX BLOCK ITERATOR |          | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS FULL| T1       | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(:Z>=:Z AND :Z<=:Z)
       filter("DATA_OBJECT_ID" IS NULL)

Note
-----
   - Degree of Parallelism is 8 because of table property
   - PDML disabled because single fragment or non partitioned table used


29 rows selected.

SQL> select * from v$pq_tqstat;

DFO_NUMBER      TQ_ID SERVER_TYPE       NUM_ROWS      BYTES  OPEN_TIME AVG_LATENCY      WAITS   TIMEOUTS PROCESS         INSTANCE     CON_ID
---------- ---------- --------------- ---------- ---------- ---------- ----------- ---------- ---------- --------------- -------- ----------
         1          0 Producer              8997     363737 ##########           0         14          0 P004                   1          0
                                            9721     409075 ##########           0         12          0 P007                   1          0
                                            9774     408591 ##########           0         12          0 P005                   1          0
                                            9844     396816 ##########           0         12          0 P003                   1          0
                                            9965     403926 ##########           0         13          0 P006                   1          0
                                            9727     388829 ##########           0         12          0 P002                   1          0
                                            9951     399162 ##########           0         14          0 P001                   1          0
                                           10345     408987 ##########           0         13          0 P000                   1          0
                      Consumer             78324    3179123 ##########           0          0          0 QC                     1          0



9 rows selected.

If you want to see the fully parallel plan, it would look like this (after running the query above against v$pq_tqstat I executed one statement that I’m not showing before carrying on with the statements below):


SQL> update t1 set object_name = lower(object_name) where data_object_id is null;

78324 rows updated.

SQL> select * from table(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  b16abyv8p2790, child number 0
-------------------------------------
update t1 set object_name = lower(object_name) where data_object_id is
null

Plan hash value: 3991856572

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |          |       |       |    26 (100)|          |        |      |            |
|   1 |  PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM) | :TQ10000 | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | P->S | QC (RAND)  |
|   3 |    UPDATE             | T1       |       |       |            |          |  Q1,00 | PCWP |            |
|   4 |     PX BLOCK ITERATOR |          | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | PCWC |            |
|*  5 |      TABLE ACCESS FULL| T1       | 78324 |  2141K|    26   (0)| 00:00:01 |  Q1,00 | PCWP |            |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access(:Z>=:Z AND :Z<=:Z)
       filter("DATA_OBJECT_ID" IS NULL)

Note
-----
   - Degree of Parallelism is 8 because of table property


28 rows selected.

SQL> select object_name, object_type from user_objects;

OBJECT_NAME          OBJECT_TYPE
-------------------- -----------------------
T1                   TABLE

1 row selected.

Answer coming some time tomorrow.


Join Elimination

Thu, 2017-03-09 12:39

A question has just appeared on OTN describing a problem where code that works in 11g doesn’t work in 12c (exact versions not specified). The code in question is a C-based wrapper for some SQL, and the problem is a buffer overflow problem. The query supplied is as follows:


select T1.C1 from T1, T2 where T1.C1 = T2.D1;

The problem is that this works in 11g where the receiving (C) variable is declared as

char myBuffer [31];

but it doesn’t work in 12c unless the receiving variable is declared as:

char myBuffer [51];

There’s an important bit of background information that might be giving us a clue about what’s happened (although what I’m about to describe isn’t actually the problem unless the SQL provided is a simplified version of the problem SQL that is expected to display the problem). Column C1 is defined as char(30) and column D1 is defined as char(50). Here’s some sample code showing why you might need a buffer of 50+1 bytes to hold something that ought to be 30+1 bytes long. (This may be nothing to do with the anomaly described in the original posting – it’s just something I thought of when I first saw the question.)


rem     Script:         join_elimination_oddity.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017

create table t1(
        c30     char(30) primary key
);

create table t2(
        d50     char(50) references t1
);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

explain plan for
select
        t1.c30
from
        t1, t2
where
        t1.c30 = t2.d50
;

select * from table(dbms_xplan.display(null,null,'projection'));

So we’re selecting c30 – the 30 byte character column – from t1; what do we actually get ? Here’s the plan with the projection:


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |    52 |     1   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T2   |     1 |    52 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("T2"."D50" IS NOT NULL)

Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "T2"."D50"[CHARACTER,50]

Table t1 has been eliminated and the projected column is the “equivalent” column from t2 – which is too long for the expected output. To work around this problem you can disable join elimination either by parameter (_optimizer_join_elimination_enabled=false) or by hinting /*+ no_eliminate_join(t1) */ in which case the plan (with my data) became a nested loop join from t2 to t1 with column c1 projected as expected.

Footnote:

Two things to note about my demonstration

  • If you’re going to create a referential integrity constraint between columns they do need to be of exactly the same type.
  • This extremely simple case demonstrates the problem in 11.2.0.4 as well as 12.1.0.2. Possibly a more complex query could be produced where (thanks to limitations in query transformations) 11g doesn’t spot the option for join elimination while 12c does; alternatively, a very simple two-column example in 11g won’t do join elimination while a two-column example in 12.2 can (though it doesn’t always) – so upgrading to 12.2 MIGHT cause more people to see this anomaly appearing.

 


Guesswork

Tue, 2017-03-07 13:57

A recent posting on the OTN database forum described a problem with an insert (as select) statement that sometimes ran extremely slowly: nothing interesting yet, there could be plenty of boring reasons for that to happen. The same SQL statement (by SQL_ID) might take 6 hours to insert 300K rows one night while taking just a few minutes to insert 900K another night (still nothing terribly interesting).

An analysis of the ASH data about the statement showed that the problem was on the “LOAD TABLE CONVENTIONAL” operation – which starts to get interesting if you also assume that someone who was competent to look at ASH would probably have noticed whether or not the time spent was on (the first obvious guess) some variant of “TX enqueue” waits or “log file” waits or something amazingly different. It’s also a little more interesting if you’ve noticed that the title of the posting is about “consuming a lot of CPU time” – so where could that be going if we see most of the excess time going on the insert rather than on the select.

Here’s an enormous clue that this might be a “non-standard” problem – the SQL statement starts like this:


INSERT  /*+ ignore_row_on_dupkey_index(tgt OTC_DAT_TV_PROC_STATUS_UK) */
INTO OTC_DAT_TV_PROC_STATUS TGT (
    {list of columns}
)
SELECT  /*+ parallel(8) */
    ...

See the hint which says “ignore rows if they raise duplicate key errors against index otc_dat_tv_proc_status_uk” ? Think what it might take to implement code that obeys the hint – Oracle can’t know that a row is duplicating a value until it has inserted the row (to get a rowid) then tried to maintain the index and discovered the pre-existing value, at which point it raises an exception, then handles it by undoing the single row insert (I wonder what that does to array inserts, and the logic of triggers), then carries on with the next row. (I suppose the code could check every relevant index before doing the insert – but that would mean a very big, and possibly very resource-intensive, change to all existing “insert a row” code.)

Surely we’re going to see some dramatic effects if a large fraction of our rows result in duplicate values.

Warning – the guess might not be right but it’s much more interesting than all the other guesses you might make at this point and a good excuse for doing a bit of experimentation and learning. So here’s a simple model to test the “excess work” hypothesis – running against 11.2.0.4:

rem
rem     Script:         ignore_dupkey.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Mar 2017
rem

drop table t2;
drop table t1;

create table t1
nologging
as
select  *
from    all_objects
where   rownum <= 50000 ; -- > comment to protect WordPress formatting

create table t2
nologging
as
select  *
from    all_objects
where   rownum <= 50000 ; -- > comment to protect WordPress formatting

update t1 set object_id = 500000 + (select max(object_id) from t1)
;

commit;

create unique index t1_i1 on t1(object_id);

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );

        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T2',
                method_opt       => 'for all columns size 1'
        );
end;
/

All I’ve done is create two copies of the same data – with an update in place that changes one set of data so that the object_id doesn’t overlap with the object_id in the other set. The update statement is something that I’m going to include in one test but exclude from the second. Now for the statement that tests for the effects of the hint:


execute snap_events.start_snap
execute snap_my_stats.start_snap

insert
        /*+ ignore_row_on_dupkey_index(t1 (object_id)) */
        into t1
select  *
from    t2
;

execute snap_my_stats.end_snap
execute snap_events.end_snap


There are several different things I could do once I’ve got this basic test set up if I want to refine what I’m testing and isolate certain aspects of the mechanism, but all I’ll do for this note is a simple report of some figures from the two tests – one where the object_id values don’t overlap and one where t1 and t2 are exact copies of each other. The two procedures wrapping the statement are just my standard routines for capturing changes in v$mystat and v$session_event for my session.

For the insert where I’ve done the update to avoid any duplicates appearing the insert completed in about 0.3 seconds, generating 10MB of redo and 2MB of undo.

When I removed the update statement the (continuously failing) insert took 35.5 seconds to complete, of which almost all the time was CPU time. The redo jumped to 478MB with 14MB of undo. The extreme level of redo surprised me slightly especially since the scale of the change was so much greater than that of the undo – I think it may be due to a problem with Oracle needing to unwind one row from an (internal) array insert before retrying. Here, taken from the session stats of the problem run, is a little indication of why the time (and especially the CPU time) increased so much:

Name                                                                     Value
----                                                                     -----
rollback changes - undo records applied                                 95,014
active txn count during cleanout                                        24,627
cleanout - number of ktugct calls                                       24,627
HSC Heap Segment Block Changes                                          97,509
Heap Segment Array Inserts                                              97,509
recursive calls                                                        682,574
recursive cpu usage                                                      2,193
session logical reads                                                1,341,077

Obviously there’s a lot of work done rolling back changes that should not have been made (though why it’s reported as 95,000 rather than 100,000 I don’t know and I’d rather not do a trace of buffer activity to find out) with an associated extra load of data block activity. Most visible, though, is the huge number of recursive calls with, perhaps associated, a very large number of session logical reads. Clearly it’s worth enabling extended tracing to see what’s going on – if you haven’t already guessed what some of those calls are about. Here’s an extract from the top of an appropriate tkprof output:


tkprof test_ora_32298.trc ignore_dupkey sort=execnt

select /*+ rule */ c.name, u.name
from
 con$ c, cdef$ cd, user$ u  where c.con# = cd.con# and cd.enabled = :1 and
  c.owner# = u.user#


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    48753      1.86       2.08          0          0          0           0
Execute  48753      3.75       4.17          0          0          0           0
Fetch    48753      2.86       3.12          0      97506          0           0
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   146259      8.47       9.38          0      97506          0           0

select o.name, u.name
from
 obj$ o, user$ u  where o.obj# = :1 and o.owner# = u.user#

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse    48753      1.87       1.99          0          0          0           0
Execute  48753      3.60       3.63          0          0          0           0
Fetch    48753      7.35       7.52          0     243765          0       48753
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   146259     12.84      13.14          0     243765          0       48753

The top two statements in the trace file show Oracle first trying to find the name of the constraint that has been breached, then falling back to searching for the name of the unique index that has caused the error to appear. If I had created a unique constraint rather than just a unique index then the second of these two statement would not have appeared in the trace file (and the run would have been a little quicker – hint: constraints are a good thing).

You’ll notice that the total number of calls from the two statement is roughly 292,500 – far short of the 682,000 reported in the session stats. Unfortunately there was nothing else in the trace files that could be blamed for the outstanding 400,000 missing calls. It’s not really necessary to chase up all the details, though; clearly we can see that this feature is very expensive if lots of duplicates appear – like DML error logging it has probably been created as a way of dealing with occasional errors when handling large volumes of data.

Footnote

Notice that my example uses the “index description” method for specifying the index in the hint; it’s legal with either the name or the description. Interestingly (but, perhaps, unsurprisingly) this is a hint that Oracle has to parse for correctness before optimisation. I made a mistake in my first attempt at writing the update statement leaving me with duplicates in the data so Oracle couldn’t create the unique index; as a consequence of the missing unique index the insert statement reported the Oracle error “ORA-38913: Index specified in the index hint is invalid”.

 


I don’t know (yet)

Wed, 2017-03-01 06:53

Here’s a question that came to mind while reading a recent question on the OTN database forum. It’s a question to which I don’t know the answer and, at present, I don’t really want to bother modelling at present – although if I were on a customer site and this looked like a likely explanation for a performance anomaly it’s the sort of thing I would create a model for.

If I have a query that runs parallel and does a “hash join buffered” (see, for example, this URL), it’s possible that the processes creating the build table will manage to create the build table in memory without spilling to disc but then find themselves unable to keep the incoming probe table data in memory and spill it to disc before re-reading it to do the join and forward the results to the parent process.

Here’s the plan from the URL above, showing rowsource execution stats:


------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Starts | E-Time   |    TQ  |IN-OUT| PQ Distrib | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem |  O/1/M   | Max-Tmp |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |      1 |          |        |      |            |  70101 |00:00:00.17 |      12 |      0 |      0 |       |       |          |         |
|   1 |  PX COORDINATOR         |          |      1 |          |        |      |            |  70101 |00:00:00.17 |      12 |      0 |      0 |       |       |          |         |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 |      0 | 00:00:01 |  Q1,02 | P->S | QC (RAND)  |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|*  3 |    HASH JOIN BUFFERED   |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70101 |00:00:00.28 |       0 |    310 |    310 |  5952K|  1953K|     2/0/0|    2048 |
|   4 |     PX RECEIVE          |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70101 |00:00:00.08 |       0 |      0 |      0 |       |       |          |         |
|   5 |      PX SEND HASH       | :TQ10000 |      0 | 00:00:01 |  Q1,00 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|   6 |       PX BLOCK ITERATOR |          |      2 | 00:00:01 |  Q1,00 | PCWC |            |  70101 |00:00:00.05 |    1031 |   1005 |      0 |       |       |          |         |
|*  7 |        TABLE ACCESS FULL| T1       |     26 | 00:00:01 |  Q1,00 | PCWP |            |  70101 |00:00:00.02 |    1031 |   1005 |      0 |       |       |          |         |
|   8 |     PX RECEIVE          |          |      2 | 00:00:01 |  Q1,02 | PCWP |            |  70102 |00:00:00.06 |       0 |      0 |      0 |       |       |          |         |
|   9 |      PX SEND HASH       | :TQ10001 |      0 | 00:00:01 |  Q1,01 | P->P | HASH       |      0 |00:00:00.01 |       0 |      0 |      0 |       |       |          |         |
|  10 |       PX BLOCK ITERATOR |          |      2 | 00:00:01 |  Q1,01 | PCWC |            |  70102 |00:00:00.04 |    1031 |   1005 |      0 |       |       |          |         |
|* 11 |        TABLE ACCESS FULL| T2       |     26 | 00:00:01 |  Q1,01 | PCWP |            |  70102 |00:00:00.01 |    1031 |   1005 |      0 |       |       |          |         |
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Note that we have a “hash join buffered” at operation 3, which means incoming data from t2 (the probe table) was buffered before being joined; but we can see 310 writes and reads to temporary space due to that operation. That’s actually t2 spilling from the buffer to disc and then being read back from disc after the t2 scan (at operation 11) completes.

If this pattern of activity appears how is it reflected in the statistics?

Because the build table was completely built in memory you could argue for an “optimal workarea execution”; but because the probe table was temporarily spilled to disc you could argue for a “one-pass workarea operation”. In this case I was able to say that the writes and reads for operation 3 were the t2 table spilling to disc  because (a) I understand (mostly) how parallel hash joins work, (b) I had trace files I could check for timing, and (c) the O/1/M column of the output show 2 optimal operations and no one-pass operations.

The question to which I don’t (yet) know the answer is this: is the I/O caused by this spill to disc captured in v$pgastat under the stats:

NAME                                              VALUE UNIT
-------------------------------------------- ---------- ------------
extra bytes read/written                              0 bytes
cache hit percentage                                100 percent

If so, then you end up with questions like the one on OTN:

How come OEM reports my PGA cache Hit Ratio at 76% when I’ve got so much more memory than seems necessary ?

With the follow-up comment:

All my workarea executions were optimal

Footnote

Though I haven’t examined it recently, I remember noting in the past that the buffer allocation for the incoming probe table was typically a small fraction (often 1/8th or 1/16th) of the size of the buffer allocated for the build table. This may help to explain why you could end up doing a lot of I/O without exhausting the available memory.

I’m probably going to resurrect the example from the linked blog note and check the effects before I publish this post.

 


Cost is Time (again)

Tue, 2017-02-28 05:19

The hoary old question about lower cost queries running faster or slower that higher cost queries has appeared once again on the OTN database forum. It’s one I’ve addressed numerous times in the past – including on this blog – but the Internet being what it is the signal keeps getting swamped by the noise. This time around a couple of “new” thoughts crossed my mind when reading the question.

There is a Time column on the standard forms of the execution plan output, and the description of this column is available in the manuals and has been for years (here’s a definition from v$sql_plan from 10gR2, for example):

Elapsed time (in seconds) of the operation as estimated by the optimizer’s cost-based approach. For statements that use the rule-based approach, this column is null.

So the first question is this: why are people looking at the cost when they’re asking about the time ? The second question arises from the bit in brackets (parentheses): the time is given in seconds – so how accurate do you think the optimizer’s estimates of ANYTHING are when the best estimate the optimizer will give you for run-time has a granularity of a second ?

Of course there’s a further observation I could make (which only echoes the first question):  I don’t think I’ve ever seen anyone come up with the question: “Will a query with a lower value for Time run faster or slower than a query with a higher value for Time?”

Bottom Line:

Cost is supposed to be a measure of resource usage (per execution of each operation) and should therefore be a measure of time – but the model fails in many ways so when a plan clearly doesn’t meet reasonable expectations for performance you can (often) use the Cost column as an indicator of where the model has failed and this may give you some clues of how to address the problem.

It is unfortunate that before you can recognise when a particular Cost figure is bad you usually need to know something about the data content, the data distribution  pattern, the run-time caching effects, and the way the optimizer does its arithmetic.

 


Truncate 12c

Thu, 2017-02-16 06:52

Here’s one of those little improvements in 12c (including 12.1) that will probably end up being described as “little known features” in about 3 years time. Arguably it’s one of those little things that no-one should care about because it’s not the sort of thing you should do on a production system, but that doesn’t mean it won’t be seen in the wild.

Rather than simply state the feature I’m going to demonstrate it, starting with a little code to build a couple of tables with referential integrity:


create table parent (
        id      number(4),
        name    varchar2(10),
        constraint par_pk primary key (id)
)
;

create table child(
        id_p    number(4)
                        constraint chi_fk_par
                        references parent
                        on delete cascade,
        id      number(4),
        name    varchar2(10),
        constraint chi_pk primary key (id_p, id)
)
;

insert into parent values (1,'Smith');
insert into parent values (2,'Jones');

insert into child values(1,1,'Sally');
insert into child values(1,2,'Simon');

insert into child values(2,1,'Jack');
insert into child values(2,2,'Jill');

commit;


There’s one important detail in this code that isn’t taking the default and isn’t used very frequently – it’s the option on the foreign key to take the action “on delete cascade”. If you delete a row from the parent table then Oracle will automatically delete any referenced rows from the child table first thus avoiding the error ORA-02292: integrity constraint (TEST_USER.CHI_FK_PAR) violated – child record found. (Conveniently I have a suitable index on the child table that will bypass the problem of a mode 4 (or, where child rows already exist, mode 5) TM lock being taken on the child as the parent row is deleted.)

And here’s the demonstration of the new feature – working in 12.1 onwards:


truncate table parent;

truncate table parent cascade;

The first command will raise Oracle error ORA-02266: unique/primary keys in table referenced by enabled foreign keys, but the second command will truncate the parent and child tables “simultaneously”: but only if the referential integrity constraint is set to “on delete cascade”. If the referential integrity constraint is left to its default action then the second command will raise error: ORA-14705: unique or primary keys referenced by enabled foreign keys in table “TEST_USER”.”CHILD”

This feature (and several broadly similar features) also works with matching partitions of equi-partitioned (or ref partitioned) tables – and that’s a context where the requirement  is much more likely to appear than with non-partitioned tables.

 


Band Join 12c

Mon, 2017-02-13 07:53

One of the optimizer enhancements that appeared in 12.2 for SQL is the “band join”. that makes certain types of merge join much more  efficient.  Consider the following query (I’ll supply the SQL to create the demonstration at the end of the posting) which joins two tables of 10,000 rows each using a “between” predicate on a column which (just to make it easy to understand the size of the result set)  happens to be unique with sequential values though there’s no index or constraint in place:

select
        t1.v1, t2.v1
from
        t1, t2
where
        t2.id between t1.id - 1
                  and t1.id + 2
;

This query returns nearly 40,000 rows. Except for the values at the extreme ends of the range each of the 10,000 rows in t2 will join to 4 rows in t1 thanks to the simple sequential nature of the data. In 12.2 the query, with rowsource execution stats enabled, completed in 1.48 seconds. In 12.1.0.2 the query, with rowsource execution stats OFF, took a little over 14 seconds. (With rowsource execution stats enabled it took 12.1.0.2 a little over 1 minute to return the first 5% of the data – I didn’t bother to wait for the rest, though the rate would have improved over time.)

Here are the two execution plans – spot the critical difference:


12.1.0.2
-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   1 |  MERGE JOIN          |      |    25M|   715M|  1058  (96)| 00:00:01 |
|   2 |   SORT JOIN          |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   3 |    TABLE ACCESS FULL | T1   | 10000 |   146K|    27   (4)| 00:00:01 |
|*  4 |   FILTER             |      |       |       |            |          |
|*  5 |    SORT JOIN         |      | 10000 |   146K|    29  (11)| 00:00:01 |
|   6 |     TABLE ACCESS FULL| T2   | 10000 |   146K|    27   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T2"."ID"<="T1"."ID"+2) 5 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID">="T1"."ID"-1)

12.2.0.1
----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   1 |  MERGE JOIN         |      | 40000 |  1171K|    54  (12)| 00:00:01 |
|   2 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T1   | 10000 |   146K|    25   (4)| 00:00:01 |
|*  4 |   SORT JOIN         |      | 10000 |   146K|    27  (12)| 00:00:01 |
|   5 |    TABLE ACCESS FULL| T2   | 10000 |   146K|    25   (4)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access("T2"."ID">="T1"."ID"-1)
       filter("T2"."ID"<="T1"."ID"+2 AND "T2"."ID">="T1"."ID"-1)

Notice how operation 4, the FILTER, that appeared in 12.1 has disappeared in 12.2 and the filter predicate that it used to hold is now part of the filter predicate of the SORT JOIN that has been promoted to operation 4 in the new plan.

As a reminder – the MERGE JOIN operates as follows: for each row returned by the SORT JOIN at operation 2 it calls operation 4. In 12.1 this example will then call operation 5 so the SORT JOIN there happens 10,000 times. It’s important to know, though, that the name of the operation is misleading; what’s really happening is that Oracle is “probing a sorted result set in local memory” 10,000 times – it’s only on the first probe that it finds it has to call operation 6 to read and move the data into local memory in sorted order.

So in 12.1 operation 5 probes (accesses) the in-memory data set starting at the point where t2.id >= t1.id – 1; I believe there’s an optimisation here because Oracle will recall where it started the probe last time and resume searching from that point; having found the first point in the in-memory set where the access predicate it true Oracle will walk through the list passing each row back to the FILTER operation as long as the access predicate is still true, and it will be true right up until the end of the list. As each row arrives at the FILTER operation Oracle checks to see if the filter predicate there is true and passes the row up to the MERGE JOIN operation if it is. We know that on each cycle the FILTER operation will start returning false after receiving 4 rows from SORT JOIN operation – Oracle doesn’t.  On average the SORT JOIN operation will send 5,000 rows to the FILTER operation (for a total of 50,000,000 values passed and discarded).

In 12.2, and for the special case here where the join predicate uses constants to define the range, Oracle has re-engineered the code to eliminate the FILTER operation and to test both parts of the between clause in the same subroutine it uses to probe and scan the rowsource. In 12.2 the SORT JOIN operation will pass 4 rows up to the MERGE JOIN operation and stop scanning on the fifth row it reaches. In my examples that’s an enormous (CPU) saving in subroutine calls and redundant tests.

Footnote:

This “band-join” mechanism only applies when the range is defined by constants (whether literal or bind variable). It doesn’t work with predicates like (e.g.):

where t2.id between t1.id - t1.step_back and t1.id + t1.step_forward

The astonishing difference in performance due to enabling rowsource execution statistics is basically due to the number of subroutine calls eliminated – I believe (subject to a hidden parameter that controls a “sampling frequency”) that Oracle will call the O/S clock twice each time it calls the SORT JOIN operation from the FILTER operation to acquire the next row. In 12.1 we’re doing 50M calls redundant calls to SORT JOIN.

The dramatic difference in performance even when rowsource execution statistics isn’t enabled is probably something you won’t see very often in a production system – after all, I engineered a fairly extreme data set and query for the purposes of demonstration. Note, however, the band join does introduce a change in cost, so it’s possible that on the upgrade you may find a few cases where the optimizer will switch from a nested loop join to a merge join using a band-join.


Index bouncy scan

Thu, 2017-02-09 07:05

There’s a thread running on OTN at present about deleting huge volumes of duplicated data from a table (to reduce it from 1.1 billion to about 22 million rows). The thread isn’t what I’m going to talk about, though, other than quoting some numbers from it to explain what this post is about.

An overview of the requirement suggests that a file of about 2.2 million rows is loaded into the table every week with (historically) no attempt to delete duplicates. As a file is loaded into the table every row gets the same timestamp, which is the sysdate at load time. I thought it would be useful to know how many different timestamps there were in the whole table.  (From an averaging viewpoint, 1.1 billion rows at 2.2 million rows per week suggests about 500 dates/files/weeks – or about 9.5 years – but since the table relates to “customer accounts” it seems likely that the file was originally smaller and has grown over time, which means the hiostory may be rather longer than that.)

Conveniently there is an index on the “input_user_date” column in the table so we might feel happy running a query that simply does:


select
        distinct input_user_date
from
        customer_account
order by
        input_user_date
;

We might then refine the query to do a count(*) aggregate, or do some analytics to find any strange gaps in the timing of the weekly loads. However, all I’m really interested in is the number of dates because I’ve suggested we could de-duplicate the data by running a PL/SQL process that does a simple job for each date in turn, and I want to get an idea of how many times that job will run so that I can estimate how long the entire process might take.

The trouble with the basic query is that the table is (as you probably noticed) rather large, and so is the index. If we assume 8 bytes (which includes the length byte) for a date, 7 bytes for the rowid, 4 bytes overhead, and 100% packing we get about 420 index entries per leaf blocks, so with 1.1 billion entries the index is about 2.6 million leaf blocks. If the index had been built with compression (which means you’d only be recording a date once per leaf block) it would still be about 1.6 million leaf blocks. Fortunately we wouldn’t have to do much “real” sorting to report just a list of distinct values, or even the count(*) for each date, if we made Oracle use an index full scan – but it’s still a lot of work to read 1.6 million blocks (possibly using single block reads) and do even something as simple as a running count as you go. So I whipped up a quick and dirty bit of PL/SQL to do the job.

declare
        m_d1 date := to_date('01-Jan-0001');
        m_d2 date := to_date('01-Jan-0001');
        m_ct number := 0;
begin
        loop
                select
                        min(input_user_date)
                into
                        m_d2
                from
                        customer_account
                where
                        input_user_date > m_d1
                ;

                exit when m_d2 is null;

                m_ct := m_ct + 1;
                dbms_output.put_line('Count: ' || m_ct || '  Date: ' || m_d2);
                m_d1 := m_d2;

        end loop;
end;
/

The code assumes that the input_user_date hasn’t gone back to a silly date in the past to represent a “null date” (which shouldn’t exist anyway; if you want to use code like this but have a problem with a special “low-value” then you would probably be safest adding a prequel SQL that selects the min(columnX) where columnX is not null to get the starting value instead of using the a constant as I have done.

The execution path for the SQL statement should be an index-only: “index range scan (min/max)” which typically requires only 3 or 4 logical I/Os to find the relevant item for each date (which compares well with the estimated 2,200,000 / 420 = 5,238 leaf blocks we would otherwise have to scan through for each date). Here’s the path you should see:


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |       |       |     3 (100)|          |
|   1 |  SORT AGGREGATE              |       |     1 |     8 |            |          |
|   2 |   FIRST ROW                  |       |     1 |     8 |     3   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN (MIN/MAX)| CA_I1 |     1 |     8 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("INPUT_USER_DATE">:B1)

I did build a little data set as a proof of concept – and produced a wonderful example of how the scale and the preceding events makes a difference that requires you to look very closely at what has happened. I used a table t1 in my example with a column d1, but apart from the change in names the PL/SQL block was as above.Here’s the code I used to create the data and prepare for the test:


create table t1 nologging
as
select
        trunc(sysdate) + trunc((rownum - 1)/100) d1,
        rpad('x',100)   padding
from
        all_objects
where
        rownum <= 50000
;

execute dbms_stats.gather_table_stats(user,'t1')
alter table t1 modify d1 not null;

create index t1_i1 on t1(d1) nologging pctfree 95
;

select index_name, leaf_blocks from user_indexes;

alter system flush buffer_cache;

alter session set events '10046 trace name context forever, level 8';

My data set has 500 dates with 100 rows per date, and the pctfree setting for the index gives me an average of about 8 leaf blocks per date (for a total of 4,167 leaf blocks). It’s only a small index so I’m expecting to see just 2 or 3 LIOs per date, and a total of about 500 physical reads (one per date plus a handful for reading branch blocks). Here’s the output from the running tkprof against the trace file:


SELECT MIN(D1)
FROM
 T1 WHERE D1 > :B1


call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        1      0.00       0.00          0          0          0           0
Execute    501      0.00       0.01          0          0          0           0
Fetch      501      0.08       0.18       4093       1669          0         501
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1003      0.09       0.19       4093       1669          0         501

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 62     (recursive depth: 1)
Number of plan statistics captured: 1

Rows (1st) Rows (avg) Rows (max)  Row Source Operation
---------- ---------- ----------  ---------------------------------------------------
         1          1          1  SORT AGGREGATE (cr=3 pr=64 pw=0 time=9131 us)
         1          1          1   FIRST ROW  (cr=3 pr=64 pw=0 time=9106 us cost=3 size=8 card=1)
         1          1          1    INDEX RANGE SCAN (MIN/MAX) T1_I1 (cr=3 pr=64 pw=0 time=9089 us cost=3 size=8 card=1)(object id 252520)

I’ve done a physical read of virtually every single block in the index; but I have done only 3 buffer gets per date – doing fewer buffer gets than physical reads.

I’ve been caught by two optimisations (which turned out to be “pessimisations” in my test): I’ve flushed the buffer cache, so the Oracle runtime engine has decided to consider “warming up” the cache by reading extra blocks from any popular-looking objects that I’m accessing, and the optimizer may have given the run-time engine enough information to allow it to recognise that this index is subject to range scans and could therefore be a suitable object to use while warming up. As you can see from the following extracts from session events and session activity stats – we’ve done a load of multiblock reads through the index.


Event                                             Waits   Time_outs           Csec    Avg Csec    Max Csec
-----                                             -----   ---------           ----    --------    --------
db file sequential read                               1           0           0.03        .031           6
db file scattered read                              136           0          13.54        .100           1


Name                                                                     Value
----                                                                     -----
physical reads                                                           4,095
physical reads cache                                                     4,095
physical read IO requests                                                  137
physical reads cache prefetch                                            3,958
physical reads prefetch warmup                                           3,958

This isn’t likely to happen, of course, in the production system where we’ll be starting with a fully loaded cache and the leaf blocks we need are (logically) spaced apart by several thousand intervening blocks.

Footnote

I can’t remember who first brought this strategy to my attention – though I’m fairly sure it was one of my Russian colleagues, who has blogged about ways to work around what is effectively a limitation of the “index skip scan”. Apologies to the originator, and if you recognise your work here please add a comment with URL below.


Upgrades

Thu, 2017-02-02 07:38

This is a note I wrote a couple of years ago, but never published. Given the way it’s written I think it may have been the outline notes for a presentation that I was thinking about rather than an attempt to write a little essay. Since it covers a number of points that are worth considering and since I’ve just rediscovered it by accident I thought I’d publish it pretty much as is. Many of the examples of change are now quite old – the intent was to demonstrate how to be cautious rather than trying to supply every possible change that might your next upgrade.

We start with a couple of

  • The effort worth spending to minimise the risk of performance-related surprises on an upgrade depends on the how critical the system is.
  • The effort needed to minimise the risk of performance-related surprises on an upgrade depends on how complex the system is.
  • The more “rare” features and “cunning strategies” and the greater the level of concurrent activity the more likely you are to find surprising effects.

Typical causes of problems are:

  • New automatic jobs installed by Oracle, which might cause extra load during overnight batch tasks
    • e.g. automatic stats collection (10g)
    • Automatic evolution of baselines (12c)
  • Changes to existing packaged procedures
    • e.g. switch to atomic refresh of MVs (11g)
      • changed the time take to do the refresh itself and added a new load to the redo log activity
  • automatic histograms (10g)
    • changed the time taken to collect stats
    • changed lots of execution plans as a side effect
    • changed the behaviour of cursor_sharing=similar as a side effect,
      • which increased CPU and library cache loading
  • “Notable changes” in behaviour
    • e.g. The standard audit trail no longer updates aud$ rows, it always inserts new records
      • This had side effects on redo generation
      • This also allowed Oracle to drop an index, with side effects on typical DBA queries
    • Oracle also introduced a timestamp with time zone as the “audit timestamp”
      • This had a catastrophic side effects on a particular OEM “failed logins” query
  • New optimizer  features appear
    • This can introduce lots of new plans – and a few may perform badly for your data set
  • Optimizer bugs will be fixed
    • This can also introduce new plans – and a few may perform badly for your data set
  • New stats collection mechanisms
    • e.g. approximate_ndv with auto_sample_size (11g)
    • If introduced by default you could get unlucky
      • (See histogram comment above, for example).
    • If not introduced by default (now) you might want to test and adopt them on the upgrade
    • This may require changes to your current coding, and checks for bad changes in plans.
  • Concurrency issues
    • Could introduce side effects in locking, latching, mutexes on busy systems
    • Very hard to test without knowing the system
    • e.g. partition exchange and query optimisation colliding (various changes in 11g)

Background reading which can prompt you to look at particular features of your application that might be at risk.

  • Read the “New Features” guide
  • Read the “Changes in this release” section of the various reference guides
  • Check MoS for upgrade notes for the version you are aiming at
  • Search MoS for any documents with the text: “Notable changes {in|of} behaviour”
  • Check MoS for bug fix notes for later releases (e.g. 12.1 if upgrading to 11.2)

Test as much as possible, and as realistically as possible. You could take advantage of flashback database to repeat large test suites starting from the same point in time with the same data set, but making changes to database structures or code strategies between tests. Proper concurrency tests are ultimately going to be the hardest things to do right – but you have to work on it (or understand Oracle’s architecture really well) if you are upgrading a highly concurrent system.

 


ASSM Help

Mon, 2017-01-30 06:33

I’ve written a couple of articles in the past about the problems of ASSM spending a lot of time trying to find blocks with usable free space. Without doing a bit of rocket science with some x$ objects, or O/S tracing for the relevant calls, or enabling a couple of nasty events, it’s not easy proving that ASSM might be a significant factor in a performance problem – until you get to 12c Release 2 where a staggering number of related statistics appear in v$sysstat.

I’ve published the extract (without explanation) at the end of this note, but here’s just a short extract showing the changes in my session’s ASSM stats due to a little PL/SQL loop inserting one row at a time into an empty table with a single index:

Name                                                                         Value
----                                                                         -----
ASSM gsp:get free block                                                        185
ASSM cbk:blocks examined                                                       185
ASSM gsp:L1 bitmaps examined                                                   187
ASSM gsp:L2 bitmaps examined                                                     2
ASSM gsp:Search hint                                                             2
ASSM gsp:good hint                                                             185

It looks like we’ve checked a couple of “level 2” bitmap blocks (one for the table, one for the index, presumably) to pick a sequence of “level 1” bitmap blocks that have been very good at taking us to a suitable data (table or index) block that can be used.

You might have expected to see numbers more like 10,000 in the output, but remember that PL/SQL has lots of little optimisations built into it and one of those is that it pins a few blocks while the anonymous block is running so it doesn’t have to keep finding blocks for every single row.

In comparison here’s the effect of the same data load when operated at 10,000 separate insert statements called from SQL*Plus:

Name                                                                         Value
----                                                                         -----
ASSM gsp:get free block                                                     10,019
ASSM cbk:blocks examined                                                    10,220
ASSM cbk:blocks marked full                                                    201
ASSM gsp:L1 bitmaps examined                                                10,029
ASSM gsp:L2 bitmaps examined                                                     6
ASSM gsp:L2 bitmap full                                                          1
ASSM gsp:Search all                                                              1
ASSM gsp:Search hint                                                             2
ASSM gsp:Search steal                                                            1
ASSM gsp:bump HWM                                                                1
ASSM gsp:good hint                                                          10,016
ASSM rsv:fill reserve                                                            1

It’s interesting to note that in this case we see (I assume) a few cases where we’ve done the check for an L1 bitmap block, gone to a data blocks that was apparently free, and discovered that our insert would make to over full – hence the 201 “blocks marked full”.

Critically, of course, this is just another of the many little indications of how “client/server” chatter introduces lots of little bits of extra work when compared to the “Thick DB “ approach.

One final set of figures. Going back to an example that first alerted me to the type of performance catastrophes that ASSM could contribute to, I re-ran my test case on 12.2 and checked the ASSM figures reported. The problem was that a switch from a 4KB or 8KB blocks size to a 16KB bblock size produced a performance disaster. A version of my  test case and some timing results are available on Greg Rahn’s site.

In my test case I have 830,000 rows and do an update that sets column2 to column1 changing it from null to an 8-digit value. With a 16KB block size and PCTFREE set to a highly inappropriate value of 10 this is what the ASSM statistics looks like:


Name                                   Value
----                                   -----
ASSM gsp:get free block              668,761
ASSM cbk:blocks examined             671,404
ASSM cbk:blocks marked full            2,643
ASSM gsp:L1 bitmaps examined       1,338,185
ASSM gsp:L2 bitmaps examined         672,413
ASSM gsp:Search all                      332
ASSM gsp:Search hint                 668,760
ASSM gsp:Search steal                    332
ASSM gsp:bump HWM                        332
ASSM wasted db state change          669,395

I’d love to know what the figures would have looked like if they had been available in the original Oracle 9.2.0.8 case; they look fairly harmless in this case even though the database (according to other stats) did something like 10 times the work you might expect.

Even here, though, where the original catastrophic bug has been addressed, the ASSM stats give you an important clue: we’ve been doing a simple update so why have we spent so much effort looking for free space (get free block); even stranger, how come we had to examine 1.3M L1 bitmaps when we’ve only updated 830,000 rows surely the worst case scenario shouldn’t have been worse that 1 to 1; and then there’s that “wasted db state change” – I don’t understand exactly what that last statistic is telling me but when I’m worried about performance I tend to worry about anything that’s being wasted.

In passing – if you want to insert a single row into an unindexed table you can expect Oracle to examine the segment header, then an L2 bitmap block, then an L1 bitmap block to find a data block for the insert. (In rare cases that might be segment header, L3, L2, L1). There are then optimisation strategies for pinning blocks – the session will pin the L1 bitmap block briefly because it may have to check several data blocks it references because they may be full even though they are flagged as having space; similarly the session will pin the L2 bitmap block because it may need to mark an L1 bitmap block as full and check another L1 block. The latter mechanism probably explains why we have examined more L1 bitmaps than L2 bitmaps.

Finally, the full monty

Just a list of all the instance statistics that start with “ASSM”:

ASSM bg: segment fix monitor
ASSM bg:create segment fix task
ASSM bg:mark segment for fix
ASSM bg:slave compress block
ASSM bg:slave fix one segment
ASSM bg:slave fix state
ASSM cbk:blocks accepted
ASSM cbk:blocks examined
ASSM cbk:blocks marked full
ASSM cbk:blocks rejected
ASSM fg: submit segment fix task
ASSM gsp:Alignment unavailable in space res
ASSM gsp:L1 bitmaps examined
ASSM gsp:L2 bitmap full
ASSM gsp:L2 bitmaps examined
ASSM gsp:L3 bitmaps examined
ASSM gsp:Optimized data block rejects
ASSM gsp:Optimized index block rejects
ASSM gsp:Optimized reject DB
ASSM gsp:Optimized reject l1
ASSM gsp:Optimized reject l2
ASSM gsp:Search all
ASSM gsp:Search hint
ASSM gsp:Search steal
ASSM gsp:add extent
ASSM gsp:blocks provided by space reservation
ASSM gsp:blocks rejected by access layer callback
ASSM gsp:blocks requested by space reservation
ASSM gsp:bump HWM
ASSM gsp:get free block
ASSM gsp:get free critical block
ASSM gsp:get free data block
ASSM gsp:get free index block
ASSM gsp:get free lob block
ASSM gsp:good hint
ASSM gsp:reject L1
ASSM gsp:reject L2
ASSM gsp:reject L3
ASSM gsp:reject db
ASSM gsp:space reservation success
ASSM gsp:use space reservation
ASSM rsv:alloc from reserve
ASSM rsv:alloc from reserve fail
ASSM rsv:alloc from reserve succ
ASSM rsv:clear reserve
ASSM rsv:fill reserve
ASSM wasted db state change

DBaaS Performance

Fri, 2017-01-27 01:58

I don’t know how I missed it but Randolf Geist has been doing writing a series of posts on the performance of Oracle’s DBaaS offering, using a series of long-running tests to capture not only raw performance figures but also an indication of consistency. You can find all of these tests with a search URL on his blog, but I’ve also created a little index here to make it easier for me to access them in order.

Oracle Database Cloud (DBaaS) Performance Consistency Oracle Database Cloud (DBaaS) Performance

… to be continued (I hope).

h/t to Connor McDonald for the tweet that took me back to Randolf’s blog.

 


Pages