Jonathan Lewis

Subscribe to Jonathan Lewis feed Jonathan Lewis
Just another Oracle weblog
Updated: 16 hours 20 min ago

Unpivot

Wed, 2017-06-14 09:46

An interesting observation appeared recently as a side-channel on a question on the OTN database forum – how does Oracle execute an unpivot() operation. Here’s an example of such a query:

rem
rem     Script:         unpivot_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017
rem

create table t1( id, col1, col2, col3, col4, col5, padding )
cache
pctfree 95 pctused 5
-- compress for query low
as
select
        1, 100 , 200 , 300 , 400 , 500,rpad('x',100)
from
        all_objects
where
        rownum <= 50000 ; execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

select 
        /*+ gather_plan_statistics find this */
        id, max(val) as high_val
from
        t1
unpivot include nulls (
        val for source in (col1, col2, col3, col4, col5)
)
group by id
order by id
; 

I’ve created a table with 50,000 rows (all_objects is quite large in 12.1.0.2 and 12.2.0.1), but with lots of free space per block so that I get three rows per block for a total of roughly 16,667 blocks which is going to make it fairly easy to spot any interesting session statistics. Then I’ve used an unpivot() call that has the effect of turning one row with five columns into five rows with one column.

Here’s the basic execution plan for the query (as pulled from memory):


-------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        | 10695 (100)|      1 |00:00:00.18 |   16671 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |      1 | 10695   (2)|      1 |00:00:00.18 |   16671 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K| 10653   (2)|    250K|00:00:01.41 |   16671 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.52 |   16671 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  2131   (2)|  50000 |00:00:00.12 |   16671 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------

According to the plan Oracle has done one full tablescan on the data (Starts = 1), but the unpivot operation has used this to produce 250,000 rows of output from 50,000 rows (A-Rows=) of input. Note, however, that the cost of the view operation is 5 times the cost of the tablescan but, on the other hand, the number of buffers visited is 16,671 (which matches the size of the table). So have we done 5 tablescans with a union all, or have we done one tablescan ?

The next step is to look at the 10053 (optimizer) trace file, specifically for the “unparsed query” which (despite the final plan table showing the plan we’ve just seen above) looked like this – after applying a few purely cosmetic changes:

Final query after transformations:******* UNPARSED QUERY IS *******
SELECT "from$_subquery$_002"."ID" "ID",MAX("from$_subquery$_002"."VAL") "HIGH_VAL" 
FROM    ( 
                   (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL1' "SOURCE","T1"."COL1" "VAL" FROM "TEST_USER"."T1" "T1") 
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL2' "SOURCE","T1"."COL2" "VAL" FROM "TEST_USER"."T1" "T1") 
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL3' "SOURCE","T1"."COL3" "VAL" FROM "TEST_USER"."T1" "T1") 
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL4' "SOURCE","T1"."COL4" "VAL" FROM "TEST_USER"."T1" "T1") 
        UNION ALL  (SELECT "T1"."ID" "ID","T1"."PADDING" "PADDING",'COL5' "SOURCE","T1"."COL5" "VAL" FROM "TEST_USER"."T1" "T1")
        ) "from$_subquery$_002" 
GROUP BY "from$_subquery$_002"."ID" 
ORDER BY "from$_subquery$_002"."ID"
;

And then there’s the outline (which I can see in the 10053 trace, or in the plan pulled from memory by a call to dbms_xplan.display_cursor()).


Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.1.0.2')
      DB_VERSION('12.1.0.2')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SEL$3")
      OUTLINE_LEAF(@"SEL$4")
      OUTLINE_LEAF(@"SEL$5")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$6")
      NO_ACCESS(@"SEL$6" "from$_subquery$_002"@"SEL$6")
      FULL(@"SEL$5" "T1"@"SEL$5")
      FULL(@"SEL$4" "T1"@"SEL$4")
      FULL(@"SEL$3" "T1"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

And if I also pull the alias (query block and fully qualified table name) information from memory:


Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$6
   2 - SET$1 / from$_subquery$_002@SEL$6
   3 - SET$1 
   4 - SEL$5 / T1@SEL$5

Then there’s the alternative (though not necessarily correct) way to find an “unparsed” version of the query – dbms_sql2.expand_sql_text() in 11g, dbms_utility.expand_sql_text() in 12c:


variable m_sql_out clob

declare
        m_sql_in    clob :=
                'select /*+ gather_plan_statistics find this */ id, max(val) as high_val
                from    t1
                unpivot include nulls ( val for source in (col1, col2, col3, col4, col5) )
                group by id
                order by id
                '
        ;
begin

--      dbms_sql2.expand_sql_text(        -- 11g
        dbms_utility.expand_sql_text(     -- 12c
                m_sql_in,
                :m_sql_out
        );

end;
/

set long 20000
print m_sql_out

M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  ( (SELECT "A3"."ID" "ID",
"A3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1"
 "A3") UNION ALL  (SELECT "A4"."ID" "ID","A4"."PADDING" "PADDING",'COL2' "SOURCE
","A4"."COL2" "VAL" FROM "TEST_USER"."T1" "A4") UNION ALL  (SELECT "A5"."ID" "ID
","A5"."PADDING" "PADDING",'COL3' "SOURCE","A5"."COL3" "VAL" FROM "TEST_USER"."T
1" "A5") UNION ALL  (SELECT "A6"."ID" "ID","A6"."PADDING" "PADDING",'COL4' "SOUR
CE","A6"."COL4" "VAL" FROM "TEST_USER"."T1" "A6") UNION ALL  (SELECT "A7"."ID" "
ID","A7"."PADDING" "PADDING",'COL5' "SOURCE","A7"."COL5" "VAL" FROM "TEST_USER".
"T1" "A7")) "A1" GROUP BY "A1"."ID" ORDER BY "A1"."ID"

So at some level the optimizer does think it’s doing 5 tablescans – and the t1 reported in the unpivot plan is from the fifth (i.e. last) copy of the table in the union all. But the execution statistics and the session statistics say I’ve done just one tablescan – and this persists even when I make the table so large that it (a) won’t fit the buffer cache, and/or (b) uses direct path reads, and/or (c) runs under Exadata with Hybrid columnar compression.

So what’s (probably) happening ?

I think Oracle has a code path that says it’s doing a union all of tablescans (at least for this example) but tells it that the union all is there as an expansion of an unpivot so (sticking with an example that does a tablescan into the buffer cache) Oracle reads the number of blocks dictated by the current multiblock read count into the cache, pins that one batch of blocks, scans the batch 5 times (or as required), unpins the batch and then reads the next batch. So the session does five tablescans but does them in a way that lets you see only one tablescan in the statistics.

Footenote

There was a change in the results when I tested this on 12.2.0.1; the unparsed query reported only a two-part union all subquery, and the table alias information in the plan identified the referenced table as the copy of the table from the second subquery in the union all. More significantly the cost of the VIEW operation was a close match to the cost of a single tablescan, rather than being a multiple thereof:


select * from table(dbms_xplan.display_cursor('1k077bzp0t6mn',null,'outline alias cost 

----------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |      1 |        |  1117 (100)|      1 |00:00:00.13 |    8350 |   8334 |       |       |          |
|   1 |  SORT GROUP BY       |      |      1 |    250K|  1117   (6)|      1 |00:00:00.13 |    8350 |   8334 |  2048 |  2048 | 2048  (0)|
|   2 |   VIEW               |      |      1 |    250K|  1076   (2)|    250K|00:00:00.13 |    8350 |   8334 |       |       |          |
|   3 |    UNPIVOT           |      |      1 |        |            |    250K|00:00:00.09 |    8350 |   8334 |       |       |          |
|   4 |     TABLE ACCESS FULL| T1   |      1 |  50000 |  1074   (2)|  50000 |00:00:00.07 |    8350 |   8334 |       |       |          |
----------------------------------------------------------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$3
   2 - SET$1 / from$_subquery$_002@SEL$3
   3 - SET$1
   4 - SEL$2 / T1@SEL$2

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('12.2.0.1')
      DB_VERSION('12.2.0.1')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      OUTLINE_LEAF(@"SEL$2")
      OUTLINE_LEAF(@"SET$1")
      OUTLINE_LEAF(@"SEL$3")
      NO_ACCESS(@"SEL$3" "from$_subquery$_002"@"SEL$3")
      FULL(@"SEL$2" "T1"@"SEL$2")
      END_OUTLINE_DATA
  */


M_SQL_OUT
--------------------------------------------------------------------------------
SELECT "A1"."ID" "ID",MAX("A1"."VAL") "HIGH_VAL" FROM  (SELECT "A3"."ID" "ID","A
3"."PADDING" "PADDING",'COL1' "SOURCE","A3"."COL1" "VAL" FROM "TEST_USER"."T1" "
A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL2' "SOURCE","A3
"."COL2" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."
PADDING" "PADDING",'COL3' "SOURCE","A3"."COL3" "VAL" FROM "TEST_USER"."T1" "A3"
UNION ALL SELECT "A3"."ID" "ID","A3"."PADDING" "PADDING",'COL4' "SOURCE","A3"."C
OL4" "VAL" FROM "TEST_USER"."T1" "A3" UNION ALL SELECT "A3"."ID" "ID","A3"."PADD
ING" "PADDING",'COL5' "SOURCE","A3"."COL5" "VAL" FROM "TEST_USER"."T1" "A3") "A1
" GROUP BY "A1"."ID" ORDER BY "A1"."ID"
allstats last'));

It used to be hard enough working out what the run-time engine was doing when the optimizer and the run-time engine claimed that they were working on the same (transformed version of the) SQL; but now we can get the impression that the optimizer is directing the run-time engine to execute a plan that couldn’t possibly match the (unparsed) query that the optimizer had been considering.


dbms_sqldiag

Mon, 2017-06-12 06:48

If you’re familiar with SQL Profiles and SQL Baselines you may also know about SQL Patches – a feature that allows you to construct hints that you can attach to SQL statements at run-time without changing the code. Oracle 12c Release 2 introduces a couple of important changes to this feature:

  • It’s now official – the feature had been moved from package dbms_sqldiag_internal to package dbms_sqldiag.
  • The limitation of 500 characters has been removed from the hint text – it’s now a CLOB column.

H/T to Nigel Bayliss for including this detail in his presentation to the UKOUG last week, and pointing out that it’s also available for Standard Edition.

There are a couple of other little changes as you can see below from the two extract from the 12.2 declarations of dbms_sqldiag and dbms_sqldiag_internal below:


dbms_sqldiag
------------
FUNCTION CREATE_SQL_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_TEXT                       CLOB                    IN
 HINT_TEXT                      CLOB                    IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

FUNCTION CREATE_SQL_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 HINT_TEXT                      CLOB                    IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

dbms_sqldiag_internal
---------------------
FUNCTION I_CREATE_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_ID                         VARCHAR2                IN
 HINT_TEXT                      CLOB                    IN
 CREATOR                        VARCHAR2                IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

FUNCTION I_CREATE_PATCH RETURNS VARCHAR2
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_TEXT                       CLOB                    IN
 HINT_TEXT                      CLOB                    IN
 CREATOR                        VARCHAR2                IN
 NAME                           VARCHAR2                IN     DEFAULT
 DESCRIPTION                    VARCHAR2                IN     DEFAULT
 CATEGORY                       VARCHAR2                IN     DEFAULT
 VALIDATE                       BOOLEAN                 IN     DEFAULT

  • The function names changes from i_create_patch to create_patch when exposed in dbms_sqldiag.
  • There are two versions of the function – one that requires you to supply the exact SQL text, and a new version that allows you to supply an SQL ID.
  • The internal function also adds a creator to the existing parameter list – and it doesn’t have a default, so if you’ve got some code to use the internal version already it’s not going to work on 12.2 until you change it.

I was prompted to write this note by a tweet asking me if there’s any SQL available to see the contents of an SQL Profile in 11g and 12c. (I published some simple code several years ago for 10g, but Oracle changed the base tables in 11g). The answer is yes, probably on the Internet somewhere, but here’s some code I wrote a couple of years ago to scan the more recent versions of Oracle:

rem
rem     sql_profile_baseline_11g.sql
rem     J.P.Lewis
rem     July 2010
rem

set pagesize 60
set linesize 132
set trimspool on

column hint format a70 wrap word
column signature format 999,999,999,999,999,999,999

break on signature skip 1 on opt_type skip 1 on plan_id skip 1

spool sql_profile_baseline_11g

select
        prf.signature,
        decode(
                obj_type,
                1,'Profile',
                2,'Baseline',
                3,'Patch',
                'Other'
        )       opt_type,
        prf.plan_id,
        extractvalue(value(tab),'.')    hint
from
        (
        select
                /*+ no_eliminate_oby */
                *
        from
                sqlobj$data
        where
                comp_data is not null
        order by
                signature, obj_type, plan_id
        )       prf,
        table(
                xmlsequence(
                        extract(xmltype(prf.comp_data),'/outline_data/hint')
                )
        )       tab
;


This will report the hints associated with SQL Baselines, SQL Profiles, and SQL Patches – all three store the data in the same base table. As a minor variation I also have a query that will reported a named profile/baseline/patch, but this requires a join to the sqlobj$ table. As you can see from the substitution variable near the end of the text, the script will prompt you for an object name.


set pagesize 60
set linesize 180
set trimspool on

column  plan_name format a32
column  signature format 999,999,999,999,999,999,999
column  category  format a10
column  hint format a70 wrap word

break on plan_name skip 1 on signature skip 1 on opt_type skip 1 on category skip 1 on plan_id skip 1

spool sql_profile_baseline_11g

select
        prf.plan_name,
        prf.signature,
        decode(
                obj_type,
                1,'Profile',
                2,'Baseline',
                3,'Patch',
                  'Other'
        )       opt_type,
        prf.category,
        prf.plan_id,
        extractvalue(value(hnt),'.') hint
from
        (
        select
                /*+ no_eliminate_oby */
                so.name         plan_name,
                so.signature,
                so.category,
                so.obj_type,
                so.plan_id,
                sod.comp_data
                from
                        sqlobj$         so,
                        sqlobj$data     sod
                where
                        so.name = '&m_plan_name'
                and     sod.signature = so.signature
                and     sod.category = so.category
                and     sod.obj_type = so.obj_type
                and     sod.plan_id = so.plan_id
                order by
                        signature, obj_type, plan_id
        )       prf,
        table (
                select
                        xmlsequence(
                                extract(xmltype(prf.comp_data),'/outline_data/hint')
                        )
                from
                        dual
        )       hnt
;


Lagniappe:

One of the enhancements that appeared in 12c for SQL Baselines was that the plan the baseline was supposed to produce was stored in the database so that Oracle could check that the baseline would still reproduce the expected plan before applying it. These plans (also generated for Profiles and Patches) are stored in the table sqlobj$plan, and the dbms_xplan package has been enhanced with three new functions to report them:


FUNCTION DISPLAY_SQL_PATCH_PLAN RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NAME                           VARCHAR2                IN
 FORMAT                         VARCHAR2                IN     DEFAULT

FUNCTION DISPLAY_SQL_PLAN_BASELINE RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SQL_HANDLE                     VARCHAR2                IN     DEFAULT
 PLAN_NAME                      VARCHAR2                IN     DEFAULT
 FORMAT                         VARCHAR2                IN     DEFAULT

FUNCTION DISPLAY_SQL_PROFILE_PLAN RETURNS DBMS_XPLAN_TYPE_TABLE
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 NAME                           VARCHAR2                IN
 FORMAT                         VARCHAR2                IN     DEFAULT

e.g.
SQL> select * from table(dbms_xplan.display_sql_profile_plan('SYS_SQLPROF_015c9bd3bceb0000'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL text: select        t1.id, t2.id from       t1, t2 where    t1.id between 10000 and
          20000 and     t2.n1 = t1.n1 and       t2.n1 = t2.v2
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
SQL Profile Name: SYS_SQLPROF_015c9bd3bceb0000
Status:           ENABLED
Plan rows:        From dictionary
--------------------------------------------------------------------------------

Plan hash value: 3683239666

-----------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
-----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          | 10501 |   287K|   248   (4)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |          |       |       |     0   (0)|          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10002 | 10501 |   287K|   248   (4)| 00:00:01 |  Q1,02 | P->S | QC (RAND)  |
|*  3 |    HASH JOIN BUFFERED   |          | 10501 |   287K|   248   (4)| 00:00:01 |  Q1,02 | PCWP |            |
|   4 |     PX RECEIVE          |          | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,02 | PCWP |            |
|   5 |      PX SEND HASH       | :TQ10000 | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | P->P | HASH       |
|   6 |       PX BLOCK ITERATOR |          | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | PCWC |            |
|*  7 |        TABLE ACCESS FULL| T1       | 10002 |    97K|   123   (3)| 00:00:01 |  Q1,00 | PCWP |            |
|   8 |     PX RECEIVE          |          |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,02 | PCWP |            |
|   9 |      PX SEND HASH       | :TQ10001 |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | P->P | HASH       |
|  10 |       PX BLOCK ITERATOR |          |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | PCWC |            |
|* 11 |        TABLE ACCESS FULL| T2       |   104K|  1845K|   124   (4)| 00:00:01 |  Q1,01 | PCWP |            |
-----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T2"."N1"="T1"."N1")
   7 - filter("T1"."ID"<=20000 AND "T1"."ID">=10000)
  11 - filter("T2"."N1"=TO_NUMBER("T2"."V2"))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2

Disclaimer – I’ve checked only the SQL_PROFILE function call on 12.2, after creating a profile to check that my old 11g report still worked in 12c.

 


12.2 Partitions

Fri, 2017-06-09 04:13

At the end of my presentation to the UKOUG Database SIG yesterday I summed up (most) of points I’d made with a slide making the claim:

In 12.2 you can: Convert a simple table to partitioned with multi-column automatic list partitions, partially indexed, with read only segments, filtering out unwanted data, online in one operation.

 

Last night I decided I ought to demonstrate the claim – so here’s a little code, first creating a simple heap table:


rem
rem     Script:         122_features.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017
rem
rem     Last tested
rem             12.2.0.1
rem

create table t1(
        date_start      not null,
        date_end        not null,
        id              not null,
        client_id,
        resort_code,
        uk_flag,
        v1,
        padding,
        constraint t1_range_ck check ((date_end - date_start) in (7, 14, 21))
)
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        trunc(sysdate,'yyyy') + 7 *  mod(rownum, 8)                                     date_start,
        trunc(sysdate,'yyyy') + 7 * (mod(rownum, 8) + trunc(dbms_random.value(1,4)))    date_end,
        rownum                                          id,
        trunc(dbms_random.value(1e5,2e5))               client_id,
        trunc(dbms_random.value(1e4,2e4))               resort_code,
        case when mod(rownum,275) = 0 then 1 end        uk_flag,
        lpad(rownum,10,'0')                             v1,
        lpad('x',100,'x')                               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e5 -- > "GT" inserted to avoid WordPress formatting issue
;

create index t1_client_idx on t1(client_id);
create index t1_resort_idx on t1(resort_code);
create index t1_ukflag_idx on t1(uk_flag);

alter table t1 add constraint t1_pk primary key(id);

I’ve got a table which models a travel company that arranges holidays that last one, two, or three weeks and (for convenience) they all start on the same day for the week. So I generate a start and end date for each row, making sure the start date is a multiple of seven days from a base date while the end date is 7, 14, or 21 days later. I’ve got a few indexes on the data, and a primary key constraint. There’s a special flag column on the table for holidays in the UK, which is a small parcentage of the holidays booked.

Eventually, when the data gets too big, I decide that I want to partition this data, and the obvious partitioning idea that springs to mind is to partition it so that holidays with the same start date and duration are all in the same partition and each partition holds a single start/duration.

I’ve also decided that I’m going to make old data read-only, and I’m not interested in the UK holidays once they gone into history so I’m going to get rid of some of them.

The index protecting the primary key will have to be global since it won’t contain the partition key; since the index on uk_flag covers a small amount of data I’m going to keep that global as well, but I want the other two indexes to be local – except for the older data I’m not really interested in keeping the index on client id.

And I don’t want to stop the application while I’m restructuring the data.

So here’s my one SQL statement:


alter table t1 modify 
partition by list (date_start, date_end) automatic (
        partition p11 values (to_date('01-Jan-2017'),to_date('08-Jan-2017')) indexing off read only,
        partition p12 values (to_date('01-Jan-2017'),to_date('15-Jan-2017')) indexing off read only,
        partition p13 values (to_date('01-Jan-2017'),to_date('22-Jan-2017')) indexing off read only,
        partition p21 values (to_date('08-Jan-2017'),to_date('15-Jan-2017')) indexing off read only,
        partition p22 values (to_date('08-Jan-2017'),to_date('22-Jan-2017')) indexing off read only,
        partition p23 values (to_date('08-Jan-2017'),to_date('29-Jan-2017')) indexing off read only,
        partition p31 values (to_date('15-Jan-2017'),to_date('22-Jan-2017')) indexing off read only,
        partition p32 values (to_date('15-Jan-2017'),to_date('29-Jan-2017')) indexing off read only,
        partition p33 values (to_date('15-Jan-2017'),to_date('05-Feb-2017')) indexing off read only
)
including rows where uk_flag is null or (date_start > to_date('01-feb-2017','dd-mon-yyyy'))
online
update indexes (
        t1_client_idx local indexing partial,
        t1_resort_idx local,
        t1_ukflag_idx indexing partial
)
;

Key Points
  • partition by list (date_start, date_end) — partitioned by a multi-column list
  • automatic — if data arrives for which there is on existing partition a new one will be created
  • indexing off — some of my partitions (the pre-defined (oldest) ones) will be subject to partial indexing
  • read only — some of my partitions (the pre-defined (oldest) ones) will be made read only
  • including rows where — some of my rows will disappear during copying [1]
  • online — Oracle will be journalling the data while I copy and apply the journey at the end
  • update indexes – specify some details about indexes [2]
  • local — some of the rebuilt indexes will be local
  • indexing partial — some of the rebuilt indexes will not hold data (viz: for the partitions declared “indexing off”)

I’ve footnoted a couple of the entries:

[1] – the copy is done read-consistently, so data inserted while the copy takes place will still appear in the final table, even if it looks as if it should have failed the including rows clause.

[2] – indexes which include the partition key will automatically be created as local indexes (and you can declare them here as global, or globally partitioned, if you want to). The manual has an error on this point; it suggests that prefixed indexes will be created as local indexes but then defines “prefixed” to mean contains the partition key” rather than the usual starts with the partition key”.

Job done – except for the exhaustive tests that it’s been done correctly, the load test to see how it behaves when lots of new holidays are being booked and current ones being modified, and a little bit of clearing up of “surprise” partitions that shouldn’t be there and changing some of the automatically generated table partitions to be “indexing off” (if and when necessary).

Here are a few queries – with results – showing the effects this one statement had:


select count(*) from t1;

/*
  COUNT(*)
----------
     99773

-- some rows (old UK) have disappeared from the original 10,000
*/


select
        index_name, partitioned, status, leaf_blocks, num_rows , indexing, orphaned_entries
from
        user_indexes
where   table_name = 'T1'
order by
        partitioned, index_name
;

/*
INDEX_NAME           PAR STATUS   LEAF_BLOCKS   NUM_ROWS INDEXIN ORP
-------------------- --- -------- ----------- ---------- ------- ---
T1_PK                NO  VALID            263      99773 FULL    NO
T1_UKFLAG_IDX        NO  VALID              1        136 PARTIAL NO
T1_CLIENT_IDX        YES N/A              149      62409 PARTIAL NO
T1_RESORT_IDX        YES N/A              239      99773 FULL    NO

-- Indexes: Local or global, full or partial.
*/

select
        segment_type, segment_name, count(*)
from
        user_segments
group by
        segment_type, segment_name
order by
        segment_type desc, segment_name
;

/*
SEGMENT_TYPE       SEGMENT_NAME                COUNT(*)
------------------ ------------------------- ----------
TABLE PARTITION    T1                                24
INDEX PARTITION    T1_CLIENT_IDX                     15
INDEX PARTITION    T1_RESORT_IDX                     24
INDEX              T1_PK                              1
INDEX              T1_UKFLAG_IDX                      1

-- One local index has fewer segments than the other
*/

set linesize 180
set trimspool on

column high_value format a85
break on index_name skip 1
set pagesize 200

select
        index_name, status, leaf_blocks, num_rows, partition_name, high_value
from
        user_ind_partitions
where
        index_name = 'T1_CLIENT_IDX'
--      index_name like 'T1%'
order by
        index_name, partition_position
;

/*
INDEX_NAME           STATUS   LEAF_BLOCKS   NUM_ROWS PARTITION_NAME         HIGH_VALUE
-------------------- -------- ----------- ---------- ---------------------- -------------------------------------------------------------------------------------
T1_CLIENT_IDX        UNUSABLE           0          0 P11                    ( TO_DATE(' 2017-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P12                    ( TO_DATE(' 2017-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P13                    ( TO_DATE(' 2017-01-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-22 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P21                    ( TO_DATE(' 2017-01-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P22                    ( TO_DATE(' 2017-01-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-22 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P23                    ( TO_DATE(' 2017-01-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-29 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P31                    ( TO_DATE(' 2017-01-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-22 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P32                    ( TO_DATE(' 2017-01-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-01-29 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     UNUSABLE           0          0 P33                    ( TO_DATE(' 2017-01-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                            , TO_DATE(' 2017-02-05 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')
                                                                             )

                     USABLE            10       4126 SYS_P1528              ( TO_DATE(' 2017-01-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4198 SYS_P1529              ( TO_DATE(' 2017-01-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4211 SYS_P1530              ( TO_DATE(' 2017-02-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4214 SYS_P1531              ( TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4195 SYS_P1532              ( TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-03-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4113 SYS_P1533              ( TO_DATE(' 2017-01-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-01-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE             9       4027 SYS_P1534              ( TO_DATE(' 2017-01-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4217 SYS_P1535              ( TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4167 SYS_P1536              ( TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-03-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4230 SYS_P1537              ( TO_DATE(' 2017-01-29 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4113 SYS_P1538              ( TO_DATE(' 2017-02-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4069 SYS_P1539              ( TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-03-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4215 SYS_P1540              ( TO_DATE(' 2017-01-22 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4138 SYS_P1541              ( TO_DATE(' 2017-02-19 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-26 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )

                     USABLE            10       4176 SYS_P1542              ( TO_DATE(' 2017-02-05 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                            , TO_DATE(' 2017-02-12 00:00:00', 'syyyy-mm-dd hh24:mi:ss', 'nls_calendar=gregorian')
                                                                             )


*/

I’ve limited the index partition output to the index with partial indexing enabled so show that it’s the pre-defined partitions are marked as unusable and, as you can infer from the segement summary, those unusable index partition don’t have any segments space allocated to them.

Stress tests are left to the interested reader.


Histogram Upgrade – 2

Thu, 2017-06-01 12:00

While reading a blog post by Maria Colgan a couple of week ago I came across an observation about histograms that I’d not noticed before; worse still, it was a feature that seemed to make some “damage-limitation” advice I’d been giving for years a really bad idea! The threat appeared in these paragraphs:

Setting SIZE REPEAT ensures a histogram will only be created for any column that already has one. If the table is a partitioned table, repeat ensures a histogram will be created for a column that already has one on the global level.

What’s the down side to doing this?

The current number of buckets used in each histogram becomes the limit on the maximum number of buckets used for any histogram created in the future.

Unfortunately I’ve been saying for a very long time that you have to be very careful with histograms, and should probably create then through PL/SQL code; but if you have some frequency histograms that you’re sure are going to be well-behaved then using “for all columns size repeat” to gather the histogram is probably okay. But after making the claim above Maria’s blog posting demonstrated the truth of the claim, a demonstration that showed the highly undesirable consequences.

So imagine this: you create a frequency histogram which happens to produce 26 buckets on a particular column; from then on every time you run the gather with size repeat Oracle tries to generate 26 buckets. One day the data looks a little different, temporarily there are only 25 distinct values so on the next gather you get just 25 buckets – which means that when the “missing” value re-appears 12c will give you a Top-N histogram or even a hybrid histogram (11g would have to give you a height-balanced histogram if it noticed all 26 values). It is not safe to use size repeat if the number of distinct values that actually exist can vary from day to day.

I have to say that I was fairly shocked that I’d not come across this threat before – so obviously I created a simple model to check how nasty things could get. I had a copy of 11.2.0.4 handy and created a couple of tables cloning the data from all_objects because that’s got a couple of columns that are good for producing frequency histograms.


rem     Script:         histogram_repeat.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2017

drop table t2;
drop table t1;

create table t1 as select * from all_objects;
create table t2 as select * from t1;

delete from t1 where object_type = 'EDITION';
delete from t1 where object_type = 'EVALUATION CONTEXT';
commit;

pause ================  Baseline =======================================

select  count(distinct object_type), count(distinct owner) from t1;

execute dbms_stats.gather_table_stats(user,'t1',method_opt =>'for columns object_type owner')

select  column_name, count(*) 
from    user_tab_histograms 
where   table_name = 'T1' 
and     column_name in ( 'OBJECT_TYPE','OWNER') 
group by column_name 
order by column_name
;

select  column_name, num_buckets, histogram 
from    user_tab_columns 
where   table_name = 'T1' 
and     column_name in ( 'OBJECT_TYPE','OWNER') 
order by column_name
;

insert into t1 select * from t2 where object_type = 'EDITION';
insert into t1 select * from t2 where object_type = 'EVALUATION CONTEXT';
commit;

After creating the data I’ve deleted a few rows from t1, reported the number of distinct values in t1 for owner and object_type, then gathered stats on just those two columns using the default size. I’ve then reported the number of histogram buckets in two ways, by counting them in user_tab_histograms and by reporting them directly (with histogram type) from user_tab_columns. Then I’ve finished off by re-inserting (copying from t2) the rows I previously deleted, giving me a couple more object_type values in the table. Here are the results of the queries:


================  Baseline =======================================

COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOWNER)
-------------------------- --------------------
                        23                   11

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  17
OWNER                         7

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  17 FREQUENCY
OWNER                         7 FREQUENCY

I’m running on 11.2.0.4 – and I have two frequency histograms that have missed a few of the distinct values. But that’s because on the default settings 11g uses sampling (typically about 5,500 rows for smaller data sets) when creating histograms. So re-running the gather with size repeat shouldn’t allow the number of buckets to grow. Here’s what I got when I re-ran the gather (with size repeat) and two queries a further three times


method_opt =>'for columns object_type size repeat owner size repeat'

================  Repeat 1 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  16
OWNER                         9

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  16 FREQUENCY
OWNER                         9 FREQUENCY
================  Repeat 2 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  18
OWNER                         8

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  18 FREQUENCY
OWNER                         8 FREQUENCY
================  Repeat 3 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  13
OWNER                         9

COLUMN_NAME             Buckets HISTOGRAM
-------------------- ---------- ---------------
OBJECT_TYPE                  13 FREQUENCY
OWNER                         9 FREQUENCY

On the first repeat I got even fewer buckets; but on the second repeat the number of buckets bounced back up and even exceeded the original count; then on the third repeat the number of buckets dropped significantly. If you run the test your results will probably vary, but that’s the effect of the random selection of rows used to generate the histogram. Key point, though, the number of buckets generated by the gather is not limited by the current number of buckets.

But…

What happens with 12.1.0.2 – here are the results. Remember I deleted two sets of object_type before I gathered the first set of stats, then put them back in before doing the repeat gathers. (The number of distinct object_types in 12c is more than I had in 11g).


================  Baseline =======================================

COUNT(DISTINCTOBJECT_TYPE) COUNT(DISTINCTOWNER)
-------------------------- --------------------
                        27                   25

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 1 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 2 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

================  Repeat 3 =======================================

COLUMN_NAME            COUNT(*)
-------------------- ----------
OBJECT_TYPE                  27
OWNER                        25

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
OBJECT_TYPE                   27 TOP-FREQUENCY
OWNER                         25 FREQUENCY

The number of distinct values for object_type is initially 27, but after gathering stats the first time I added back two more object_type values; but the subsequent gathers stuck with 27 buckets rather than extending to 29 buckets – so the histogram changed from frequency to Top-N. If you check Maria’s blog again you’ll see that this can make a big difference, particularly if the two new values happen to be the lowest and highest values for the column.

The number of buckets on a REPEAT is fixed by the number of existing buckets in 12c. That to me is a major change in behaviour and one you’ll have to watch out for on the upgrade. In 11g if the number of actual values stored dropped briefly the situation was self-correcting; if some new values were introduced the situation was self-correcting – although in both cases the histogram isn’t necessarily telling the truth the way you’d like it. In 12c the situation doesn’t self-correct. and may introduce a massive change in the arithmetic (as shown in Maria’s example).

The big difference, of course, is that 12c is gathering on a 100% sample using the variation of the approximate_ndv mechanism – so it will always find the right number of values if a frequency histogram is appropriate: presumably this is what was suppposed to make it okay to reproduce the number of buckets previously used. In 11g with its small sample size the number of buckets created couldn’t be guaranteed to match the number of distinct values, so I guess the code in 11g wasn’t written to be so rigorous in its assumption about the number of buckets to use next time.

tl;dr

When you upgrade from 11g to 12c think very carefully about whether or not you can still use a “table-level” size repeat to gather histograms – the upgrade may force you to identify specifically the columns that need histograms so that you can name with with an explicit (large enough) size in gather command.

 


Parallelism

Thu, 2017-05-25 09:48

Headline – if you don’t want to read the note – the /*+ parallel(N) */ hint doesn’t mean a query will use parallel execution, even if there are enough parallel execution server processes to make it possible. The parallel(N) hint tells the optimizer to consider the cost of using parallel execution for each path that it examines, but ultimately the optimizer will still take the lowest cost path (bar the odd few special cases) and that path could turn out to be a serial path.

The likelihood of parallelism appearing for a given query changes across versions of Oracle so you can be fooled into thinking you’re seeing bugs as you test new versions but it’s (almost certainly) the same old rule being applied in different circumstances. Here’s an example – which I’ll start off on 11.2.0.4:


create table t1
segment creation immediate
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 ; create index t1_i1 on t1(id); begin dbms_stats.gather_table_stats( ownname => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set autotrace traceonly explain

select
        count(v1)
from    t1
where   id = 10
;

select
        /*+ parallel(4) */
        count(v1)
from    t1
where   id = 10
;

select
        /*+ parallel(4) full(t1) */
        count(v1)
from    t1
where   id = 10
;

set autotrace off

I haven’t declare the index to be unique, but it clearly could be; and it’s obvious that with 1M rows and about 120M of table a parallel full scan is probably a bad idea to acquire one row (even if you’re running Exadata!). So what do we get for the three plans – I’ll skip the predicate section – when we want to collect one row.


Base plan - unhinted
--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    16 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    16 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    16 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Hinted parallel(4)
--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    16 |     4   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    16 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |     1 |    16 |     4   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |     1 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Hinted parallel(4) and full(t1)
----------------------------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |     1 |    16 |   606   (2)| 00:00:02 |        |      |            |
|   1 |  SORT AGGREGATE        |          |     1 |    16 |            |          |        |      |            |
|   2 |   PX COORDINATOR       |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM) | :TQ10000 |     1 |    16 |            |          |  Q1,00 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE     |          |     1 |    16 |            |          |  Q1,00 | PCWP |            |
|   5 |      PX BLOCK ITERATOR |          |     1 |    16 |   606   (2)| 00:00:02 |  Q1,00 | PCWC |            |
|*  6 |       TABLE ACCESS FULL| T1       |     1 |    16 |   606   (2)| 00:00:02 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------

In 11.2.0.4 the optimizer did consider the parallel hint when it appeared on its own – but it has compared the parallel(4) cost of 606 with the serial index cost of 16 and chosen the indexed access path. This is not a case of ignoring the hint, it’s an example of being fooled if you don’t know how the hint is really supposed to work.

But here’s an interesting change that appeared in 12.2 – this time just the plan with the parallel(4) hint on its own:


---------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name     | Rows  | Bytes | Cost (%CPU)| Time     |    TQ  |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |          |     1 |    16 |     4   (0)| 00:00:01 |        |      |            |
|   1 |  SORT AGGREGATE                         |          |     1 |    16 |            |          |        |      |            |
|   2 |   PX COORDINATOR                        |          |       |       |            |          |        |      |            |
|   3 |    PX SEND QC (RANDOM)                  | :TQ10001 |     1 |    16 |            |          |  Q1,01 | P->S | QC (RAND)  |
|   4 |     SORT AGGREGATE                      |          |     1 |    16 |            |          |  Q1,01 | PCWP |            |
|   5 |      TABLE ACCESS BY INDEX ROWID BATCHED| T1       |     1 |    16 |     4   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX RECEIVE                        |          |     1 |       |     3   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   7 |        PX SEND HASH (BLOCK ADDRESS)     | :TQ10000 |     1 |       |     3   (0)| 00:00:01 |  Q1,00 | S->P | HASH (BLOCK|
|   8 |         PX SELECTOR                     |          |       |       |            |          |  Q1,00 | SCWC |            |
|*  9 |          INDEX RANGE SCAN               | T1_I1    |     1 |       |     3   (0)| 00:00:01 |  Q1,00 | SCWP |            |
---------------------------------------------------------------------------------------------------------------------------------

You get a parallel execution plan – although it starts with a serial index range scan which is operated for the new (12c) PX Selector operator that allocates a serial operation to one of the parallel execution slaves rather than running it through the query coordinator (QC). The serial range scan does a hash distribution (hashed by block address of the rowids it finds to avoid collisions between parallel execution slave as they do their table accesses) with the net effect that the cost of the index range scan drops from 16 to 4.

This is just one cute little trick that makes it worth looking at the upgrade to 12c – this new path is likely to be of benefit to people who had to create global (as opposed to globally partitioned) indexes on partitioned tables.

This note was prompted by a recent twitter comment by Timur Akhmadeev followed in short order by an OTN posting that added further confusion to the problem by running Siebel – which is just one of several 3rd party products that love to configure optimizer parameters with non-standard values like: optimizer_index_cost_adj = 1, or optimizer_mode = first_rows_10. (At the last update I’ve seen on the thread, there seemed to be some other reason why parallelism was being blocked.)


255 Again!

Tue, 2017-05-23 07:10

There’s so many things that can go wrong when you start using tables with more than 255 columns – here’s one I discovered partly because I was thinking about a client requirement, partly because I had a vague memory of a change in behaviour in 12c and Stefan Koehler pointed me to a blog note by Sayan Malakshinov when I asked the Oak Table if anyone remembered seeing the relevant note. Enough of the roundabout route, I’m going to start with a bit of code to create a table, stick a row in it, then update that row:

rem
rem     Script: wide_table_4.sql
rem     Author: Jonathan Lewis
rem     Dated:  May 2017
rem
rem     Last tested
rem             12.2.0.1
rem             12.1.0.2
rem             11.2.0,4
rem

set pagesize 0
set feedback off

spool temp.sql

prompt create table t1(

select
        'col' || to_char(rownum,'fm0000') || '  varchar2(10),'
from
        all_objects
where   rownum <= 320
;

prompt col0321 varchar2(10)
prompt )
prompt /

spool off

@temp

set pagesize 40
set feedback on

insert into t1 (col0010, col0280) values ('0010','0280');
commit;

update t1 set col0320 ='0320';
commit;

column file_no  new_value m_file_no
column block_no new_value m_block_no

select
        dbms_rowid.rowid_relative_fno(rowid)    file_no,
        dbms_rowid.rowid_block_number(rowid)    block_no,
        dbms_rowid.rowid_row_number(rowid)      row_no
from
        t1
;

alter system flush buffer_cache;
alter system dump datafile &m_file_no block &m_block_no;


So I’ve written one of those horrible scripts that write a script and then run it. The script creates a table with 320 columns and inserts a row that populates columns 10 and 280. That gets me two row pieces, one consisting of the 255 columns from columns 26 to 280 that goes in as row piece 0, the other consisting of the first 25 columns that goes in as row piece 1; the remaining 40 columns are not populated so Oracle “forgets” about them (“trailing nulls take no space”). The script then updates the row by setting column 320 to a non-null value.

For convenience I’ve then generated the file and block number (and row number, just to show its head piece went in as row 1 rather than row 0) of the row and done a symbolic block dump. The question is: what am I going to see in that block dump ?

Answer, and lots more, coming some time in the next 48 hours.


255 columns

Fri, 2017-05-19 11:49

This is one of my “black hole” articles – I drafted it six months ago, but forgot to publish it.

A recent post on OTN highlighted some of the interesting oddities that appear when you create tables with more than 255 columns. In fact this was a more subtle case than usual because it reminded us that it’s possible to have a partitioned table which appears to have less than the critical 255 columns while actually having more than 255 columns thanks to the anomaly of how Oracle handles dropping columns in a partitioned table.  (For a useful insight see this note from Dominic Brooks – and for a nice thought about preparing simple tables for an exchange with such a partitioned tables look at the 12.2 feature of “create table for exchange” in Maria Colgan’s article)

The thread took me down the path of trying to recreate some notes I wrote a long time ago and can no longer find, and the OP’s problem wasn’t the basic one I had assumed anyway, but I thought I’d publish a bit of the work I had done so that you can see another of the funny effects that appear when your table definition has too many columns (and you use them).

The OP told us about a table with more than 350 columns, so here’s a little script I wrote to generate a table with 365 columns and some data. (It turned out that the OP had more than 390 columns in the table, but 30+ had been “dropped”.)


rem
rem	Script:		wide_table_2.sql
rem	Author:		Jonathan Lewis
rem	Dated:		Nov 2016
rem	Purpose:	
rem
rem	Last tested 
rem		11.2.0.4
rem

create sequence s1;

declare
	m_statement_1	varchar2(32767) := 
		'create table t1(col0001 varchar2(10),';
	m_statement_2	varchar2(32767) := 
		'insert into t1 values(lpad(s1.nextval,10),';
begin
	for i in 2 .. 365 loop
		m_statement_1 := m_statement_1 ||
			'col' || to_char(i,'FM0000') || ' varchar2(100),'
		;
		m_statement_2 := m_statement_2 ||
			case when i in (2,3,4) 
			-- case when i in (122,123,124) 
			-- case when i in (262,263,264) 
				then 'dbms_random.string(''U'',ceil(dbms_random.value(0,100))),'
			     when i = 365
				then 'lpad(s1.currval,7))'
				else '''COL' || to_char(i,'FM0000') || ''','
			end
		;
	end loop;
	
	m_statement_1 := substr(m_statement_1, 1, length(m_statement_1) - 1);
	m_statement_1 := m_statement_1 || ') pctfree 25';

	execute immediate m_statement_1;

	for i in 1..10000 loop
		execute immediate m_statement_2;
	end loop;

end;
/


I’ve taken a fairly simple approach to building a string that creates a table – and it’s easy to adjust the number of columns – and a string to insert some values into that table. The insert statement will insert a row number into the first and last columns of the table and generate a random length string for a few of the columns. I’ve picked three possible sets of three columns for the random length string; one set is definitely going to be in the first row piece, one set is definitely going to be in the last row piece, and (since the row will split 110/255) one will be somewhere inside whichever is the larger row piece.

If I wanted to do something more sophisticated I’d probably have to switch to a PL/SQL array for the two statements strings – 32,767 characters doesn’t give me much freedom to play if I wanted to test a table with 1,000 columns.

Having created and populated my table, this is what I did the following three tests with it:


analyze table t1 compute statistics;

prompt	====
prompt	CTAS
prompt	====

create table t1a pctfree 25 as select * from t1;
analyze table t1a compute statistics;

select	table_name, num_rows, avg_row_len, blocks, chain_cnt
from	user_tables
where	table_name like 'T1%'
;

prompt	======
prompt	Insert
prompt	======

truncate table t1a;
insert into t1a select * from t1;
analyze table t1a compute statistics;

select	table_name, num_rows, avg_row_len, blocks, chain_cnt
from	user_tables
where	table_name like 'T1%'
;

prompt	=============
prompt	Insert append
prompt	=============

truncate table t1a;
insert /*+ append */ into t1a select * from t1;
analyze table t1a compute statistics;

select	table_name, num_rows, avg_row_len, blocks, chain_cnt
from	user_tables
where	table_name like 'T1%'
;

The first test creates a new table (t1a, at pctfree 25, matching the original) of the original table with a simple “create as select”.

The second test truncates the table and does a basic “insert as select” to repopulate the copy table.

Third test truncates the table again and does an “insert as select” with the /*+ append */ hint to repopulate the copy table.

In all three cases (and with three variations of where the longer random strings went) I used the analyze command to gather stats on the tables so that I could get a count of the number of chained rows; and I dumped a couple of blocks from the tables to see what the inserted rows looked like.

Here’s a summary of the results when the random-length columns are near the start of the row (the position didn’t really affect the outcome and the results for 12.1.0.2 were very similar):

====
CTAS
====

TABLE_NAME             NUM_ROWS AVG_ROW_LEN     BLOCKS  CHAIN_CNT
-------------------- ---------- ----------- ---------- ----------
T1                        10000        3062       6676       3313
T1A                       10000        3062       9504        237

======
Insert
======

TABLE_NAME             NUM_ROWS AVG_ROW_LEN     BLOCKS  CHAIN_CNT
-------------------- ---------- ----------- ---------- ----------
T1                        10000        3062       6676       3313
T1A                       10000        3062       6676       3287

=============
Insert append
=============

TABLE_NAME             NUM_ROWS AVG_ROW_LEN     BLOCKS  CHAIN_CNT
-------------------- ---------- ----------- ---------- ----------
T1                        10000        3062       6676       3313
T1A                       10000        3062       9504        237

As you can see we get two significantly different results: the CTAS and the “insert append” produce tables with 9,504 blocks and 237 chained rows reported, while the original table (single row inserts) and the regular “insert as select” produce tables with 6,676 blocks and 3,133 chained rows. It seems that the CTAS and direct path insert have minimised the number of chained rows at a cost of a dramatically increased number of blocks.

We know, of course, that every row in this table will consist of two row pieces, one of 110 columns and one of 255 columns; so every row is in some respects chained due to the potential for intra-block chaining of those two pieces, but the analyze command reports only those rows which start in one block and end in another block as chained rows – intra-block chaining doesn’t count (in this version of Oracle).

There are two questions to address in these results: the first is “What’s happening?”, the second, which we ask when we get the answer to the first, is “How come the direct path method still gives us some chained rows?”

I believe the answer to the first question is that the direct path method attempts to avoid chaining unavoidable row-pieces. Even if it means leaving a huge amount of empty space in a block Oracle starts a new row in a new block if there isn’t enough space for both of the anticipated row-pieces to fit in the current block. I think this may be a feature to help Exadata and direct path reads where a relatively small number of chained rows (which might be outside the current Exadata storage unit) could have a catastrophic impact on performance because the system would have to do a single block read to pick up the extra piece – which could have a devastating impact on the performance of the Exadata smart scan.

So why do some rows still see chaining under this strategy – I think it’s because there’s a small error in the arithmetic somewhere (possibly visible only in ASSM tablespaces, perhaps related to row-piece headers) where Oracle thinks there’s enough space for both row pieces but there isn’t quite so it tries to insert both pieces and has to chain the second one. (I’ll back this up with some analysis of block dumps in a moment).

So you have a choice – lots of wasted space and a little row-chaining, or maximum packing of data and (potentially) lots of row-chaining. But there’s more: I’ve said we get one row piece of 110 columns and one of 255 columns for each row, but the point at which the split occurs and the order in which the pieces are inserted depends on the method used.

  • Single row inserts (initial table creation): The split occurred at column 111 – so the leading 110 columns are in one row piece and the trailing 255 columns are in the other; and the row piece with the tail-end columns is inserted first.
  • Array inserts (normal): Exactly the same as the single row inserts.
  • Direct path inserts / CTAS: The split occurred at column 256, with the leading row-piece inserted first and the trailing 110 column row-piece inserted second.

I’m not sure that this really matters very much when you consider the dramatic difference in size that can appear in the comparison between direct path and normal inserts, but maybe there’s someone who will notice a performance (or even space) side effect because of this inconsistency. (And I haven’t even thought about the effects of basic or oltp compression yet!)

I said I’d come back to the row-chaining anomaly. One of the little details that I didn’t include in my code listing was the call to “analyze table report chained rows” that I did (after executing $OARCLE_HOME/rdbms/admin/utlchain.sql) to list the head rowids of the chained rows into the chained_rows table. After doing this I ran a simple pl/sql loop to dump all the relevant blocks to the trace file:

begin
	for r in (
		select 
			dbms_rowid.rowid_relative_fno(head_rowid) file#, 
			dbms_rowid.rowid_block_number(head_rowid) block# 
			from	chained_rows
		) loop
			execute immediate 'alter system dump datafile ' || r.file# || ' block ' || r.block#;
	end loop;
end;
/

Here’s a little extract from the resulting trace file showing you what the start of a row piece looks like when dumped:

tab 0, row 0, @0x1765
tl: 2075 fb: --H-F--- lb: 0x0  cc: 255
nrid:  0x01401bc4.1
col  0: [10]  20 20 20 20 20 20 20 33 36 32
col  1: [19]  41 52 43 4a 4a 42 4e 55 46 4b 48 4c 45 47 4c 58 4c 4e 56
col  2: [ 8]  59 4b 51 46 4a 50 53 55
col  3: [17]  53 58 59 4e 4a 49 54 4a 41 5a 5a 51 44 44 4b 58 4d
col  4: [ 7]  43 4f 4c 30 30 30 35
col  5: [ 7]  43 4f 4c 30 30 30 36

A convenient thing to check is the cc: entry (end of 2nd line). You can see that this row piece has 255 columns, and if you look at the first six columns dumped you can see that it’s the row numbered 362, then there are three columns of different length strings, then two columns with the values ‘COL00005’ and ‘COL0007’ respectively. It’s the “cc:” entry that’s useful though. I’m going to do a bit of simple unix hackery:

grep " cc: " test_ora_24398.trc | sed "s/^.*cc: //"  | sort | uniq -c | sort -n
      1 1
      5 2
    112 108
    125 109
    237 110
    474 255

In my 237 blocks with chained rows I had 474 row pieces of 255 columns and 237 row pieces of 110 columns; then I had 125 row pieces that had lost (and therefore chained) one column and 112 row pieces that had lost and therefore chained 2 columns. I also had a couple of small “tail-end” pieces from earlier blocks scattered in these blocks. These figures suggest that there’s a small error (actually no more than about 20 bytes) in the calculation Oracle does to decide if it can fit a whole row into the current block or whether it has to go on to the next empty block.

Conclusions

When copying a table defined with more than 255 columns there’s the potential for a huge variation in the space usage and chain count depending on whether you do a CTAS (or insert /*+ append */) or a simple insert. You have to decide which option is the biggest threat to your available resources.

There is a little anomaly with the way in which rows are split that is also dependent on the method used for copying – this may also have some effect, though perhaps small enough to be ignored when compared with the space/chaining difference.

Even though CTAS/direct path insert can eliminate a lot of row chaining it is still possible to find some row chaining in the resulting data. This may be the result of a calculation error (or possibly a deliberate space saving compromise).

Lagniappe

Oracle 12.2 may do things very differently.

 


Quantum Space

Wed, 2017-05-10 07:31

Here’s a not very serious note that makes a serious point.  I’ve got a small tablespace made up of 4 files, and here’s a little report I can run against the data dictionary for that tablespace:


select 'File space' What, nvl(sum(user_bytes)/1048576,0) MB from dba_data_files where tablespace_name = 'LOB_TEST'
union all
select 'Free space',      nvl(sum(bytes/1048576),0)         from dba_free_space where tablespace_name = 'LOB_TEST'
union all
select 'Extents',         nvl(sum(bytes/1048576),0)         from dba_extents    where tablespace_name = 'LOB_TEST'
union all
select 'Segments',        nvl(sum(bytes/1048576),0)         from dba_segments   where tablespace_name = 'LOB_TEST'
;

The name of the tablespace isn’t significant – it happens to be a tablespace I created to do some tests relating to space allocation with securefile LOBs, and it’s been hanging around ever since.

The query first reports the “user” space defined by the files – remember that there will be some space for header information and other metadata, and there may be some space near the end of the file which is smaller than the minimum extent allowed for that tablespace.

After the statement of total possible usable space we get: the free space, the space taken up by used extents, and the space taken up by segments.

Here’s the result I get currently:


WHAT               MB
---------- ----------
File space        196
Free space        196
Extents             0
Segments          196

The total usable space is 196MB, of which 196MB are free with no space allocated to extents — except 196MB HAS been allocated to segments. Depending how you look at it the tablespace is either full, or empty.

If you’re in the know the puzzle resolves itself if you query for the segment names, which look like: “BIN$TrDKUCvcVQbgUwEAAH9K2Q==$0” – I have one segment that is in the recycle bin, so I can recover it from the bin if I need it, but until I do Oracle can reuse the space if it wants it. Like Schrodinger’s cat the objects is both alive and dead until someone decides to peek.

After a call to “purge recyclebin” the result changes to:


WHAT               MB
---------- ----------
File space        196
Free space        196
Extents             0
Segments            0

On a production system you might need to issue “purge dba_recyclebin” (if you have the appropriate privilege) to resolve the apparent contradiction as the call to “purge recyclebin” applies only to objects in your own schema.

This note was prompted by a question on the OTN database forum about contradiction between a traditional SQL statement to report free and used space and a screen dump from Enterprise Manager.  I don’t think the thread reached a firm conclusion – but apart from the potential for the recyclebin to confuse the issue, there are extra possibilities thanks to auto-extensible data files, and the Enterprise Manager’s scope for querying a complete different set of views such as dba_tablespace_usage_metrics and v$filespace_usage. In fact the EM code clearly had at least one error in it (which makes any of its results suspect) because it managed to report the critical tablespace as 390% used!

 

 


opt_estimate

Mon, 2017-05-08 02:04

The opt_estimate hint is one of many that shouldn’t be used in end-user code and isn’t officially documented. Nevertheless – like so many other hints – it’s a hint that is hard to ignore when you see it floating around the code generated by the Oracle software. This note is prompted by a twitter question from fellow Oak Table member Stefan Koehler asking the about working of the hint’s index_filter parameter. Checking my library I knew the answer was yes – so after a quick exchange on twitter I said I’d write up a short note about my example, and this is it.

Although the hint is not one that you should use it’s worth writing this note as a reminder of the significance to index range scans of the access predicates and filter predicates that Oracle reports in the predicate section of an execution plan.

When a query does an index range scan it’s going to walk through a (logically) consecutive set of index leaf blocks, looking at each individual index entry in turn (and those index entries will be correctly “sorted” within the leaf block) to see if it should use the rowid it finds there to visit the table. For “perfect” use of an index Oracle may be able to identify the starting and ending positions it needs in the index and know that it should use every rowid in between to visit the table – there will no “wasted”examinations of index entries on the way; however in a query involving a multi-column index and multiple predicates Oracle might have to use predicates on the first column(s) of the index to identify the starting and ending positions, but use further predicates on later columns in the index to decide whether or not to use each index entry to visit the table.

The predicates that Oracle can use to identify the range of leaf blocks it should visit are called access predicates, and the predicates that Oracle can use to further eliminate rowids as it walks along the leaf blocks are called filter predicates.

The simplest way to demonstrate this is with a query of the form: “Index_Column1 = … and Index_Column3 = …”, and that’s what I’ll be using in my model:


rem
rem     Script:         opt_est_ind_filter.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             11.2.0.4
rem             10.2.0.5
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        mod(rownum - 1,100)             n1,
        rownum                          n2,
        mod(rownum - 1, 100)            n3,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to bypass WordPress formatting issue
;

create index t1_i1 on t1(n1,n2,n3) nologging;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          => 'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

select leaf_blocks from user_indexes where index_name = 'T1_I1';

The number of leaf blocks in the index was 3,062.

I’ve defined n1 and n3 to match, and for any value between 0 and 99 there are 10,000 rows in the table where n1 and n3 hold that value. However, in the absence of a column group defined on (n1, n3), the optimizer is going to use its standard “no correlation” arithmetic to decide that there are 10,000 possible combinations of n1 and n3, and 100 rows per combination. Let’s see what this does for a simple query:


set autotrace traceonly explain

select  count(v1)
from    t1
where   n1 = 0 and n3 = 0
;

set autotrace off


--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    17 |   134   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE              |       |     1 |    17 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   100 |  1700 |   134   (1)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |   100 |       |    34   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=0 AND "N3"=0)
       filter("N3"=0)

The plan shows an index range scan where n3=0 is used as a filter predicate and n1=0 (with a tiny bit of extra accuracy from the n3=0) predicate is used as the access predicate, and the optimizer has calculated that 100 rowids will be retrieved from the index and used to find 100 rows in the table.

The cost of the range scan is 34: The optimizer’s estimate is that the scale of the initial access to the index will be due to the predicate n1 = 0 which is responsible for 1% of the index – giving us 3,062/100 leaf blocks (rounded up). Added to that there will be a little extra cost for the trip down the blevel of the index and a little extra for the CPU usage.

Now let’s tell the optimizer that its cardinality estimate is out by a factor of 25 (rather than 100 we actually know it to be) in one of two different ways:

prompt  ============================
prompt  index_scan - scale_rows = 25
prompt  ============================

select
        /*+
                qb_name(main)
                index(@main t1(n1, n2, n3))
                opt_estimate(@main index_scan   t1, t1_i1, scale_rows=25)
        */
        count(v1)
from    t1 
where   n1 = 0 and n3 = 0
;

prompt  ==============================
prompt  index_filter - scale_rows = 25
prompt  ==============================

select
        /*+
                qb_name(main)
                index(@main t1(n1, n2, n3))
                opt_estimate(@main index_filter t1, t1_i1, scale_rows=25)
        */
        count(v1)
from    t1 
where   n1 = 0 and n3 = 0
;

In both examples I’ve hinted the index to stop the optimizer from switching to a tablescan; but in the first case I’ve told Oracle that the entire index range scan has to be scaled up by a factor of 25 while in the second case I’ve told Oracle that its estimate due to the final filter has to be scaled up by a factor of 25. How does this affect the costs and cardinalities of the plans:


============================
index_scan - scale_rows = 25
============================
--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    17 |  3285   (1)| 00:00:17 |
|   1 |  SORT AGGREGATE              |       |     1 |    17 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   100 |  1700 |  3285   (1)| 00:00:17 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |  2500 |       |   782   (2)| 00:00:04 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=0 AND "N3"=0)
       filter("N3"=0)



==============================
index_filter - scale_rows = 25
==============================
--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     1 |    17 |  2537   (1)| 00:00:13 |
|   1 |  SORT AGGREGATE              |       |     1 |    17 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| T1    |   100 |  1700 |  2537   (1)| 00:00:13 |
|*  3 |    INDEX RANGE SCAN          | T1_I1 |  2500 |       |    34   (3)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("N1"=0 AND "N3"=0)
       filter("N3"=0)

In both cases the cardinality estimate has gone up by a factor of 25 for the index range scan. Notice, though, that the optimizer is now suffering from cognitive dissonance – it “knows” that it’s got 2,500 rowids to use to visit the table, it “knows” there are no extra predicates to eliminate rows from the table when it gets there, but it also “knows” that it’s going to find only 100 rows. Messing around with opt_estimate() and cardinality() hints is difficult to get right.

More significantly for the purposes of this note, are the costs. When we use the index_filter parameter the optimizer still thinks it’s going to access the same number of leaf blocks and the only correction it has to make is the number of rowids it finds in those blocks – so the index range scan cost hasn’t changed (though I supposed in some cases it might change slightly due to increased CPU costs). When we use the index_scan parameter the optimizer scales up its estimate of the number of leaf blocks (hence cost), which we can see in the figures 782 / 25 = 31.28. (Without going into the trace file and checking exact details that’s close enough to the previously reported 34 for me to think it’s allowing for 25 times the number of leaf blocks plus a chunk more CPU)

Conclusion

As I said at the outset, opt_estimate() really isn’t a hint you should be playing with, but I hope that this note has helped shed some light on the significance of access predicates and filter predicates in relation to the costs of index range scans.

Footnote

There were two significant details in the notes I had in my script. First was the frequency of the expression “it looks as if” – which is my shorthand for “I really ought to do some more tests before I publish any conclusions”; second was that my most recent testing had been on 10.2.0.5 (where the results were slightly different thanks to sampling in the statistics). Given that Stefan Koehler had mentioned 11.2.0.3 as his version I ran up an instance of 11.1.0.7 – and found that the index_filter example didn’t scale up the cardinality – so maybe his problem is a version problem.

 


Aliases

Tue, 2017-05-02 03:23

Here’s a performance problem that came up on OTN recently. The following query (reforematted) takes “ages” to run – how do you address the problem:

SELECT
	COUNT(*) 
FROM
	smp_dbuser2.workflow_step_report
WHERE
	report_ID IN (
		SELECT	report_id
		FROM	smp_dbuser2.workflow_report
		WHERE	trunc(start_time) = '28-Apr-2017'
		AND	user_id = 'nbi_ssc'
	)
;


Various pieces of relevant information were supplied (the workflow_report table holds 1.4M rows the workflow_step_report table holds 740M rows and some indexes were described), but most significantly we were given the execution plan:

--------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                      |     1 |     6 |    10T  (1)|999:59:59 |       |       |
|   1 |  SORT AGGREGATE       |                      |     1 |     6 |            |          |       |       |
|*  2 |   FILTER              |                      |       |       |            |          |       |       |
|   3 |    PARTITION HASH ALL |                      |   731M|  4187M|  5363K  (1)| 17:52:47 |     1 |   128 |
|   4 |     TABLE ACCESS FULL | WORKFLOW_STEP_REPORT |   731M|  4187M|  5363K  (1)| 17:52:47 |     1 |   128 |
|*  5 |    FILTER             |                      |       |       |            |          |       |       |
|   6 |     PARTITION HASH ALL|                      |     2 |    38 | 14161   (1)| 00:02:50 |     1 |    32 |
|*  7 |      TABLE ACCESS FULL| WORKFLOW_REPORT      |     2 |    38 | 14161   (1)| 00:02:50 |     1 |    32 |
--------------------------------------------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter( EXISTS (SELECT 0 FROM "SMP_DBUSER2"."WORKFLOW_REPORT" "WORKFLOW_REPORT" WHERE :B1=:B2
              AND "USER_ID"='nbi_ssc' AND TRUNC(INTERNAL_FUNCTION("START_TIME"))=TO_DATE(' 2017-04-28 00:00:00',
              'syyyy-mm-dd hh24:mi:ss')))
   5 - filter(:B1=:B2)
   7 - filter("USER_ID"='nbi_ssc' AND TRUNC(INTERNAL_FUNCTION("START_TIME"))=TO_DATE(' 2017-04-28
              00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

You’ll notice that the optimizer has transformed the IN subquery into an EXISTS subquery – operation 2 is a FILTER operation, and you can see that the filter predicate at operation 2 shows the existence subquery that would be executed.

If you look carefully at the execution plan (all of it), what can you deduce from it ? What, then, should be your next step in dealing with this performance problem ?

Answers at the end of the (UK) day. Resist the temptation to examine my comments in the OTN thread.


RTFM

Mon, 2017-05-01 06:55

Imagine you’re fairly new to Oracle and don’t have a lot of background information at your fingertips; then one day someone tells you to read the manual pages for the view dba_free_space. Look carefully at this sentence:

Note that if a data file (or entire tablespace) is offline in a locally managed tablespace, you will not see any extent information.

Can you spot the error ? Did you spot the error when you first read the sentence – or did you fill in the gap without noticing what you were doing ?

Let’s demonstrate the accuracy of the statement (simple cut-n-paste from an SQL*Plus session on 12.1.0.2 running in archivelog mode, and with a locally managed tablespace consisting of 4 (oracle managed) files on a filesystem):


SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                7        128   51380224       6272            7
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

4 rows selected.

SQL> select file#, ts#, name from v$datafile;

     FILE#        TS# NAME
---------- ---------- ----------------------------------------------------------------------
         1          0 /u02/app/oracle/oradata/OR32/datafile/o1_mf_system_cbcysq2o_.dbf
         2          9 /u02/app/oracle/oradata/OR32/datafile/o1_mf_undotbs_d84db0s2_.dbf
         3          1 /u02/app/oracle/oradata/OR32/datafile/o1_mf_sysaux_cbcyrmyd_.dbf
         4         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchn57_.dbf
         5          6 /u02/app/oracle/oradata/OR32/datafile/o1_mf_test_8k__cbd120yc_.dbf
         6          4 /u02/app/oracle/oradata/OR32/datafile/o1_mf_users_cbcyv47y_.dbf
         7         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf
         8         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpcho47_.dbf
         9         15 /u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchok1_.dbf

9 rows selected.

SQL> alter database datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf' offline;

Database altered.

SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

3 rows selected.

SQL> recover datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf';
Media recovery complete.
SQL> alter database datafile '/u02/app/oracle/oradata/OR32/datafile/o1_mf_lob_test_dhpchnnq_.dbf' online;

Database altered.

SQL> select * from dba_free_space where tablespace_name = 'LOB_TEST';

TABLESPACE_NAME                   FILE_ID   BLOCK_ID      BYTES     BLOCKS RELATIVE_FNO
------------------------------ ---------- ---------- ---------- ---------- ------------
LOB_TEST                                4        128   51380224       6272            4
LOB_TEST                                7        128   51380224       6272            7
LOB_TEST                                8        640   47185920       5760            8
LOB_TEST                                9        128   51380224       6272            9

4 rows selected.

SQL> spool off

See the bit in the middle where I have “3 rows selected” for the lob_test tablespace: the manual says I “will not see any extent information” – but the only change in the output is the absence of information about the one data file that I’ve put offline.

You may want to argue that “obviously” the statement was only about the data file that was offline – but is that a couple of years experience allowing you to interpret the text ? Some might assume (with a little prior experience and if they hadn’t done the experiment and given the parenthetical reference to “entire tablespace”) that the statement was about the effect on a single tablespace  – and maybe others would criticise them for making unwarranted assumptions.

But maybe you’re a novice and believed what the manual actually said.

It’s a fairly silly example, of course, but the point of this note is that when you tell someone to RTFM remember that they might actually do exactly that and not have the benefit of being able to know (unthinkingly) that the manual is wrong. If you go one step further and tell them to “stop making assumptions and RTFM” then just remember that you probably make a lot of assumptions without realising it when you read the manuals, and maybe it’s your assumptions that lead you to the correct interpretation of the manual.

Footnote:

If you’re feeling in the mood to split hairs, don’t forget that dba_free_space doesn’t usually give you any information about extents when it’s reporting locally managed tablespaces, it tells you about the space in which extents can be created; the one exception (that I know of) is when you have an object in the recyclebin and each extent of that object is listed as free space (see this article and the footnote here).  It’s only for dictionary managed tablespaces that dba_free_space reports extent information – the rows stored in the fet$ table.

 


Quiz Night

Thu, 2017-04-27 11:29

If this is the closing section of a trace file from a single end-user session that has a performance problem, what’s the most obvious deduction you can make about the cause of the problem, and what sort of action would you take next ?


OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse      296      0.06       0.10          0        651          0           0
Execute    440      1.39       2.24          7       4664          0         146
Fetch      345     29.38      48.27          0    1709081          0         346
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1081     30.84      50.63          7    1714396          0         492

Misses in library cache during parse: 5
Misses in library cache during execute: 7

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  SQL*Net message to client                     497        0.00          0.00
  SQL*Net message from client                   496       27.03         50.35
  direct path read                                1        0.00          0.00


OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse   128199      6.94      11.46          0       2740          0           0
Execute 2274845    371.25     605.60         30   10031162          0       68200
Fetch   2225314     10.94      18.17          5     879297          0      577755
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total   4628358    389.14     635.23         35   10913199          0      645955

Misses in library cache during parse: 9701
Misses in library cache during execute: 134

Elapsed times include waiting on following events:
  Event waited on                             Times   Max. Wait  Total Waited
  ----------------------------------------   Waited  ----------  ------------
  direct path read                                1        0.00          0.00
  latch: shared pool                              3        0.00          0.00

12666  user  SQL statements in session.
  495  internal SQL statements in session.
13161  SQL statements in session.
********************************************************************************
Trace file: {concealed file name}.trc
Trace file compatibility: 11.1.0.7
Sort options: default

       1  session in tracefile.
   12666  user  SQL statements in trace file.
     495  internal SQL statements in trace file.
   13161  SQL statements in trace file.
    5266  unique SQL statements in trace file.
 39046623  lines in trace file.
     742  elapsed seconds in trace file.


There’s no absoutely right answer to the last question, by the way – there are a couple of variations in approach that are likely to get to the same result in the same time, and some of those variations might have made looking at the tail end data the 2nd or 3rd step in the sequence.


Undo Understood

Fri, 2017-04-21 04:45

It’s hard to understand all the ramifications of Oracle’s undo handling, and it’s not hard to find cases where the resulting effects are very confusing. In a recent post on the OTN database forum resulted in one response insisting that the OP was obviously updating a table with frequent commits from one session while querying it from another thereby generating a large number of undo reads in the querying session.

It’s a possible cause of the symptoms that had been described – although not the only possible cause, especially since the symptoms hadn’t been described completely. It’s actually possible to see this type of activity when there are no updates and no outstanding commits taking place at all on the target table. Unfortunately it’s quite hard to demonstrate this with a quick, simple, script in recent versions of Oracle unless you do some insanely stupid things to make the problem appear – but I know how to do “insanely stupid” in Oracle, so here we go; first some data creation:

rem
rem     Script:         undo_rec_apply_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t2(v1 varchar2(100));
insert into t2 values(rpad('x',100));
commit;

create table t1
nologging
pctfree 99 pctused 1
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        cast(rownum as number(8,0))                     id,
        cast(lpad(rownum,10,'0') as varchar2(10))       v1,
        cast(lpad('x',100,'x') as varchar2(100))        padding
from
        generator       v1,
        generator       v2
where
        rownum <= 8e4 -- > comment to bypass WordPress formatting issue
;

alter table t1 add constraint t1_pk primary key(id)
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

The t2 table is there as a target for a large of updates from a session other than the one demonstrating the problem. The t1 table has been defined and populated in a way that puts one row into each of 80,000 blocks (though, with ASSM and my specific tablespace definition of uniform 1MB extents, the total space is about 80,400 blocks). I’ve got a primary key declaration that allows me to pick single rows/blocks from the table if I want to.

At this point I’m going to do a lot of updates to the main table using a very inefficient strategy to emulate the type of thing that can happen on a very large table with lots of random updates and many indexes to maintain:


begin
        for i in 1..800 loop
                update t1 set v1 = upper(v1) where id = 100 * i;
                execute immediate 'alter system switch logfile';
                execute immediate 'alter system flush buffer_cache';
                commit;
                dbms_lock.sleep(0.01);
        end loop;
end;
/

set transaction read only;

I’m updating every 100th row/block in the table with single row commits, but before each commit I’m switching log files and flushing the buffer cache.

This is NOT an experiment to try on a production system, or even a development system if there are lots of busy developers or testers around – and if you’re running your dev/test in archivelog mode (which, for some of your systems you should be) you’re going to end up with a lot of archived redo logs. I have to do this switch to ensure that the updated blocks are unpinned so that they will be written to disc and flushed from the cache by the flush buffer cache. (This extreme approach would not have been necessary in earlier versions of Oracle, but the clever developers at Oracle Corp. keep adding “damage limitation” touches to the code that I have to work around to create small tests.) Because the block has been flushed from memory before the commit the session will record a “commit cleanout failures: block lost” on each commit. By the time this loop has run to completion there will be 800 blocks from the table on disc needing a “delayed block cleanout”.

Despite the extreme brute force I use in this loop, there is a further very important detail that has to be set before this test will work (at least in 11.2.0.4, which is what I’ve used in my test runs). I had to start the database with the hidden parameter _db_cache_pre_warm set to false. If I don’t have the database started with this feature disabled Oracle would notice that the buffer cache had a lot of empty space and would “pre-warm” the cache by loading a few thousand blocks from t1 as I updated one row – with the side effect that the update from the previous cycle of the loop would be cleaned out on the current cycle of the loop. If you do run this experiment, remember to reset the parameter and restart the instance when you’ve finished.

I’ve finished this chunk of code with a call to “set transaction read only” – this emulates the start of a long-running query: it captures a point in time (through the current SCN) and any queries that run in the session from now on have to be read-consistent with that point in time. After doing this I need to use a second session to do a bit of hard work – in my case the following:

execute snap_rollstats.start_snap

begin
        for i in 1..10000 loop
                update t2 set v1 = upper(v1);
                update t2 set v1 = lower(v1);
                commit;
        end loop;
end;
/

execute snap_rollstats.end_snap

The calls to the snap_rollstats package simply read v$rollstat and give me a report of the changes in the undo segment statistics over the period of the loop. I’ve executed 10,000 transactions in the interval, which was sufficient on my system to use each undo segment header at least 1,000 times and (since there are 34 transaction table slots in each undo segment header) overwrite each transaction table slot about 30 times. You can infer from these comments that I have only 10 undo segments active at the time, your system may have many more (check the number of rows in v$rollstat) so you may want to scale up that 10,000 loop count accordingly.

At this point, then, the only thing I’ve done since the start of my “long running query” is to update another table from another session. What happens when I do a simple count() from t1 that requires a full tablescan ?

alter system flush buffer_cache;

execute snap_filestat.start_snap
execute snap_my_stats.start_snap

select count(v1) from t1;

execute snap_my_stats.end_snap
execute snap_filestat.end_snap

I’ve flushed the buffer cache to get rid of any buffered undo blocks – again an unreasonable thing to do in production but a valid way of emulating the aging out of undo blocks that would take place in a production system – and surrounded my count() with a couple of packaged call to report the session stats and file I/O stats due to my query. (If you’re sharing your database then the file I/O stats will be affected by the activity of other users, of course, but in my case I had a private database.)

Here are the file stats:


--------------
Datafile Stats
--------------
file#       Reads      Blocks    Avg Size   Avg Csecs     S_Reads   Avg Csecs     M_Reads   Avg Csecs    Max      Writes      Blocks   Avg Csecs    Max
File name
-----       -----      ------    --------   ---------     -------   ---------     -------   ---------    ---      ------      ------   ---------    ---
    1          17          17       1.000        .065          17        .065           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_system_938s4mr3_.dbf
    3         665         665       1.000        .020         665        .020           0        .000      6           0           0        .000     15
/u01/app/oracle/oradata/TEST/datafile/o1_mf_undotbs1_938s5n46_.dbf
    5         631      80,002     126.786        .000           2        .045         629        .000      6           0           0        .000     17
/u01/app/oracle/oradata/TEST/datafile/o1_mf_test_8k__cz1w7tz1_.dbf

As expected I’ve done a number of multiblock reads of my data tablespace for a total of roughly 80,000 blocks read. What you may not have expected is that I’ve done 665 single block reads of the undo tablespace.

What have I been doing with all those undo blocks ? Check the session stats:


Session stats
-------------
Name                                                                     Value
----                                                                     -----
transaction tables consistent reads - undo records applied              10,014
transaction tables consistent read rollbacks                                10

We’ve been reading undo blocks so that we can create read-consistent copies of the 10 undo segment headers that were active in my instance. We haven’t (and you’ll have to trust me on this, I can’t show you the stats that aren’t there!) reported any “data blocks consistent reads – undo records applied”.

If you want to see a detailed explanation of what has happened you’ll need to read Oracle Core (UK source), chapter 3 (and possibly chapter 2 to warm yourself up for the topic). In outline the following type of thing happens:

  • Oracle gets to the first block updated in t1 and sees that there’s an ITL (interested transaction list) entry that hasn’t been marked as committed (we flushed the block from memory before the commit cleanout could take place so the relevant transaction is, apparently, still running and the row is still marked as locked).
  • Let’s say the ITL entry says the transaction was for undo segment 34, transaction table slot 11, sequence 999. Oracle reads the undo segment header block for undo segment 34 and checks transaction table slot 11, which is now at sequence 1032. Oracle can infer from this that the transaction that updated the table has committed – but can’t yet know whether it committed before or after the start of our “long running query”.
  • Somehow Oracle has to get slot 11 back to sequence 999 so that it can check the commit SCN recorded in the slot at that sequence number. This is where we see “undo records applied” to make the “transaction table read consistent”. It can do this because the undo segment header has a “transaction control” section in it that records some details of the most recent transaction started in that segment. When a transaction starts it updates this information, but saves the old version of the transaction control and the previous version of its transaction table slot in its first undo record, consequently Oracle can clone the undo segment header block, identify the most recent transaction, find its first undo record and apply it to unwind the transaction table information. As it does so it has also wound the transaction control section backwards one step, so it can use that (older) version to go back another step … and so on, until it takes the cloned undo segment header so far back that it takes our transaction table slot back to sequence 999 – and the job is done, we can now check the actual commit SCN.  (Or, if we’re unlucky, we might receive an ORA-01555 before we get there)

So – no changes to the t1 table during the query, but lots of undo records read because OTHER tables have been changing.

Footnote:

In my example the tablescan used direct path reads – so the blocks that went through delayed block cleanout were in private memory, which means they weren’t in the buffer cache and didn’t get written out to disc. When I flushed the buffer cache (again to emulate aging out of undo blocks etc.) and repeated the tablescan Oracle had to go through all that work of creating read consistent transaction tables all over again.

Footnote 2:

Despite the grand claim implied by the title I hope everyone realises that this is just one little detail of how undo handling can interfere with their expectations of performance.

 

 


Character selectivity

Fri, 2017-04-14 06:40

A recent OTN posting asked how the optimizer dealt with “like” predicates for character types quoting the DDL and a query that I had published some time ago in a presentation I had done with Kyle Hailey. I thought that I had already given a detailed answer somewhere on my blog (or even in the presentation) but found that I couldn’t track down the necessary working, so here’s a repeat of the question and a full explanation of the working.

The query is very simple, and the optimizer’s arithmetic takes an “obvious” strategy in the arithmetic. Here’s the sample query, with the equiavalent query that we can use to do the calculation:


select * from t1 where alpha_06 like 'mm%';

select * from t1 where alpha_06 >= 'mm' and alpha_06 < 'mn';

Ignoring the possible pain of the EBCDIC character set and multi-byte national-language character sets with “strange” collation orders, it should be reasonably easy to see that ‘mn’ is the first string in alphabetical order that fails to match ‘mm%’. With that thought in mind we can apply the standard arithmetic for range-based predicates assuming, to stick with the easy example, that there are no histograms involved. For a range closed at one end and and open at the other the selectivity is:


( ( 'mn' - 'mm') / (high_value - low_value) ) + 1/num_distinct

The tricky bits, of course, are how you subtract ‘mm’ from ‘mn’ and how you use the values stored in the low_value and high_value columns of view user_tab_cols. So let’s generate the orginal data set and see where we go (running on 12c, and eliminating redundant bits from the original presentation):


rem
rem     Script:         selectivity_like_char.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Sep 2013
rem

execute dbms_random.seed(0)

create table t1 nologging as
with generator as (
        select rownum id
        from dual
        connect by rownum <= 1000
)
select
        cast(dbms_random.string('l',6) as char(6))      alpha_06
from
        generator,
        generator
where
        rownum <= 1e6 -- > comment to avoid WordPress formatting issue
;

execute dbms_stats.gather_table_stats(user,'t1',method_opt=>'for all columns size 1')

column low_value  format a32
column high_value format a32

select
        column_name,
        num_distinct,
        density,
        low_value,
        high_value
from
        user_tab_cols
where
        table_name = 'T1'
order by
        column_name
;

select min(alpha_06), max(alpha_06) from t1;

set autotrace traceonly explain

select
        *
from
        t1
where
        alpha_06 like 'mm%'
;

set autotrace off

It will probably take a couple of minutes to generate the data – it’s 1M random strings, lower-case, 6 characters fixed – and will take up about 12MB of space. Here are the results from the stats and min/max queries, with the execution plan for the query we are testing:


COLUMN_NAME          NUM_DISTINCT    DENSITY LOW_VALUE                  HIGH_VALUE
-------------------- ------------ ---------- -------------------------- --------------------------
ALPHA_06                  1000000    .000001 616161616E72               7A7A7A78747A


MIN(AL MAX(AL
------ ------
aaaanr zzzxtz


Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   157 |  1099 |   265  (20)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |   157 |  1099 |   265  (20)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("ALPHA_06" LIKE 'mm%')


Given that there are power(26,6) = 308,915,776 different combinations available for lower-case strings of 6 charactgers it’s not too surprising that Oracle generated 1M different strings, nor is it particularly surprising that the lowest value string started with ‘aaa’ and the highest with ‘zzz’.

So how do we get 157 as the cardinality for the query or, to put it another way, how do we get 0.000157 as the selectivity of the predicate. We need to refer to a note I wrote a few years ago to help us on our way (with a little caveat due to a change that appeared in 11.2.0.4) – what number would Oracle use to represent ‘mm’ and the other three strings we need to work with ?

According to the rules supplied (and adjusted in later versions) we have to:

  1. pad the strings with ASCII nulls (zeros) up to 15 bytes
  2. treat the results as a hexadecimal number and convert to decimal
  3. round off the last 21 decimal digits

We can model this in SQL with a statement like:


SQL> column dec_value format 999,999,999,999,999,999,999,999,999,999,999,999
SQL> select round(to_number(utl_raw.cast_to_raw(rpad('aaaanr',15,chr(0))),'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'),-21) dec_val from dual;

DEC_VAL
------------------------------------------------
505,627,904,294,763,000,000,000,000,000,000,000

1 row selected.

As an alternative, or possibly a cross-check, I created a table with a varchar2(6) column, inserted the four values I was interested in and created a histogram of 4 buckets on the column (there’s a suitable little demo at this URL) and got the following endpoint values:


ENDPOINT_NUMBER                                   ENDPOINT_VALUE
--------------- ------------------------------------------------
              1  505,627,904,294,763,000,000,000,000,000,000,000
              2  568,171,140,227,094,000,000,000,000,000,000,000
              3  568,191,422,636,698,000,000,000,000,000,000,000
              4  635,944,373,827,734,000,000,000,000,000,000,000

Once we’ve got these numbers we can slot them into the standard formula (not forgetting the 1/1,000,000 for the closed end of the predicate) – and to save typing I’m going to factor out 10^21 across the board in the division:

Selectivity = (568,191,422,636,698 – 568,171,140,227,094) / (635,944,373,827,734 – 505,627,904,294,763) + 1/1,000,000

Selectivity = 20,282,409,604 / 130,316,469,532,971 + 1/1,000,000

Selectivity = 0.00015564 + 0.000001 = 0.00015664

From which the cardinality = (selectivity * num_rows) = 156.64, which rounds up to 157. Q.E.D.


Ask Jonathan

Mon, 2017-04-10 14:01

Oracle Scene, the magazine of the UK Oracle User Group is piloting a new regular feature called Ask Jonathan,  a chance to get an answer to any question you may have about how the Oracle database engine works.

I’m aiming to answer two or three questions per issue over the course of the year, using a format similar to the one Tom Kyte used in Oracle Magazine:  so if you have a question about the mechanisms, strategies, or mathematics of how Oracle does its thing then attach it as a comment to this posting.

Ideally the questions will have to be quite short (no 20MB trace files, massive schema definitions, or convoluted and exotic setup requirements or it will be too long), and I’ll aim to write something like half a page of in response.  I may summarise the question, or pick out the most interesting feature if it’s a bit too long to publish and answer in its entirety.

 


Parallel First_rows()

Mon, 2017-04-10 09:53

A recent posting on OTN raised the question of whether or not the “parallel” hint and the “first_rows(n)” hint were mutually incompatible. This reminded me that from time to time other posters on OTN (copying information from various websites, perhaps) have claimed that “parallel doesn’t work with first rows” or, conversely, “first rows doesn’t work with parallel”. This is one of those funny little myths that is so old that the script I’ve got to demonstrate the misconception is dated 2003 with a first test version of 8.1.7.4.

Since I haven’t run the test on any version of Oracle newer than 9.2.0.4 I thought it was time to dust it down, modernise it slightly, and run it again. So here’s the bit that creates a sample data set:


create table t1 (
        id      number,
        v1      varchar2(10),
        padding varchar2(100),
        constraint      t_pk primary key(id) using index local
)
partition by range(id) (
        partition p1000 values less than (1000),
        partition p2000 values less than (2000),
        partition p3000 values less than (3000),
        partition p4000 values less than (4000),
        partition p5000 values less than (5000)
)
;

insert into t1
select
        rownum - 1,
        rpad(rownum-1,10),
        rpad('x',100)
from
        all_objects
where
        rownum <= 5000 -- > hint to avoid WordPress formatting issue
order by 
        dbms_random.value
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1', 
                method_opt       => 'for all columns size 1'
        );
end;
/

Now I’m going to run a simple query, hinted in 4 different ways:

  • no hints
  • parallel hint only: /*+ parallel */
  • first_rows(1) hint only: /*+ first_rows(1) */
  • parallel and first_rows(1): /*+ parallel first_rows(1) */

Here’s the version of the query that has both hints in place:


set serveroutput off
set linesize 156
set pagesize 60
set trimspool on

select
        /*+ parallel first_rows(1) */
        v1
from
        t1
where
        id between 1500 and 2000
;

select * from table(dbms_xplan.display_cursor(null,null,'cost outline'));

I’ve actually run the query and used the display_cursor() option to pull the plan from memory – in the original (8i) script I used autotrace and the old (deprecated, backwards compatibility only) first_rows hint. To do any other tests just clone and edit. Here are the 4 outputs from the call to display_cursor() – with a little cosmetic editing:


SQL_ID  63qnzam9b8m9g, child number 0
=====================================
select  /*+ */  v1 from  t1 where  id between 1500 and 2000

Plan hash value: 277861402

-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |       |       |    15 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR|      |   502 |  7530 |    15   (0)| 00:00:01 |     2 |     3 |
|*  2 |   TABLE ACCESS FULL      | T1   |   502 |  7530 |    15   (0)| 00:00:01 |     2 |     3 |
-------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("ID"<=2000 AND "ID">=1500))


SQL_ID  ahary3u8q88mq, child number 1
=====================================
select  /*+ parallel */  v1 from  t1 where  id between 1500 and 2000

Plan hash value: 9959369

------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |     8 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR      |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)| :TQ10000 |   502 |  7530 |     8   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX BLOCK ITERATOR |          |   502 |  7530 |     8   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWC |            |
|*  4 |     TABLE ACCESS FULL| T1       |   502 |  7530 |     8   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      ALL_ROWS
      SHARED(2)
      OUTLINE_LEAF(@"SEL$1")
      FULL(@"SEL$1" "T1"@"SEL$1")
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - access(:Z>=:Z AND :Z<=:Z)
       filter(("ID"<=2000 AND "ID">=1500))

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2


SQL_ID  3m6mnk9b337dd, child number 0
=====================================
select  /*+ first_rows(1) */  v1 from  t1 where  id between 1500 and
2000

Plan hash value: 1044541683

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |      |       |       |     6 (100)|          |       |       |
|   1 |  PARTITION RANGE ITERATOR          |      |     4 |    60 |     6   (0)| 00:00:01 |     2 |     3 |
|   2 |   TABLE ACCESS BY LOCAL INDEX ROWID| T1   |     4 |    60 |     6   (0)| 00:00:01 |     2 |     3 |
|*  3 |    INDEX RANGE SCAN                | T_PK |       |       |     2   (0)| 00:00:01 |     2 |     3 |
-----------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      FIRST_ROWS(1)
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID">=1500 AND "ID"<=2000) -- > needs edit to avoid WordPress formatting issue


SQL_ID  9asm7t1zbv4q8, child number 1
=====================================
select  /*+ parallel first_rows(1) */  v1 from  t1 where  id between
1500 and 2000

Plan hash value: 4229065483

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name     | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |    TQ  |IN-OUT| PQ Distrib |
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |          |       |       |     3 (100)|          |       |       |        |      |            |
|   1 |  PX COORDINATOR                      |          |       |       |            |          |       |       |        |      |            |
|   2 |   PX SEND QC (RANDOM)                | :TQ10000 |     4 |    60 |     3   (0)| 00:00:01 |       |       |  Q1,00 | P->S | QC (RAND)  |
|   3 |    PX PARTITION RANGE ITERATOR       |          |     4 |    60 |     3   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWC |            |
|   4 |     TABLE ACCESS BY LOCAL INDEX ROWID| T1       |     4 |    60 |     3   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
|*  5 |      INDEX RANGE SCAN                | T_PK     |       |       |     1   (0)| 00:00:01 |     2 |     3 |  Q1,00 | PCWP |            |
----------------------------------------------------------------------------------------------------------------------------------------------

Outline Data
-------------
  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      FIRST_ROWS(1)
      SHARED(2)
      OUTLINE_LEAF(@"SEL$1")
      INDEX_RS_ASC(@"SEL$1" "T1"@"SEL$1" ("T1"."ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("ID">=1500 AND "ID"<=2000)

Note
-----
   - automatic DOP: Computed Degree of Parallelism is 2

Critically we get four different execution plans from the four different strategies – so clearly the optimizer is perfectly happy to accept the parallel and first_rows() hints simultaneously. Note, particularly, how the first_rows(1) hint when combined with the parallel hint moved us from a parallel full tablescan to a parallel index range scan.

Whether or not it’s sensible to use the hint combination in this way is a matter for careful consideration, of course, but there could be circumstances where the combination really is the best way to get the starting row(s) from a query that otherwise has to return a large amount of data.


Block Names

Mon, 2017-04-03 07:04

There are a number of tiny details that I can never remember when I’m sketching out models to test ideas, and one of those is the PL/SQL block name. Virtually every piece of PL/SQL I write ends up with variables which have one of two prefixes in their names “M_” or “G_” (for memory or global, respectively) but I probably ought to be formal than that, so here’s an example of labelling blocks – specifically, labelling anonymous blocks from SQL*Plus using a trivial and silly bit of code:


rem
rem     Script:         plsql_block_names.sql
rem     Author:         Jonathan Lewis
rem     Dated:          March 2017
rem

create table t1
nologging
as
select * from all_objects
;


begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index t1_i1 on t1(object_id) nologging;

--
--        Anonymous PL/SQL block with label, and a couple of 
--        uses of the label as the qualifier for variables
--

<<my_block>>
declare
        data_object_id  t1.data_object_id%type;
        object_id       t1.object_id%type;
begin
        select  data_object_id
        into    my_block.data_object_id
        from    t1
        where
                data_object_id is not null
        and     rownum = 1
        ;
        select  object_id
        into    my_block.object_id
        from    t1
        where
                data_object_id = my_block.data_object_id
        and     rownum = 1
        ;

        dbms_output.put_line('Object: ' || object_id || ' - ' || my_block.object_id);
        dbms_output.put_line('Data Object: ' || data_object_id || ' - ' || my_block.data_object_id);

end;
/

Object: 16 - 16
Data Object: 6 - 6

The important point, of course, is that with a qualified variable name you eliminate the risk of a problem that appears occasionally on the public fora where someone has lost track of duplicated variable names, or used (as I have above) a variable name that matches a column name, and doesn’t notice that a little further down the code Oracle has captured the “wrong” interpretation of the name.

You’ll note that the block name has to be surrounded by doubled angle brackets – and now that I’ve written about it I might actually remember that for a couple of weeks ;)

 

Footnote:

This isn’t the only use for labels, by the way, it’s just one that probably ought to be used more frequently in production code.

The other thing I can never remember is how to escape quote marks – so I invariably end up using the old “double the quotes” method when I want to quote quotes.


Adding Indexes

Thu, 2017-03-30 02:31

The following question came up on the OTN database forum recently:


We have below table with columns,

Table T1
Columns:
-----------
Col_1, Col_2, Col_3, Col_4, Col_5, Col_6, Col_7, Col_8, Col_9, Col_10, Col_11, Col_12, Col_13, Col_14, Col_15

on which below indexes are created.

XXTEST_Col_1    Col_1
XXTEST_Col_2    Col_2
XXTEST_Col_3    Col_3
XXTEST_Col_5    Col_5
XXTEST_Col_6    Col_6
XXTEST_Col_7    Col_7
XXTEST_Col_8    Col_8
XXTEST_Col_8    (Col_4, Col_10, Col_11)

I have requirement to update table T1 as below and it’s taking really long. [JPL: I’m assuming that the naming of the second xxtest_col_8 index is a trivial error introduced while the OP was concealing names.)

UPDATE T1
SET Col_3 = x_value,
    Col_6 = y_value
where Col_4='N'
and Col_3 IS NULL;

To improve performance, created below index and it improved the performance of the update.

 
New Index:  XXTEST_Col_4    (Col_4, Col_3)

My question is it good idea to create the above index with already existing indexes? Is there any other better way to handle this?

There is insufficient information to answer these questions properly but if I had been given this problem there are several things I’d want to check before I came up with any answers. (In the absence of complete information the default answers are “probably not”, “probably”).

What Happened

The first question, of course, is “why did the update take a really long time?”, and the second is “why did the performance improve after the index creation?”.

There are (generally) two execution paths for the update – full tablescan or index access through the index starting with col_4. Whichever path Oracle took would the update have been faster (and fast enough) if the alternative path had been taken ?  (I have to include the “generally” caveat since it’s just possible that with silly settings for a couple of parameters and statistics the optimizer could have been doing an amazingly stupid “index full scan” on any index that contained a column declared as not null.)

Then consider the fact that the update is changing two columns that are indexed – and one of those columns (col_3) is changing from null to a value, so it’s going to increase the size of the index. Is the slow performance due to the time spent maintaining the indexes, and is the improved performance due to the fact that there are no rows left to update because the first run of the test updated all of them ?

It does seem reasonably likely that Oracle would use the new index for the update and that the better precision would be the cause of the improved performance; but a side effect of the index creation is that Oracle would now have some statistics about the combination of (col_4, col_3) that wasn’t previously available. Perhaps gathering stats on the column group (col_4, col_3) would have been sufficient to result in a change in execution path that would have given the required performance without introducing the extra  maintenance costs.

What’s the requirement

Before deciding whether or not an index should be used to improve the performance of a statement it’s important to have some idea of the cost/benefit/risk analysis. This often boils down to “how much data should change, how often does it happen, what could possibly go wrong”.

As far as “what could possibly go wrong” – by adding the index (col_4, col_3) we introduce 4 groups of risk:

  • any SQL currently using the col_3 index might change plans
  • any SQL currently using the (col_4, col_10, col_11) index might change plans
  • any query that has a predicate “col_3 is null” might change plans because that predicate could be resolved in the index if col_4 had been declared not null,
  • any query with equality predicates on both col_3 and col_4 might change plans because the new index supplies some statistics about the combination

We don’t have enough information to evaluate the degree of risk – and we haven’t been given any figures about the volume of data that changes or the frequency with which the update runs, so we can’t really say much beyond raising the warning flags.

What does the data look like.

Despite my closing comment above, we have two little clues about the data. First is that one of the predicates is col_4 = ‘N’ – to me this always raises the suspicion that we’re looking at a column which is a Yes/No, or Yes/No/NULL column; secondly the OP has marked the following response as “Correct Answer”:

Depends on your data.
Check count of both queries

select count(*) from T1 where Col_4='N' and Col_3 IS NULL;
select count(*) from T1 where Col_4='N';

If there is a significant difference in count, and delete is executed frequently, it might be good to have

If we assume that this means (a) there is a significant difference in the counts and (b) the delete IS executed frequently then we we have some (though not really enough) useful information about what the data looks like and how it is handled. Perhaps over a short period of time a small volume of new data appears where col_3 is null and there is a scheduled mechanism that looks for that data and modifies it.

If that’s the case then an index that gives use high precision when we’re searching for a (very) small amount of data from a very large table would be very helpful, while an index that starts with the first of our search columns (col_4 is the leading column of an existing index) might actually be a liability if we do a large range scan, visit the table for lots of rows, and then discard them because they fail our second predicate.

However, if this IS the case then it’s not a good idea to create a large index that is constantly subject to change – especially if the update from NULL to something else results in updates that scatter the col_3 values randomly around the index. (The NULL rows will be well packed in the index, and may be in a small collection of recent table blocks; but the update will delete one index entry and we have no idea about the impact that inserting the replacement could have.) So, if we’re also allowed to modify the code, let’s create an index that is as small as possible, won’t be used accidentally by any other code, and minimises the workload when updated. Here’s the code to create the index and one option for gathering stats on the resulting hidden column, and the modified update code:


create index xxtext_fbi1 on t1(
        case when col_4='N' and col_3 is null then 0 end
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns (case when col_4=''N'' and col_3 is null then 0 end) size 1'
        );
end;
/

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        case when col_4='N' and col_3 is null then 0 end = 0
;

This index will contain entries for just those rows that we need to update, and when we update them the index entries will simply disappear. The index is the smallest that we could have, with the minimum workload imposed, and no risk of being hi-jacked by any other process.

As a variation on a theme, and depending on version, we could create a virtual column with this definition (making it invisible in 12c) and creating an index on the virtual column. For preference I think I’d use the FBI in 11g, and the invisible virtual column in 12c.


alter table t1
add (
        wanted_flag invisible
        generated always as (
                case when col_4='N' and col_3 is null then 0 end
        ) virtual
)
;

begin
        dbms_stats.gather_table_stats(
                user,
                't1',
                method_opt=>'for columns wanted_flag size 1'
        );
end;
/

create index xxtest_virt1 on t1(wanted_flag);

update t1 set
        col_3 = {x_value},
        col_6 = {y_value}
where
        wanted_flag = 0
;

The thread raised a couple of other worthwhile points – one post pointed out that if they were going to create a new index they would want to make it as useful for as many other processes as possible: this, of course, is a different view on the cost/benefit/risk assessment, especially (I would say) if there is no option to create a very small, low-cost, index that is guaranteeably safe. The point also came up that there are two or three variations on how one could produce an expression that produced index entries only for the rows where col_4 = ‘N’ and col_3 is null; and a point about further possibilities if there were a few different values for col_4 that resulted in similar queries.


Index out of range

Mon, 2017-03-27 02:42

I’ve waxed lyrical in the past about creating suitable column group statistics whenever you drop an index because even when the optimizer doesn’t use an index in its execution path it might have used the number of distinct keys of the index (user_indexes.distinct_keys) in its estimates of cardinality.

I’ve also highlighted various warnings (here (with several follow-on URLs) and here) about when the optimizer declines to use column group statistics. One of those cases is when a predicate on one of the columns goes “out of  range” – i.e. is below the column low_value or above the column high_value. Last night it suddenly crossed my mind that if we drop an index and replace it with a column group we might see an example of inconsistent behaviour: what happens when the index exists but the predicate is out of range – would you find that dropping the index and replacing it with a column group would give you different cardinality estimates for out of range predicates ?

Here’s the demonstration of what happened when I created a simple test on 12.1.0.2:


rem
rem     Script:         index_v_colgrp.sql
rem     Author:         Jonathan Lewis
rem
rem     Last tested
rem             12.1.0.2
rem

create table t1
nologging
as
with generator as (
        select
                rownum id
        from dual
        connect by
                level <= 1e4
)
select
        rownum                          id,
        mod(rownum-1,100)               n1,
        mod(rownum-1,100)               n2,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e6 -- > comment to avoid WordPress format problem
;

begin dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for all columns size 1'
        );
end;
/

set autotrace traceonly explain

I’ve created a table with 1M rows, where n1 and n2 are perfectly correlated – there are 100 distinct pairs of values (ranging from (0,0) to (99,99)). Now with autotrace enabled I’ll execute three queries – two with an index on the table of which one will be the baseline plan for predicates that are “in-range” and the other will take the predicates out of range, and the third after I’ve dropped the index and substituted a matching column group to see what I get for the “out of range” plan. The plans may produce different paths as the index disappears, of course, but what we’re only interested in the cardinality estimates in this experiment.

Here’s the code to run the three queries:


select  padding
from    t1
where
        n1 = 50
and     n2 = 50
;

select  padding
from    t1
where
        n1 = 110
and     n2 = 110
;

drop index t1_i1;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'T1',
                method_opt       => 'for columns (n1, n2) size 1'
        );
end;
/

select  padding
from    t1
where
        n1 = 110
and     n2 = 110
;

And the three execution plans:


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 10000 |  1044K|  2142   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   | 10000 |  1044K|  2142   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=50 AND "N2"=50)


---------------------------------------------------------------------------------------------
| Id  | Operation                           | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |       |    79 |  8453 |    83   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    79 |  8453 |    83   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | T1_I1 |    79 |       |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("N1"=110 AND "N2"=110)


--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |    79 |  8453 |  2142   (4)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1   |    79 |  8453 |  2142   (4)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("N1"=110 AND "N2"=110)

In summary:

  • With the index in place and the predicates in range the optimizer used user_indexes.distinct_keys to calculate cardinality.
  • With the index in place and the predicates (or just one of them, in fact) out of range the optimizer used the individual column selectivities with linear decay.
  • With a column group instead of an index the optimizer behaved exactly as it used to with the index in place.

So my concern that substituting column groups for indexes was unfounded – the optimizer was being silly (legal disclaimer: that’s just my opinion) with indexes, and the silly (ditto) behaviour with column groups hasn’t changed anything.

I’ll have to go back a couple of versions of Oracle to repeat these tests – maybe this behaviour with user_indexes.distinct_keys in place is relatively recent, but it’s another reason why execution plans may change suddenly and badly as time passes when “nothing changed”.

 


min/max Upgrade

Thu, 2017-03-23 03:53

A question came up on the OTN database forum a little while ago about a very simple query that was taking different execution paths on two databases with the same table and index definitions and similar data. In one database the plan used the “index full scan (min/max)” operation while the other database used a brute force “index fast full scan” operation.

In most circumstances the starting point to address a question like this is to check whether some configuration details, or some statistics, or the values used in the query are sufficiently different to result in a significant change in costs; and the first simple procedure you can follow is to hint each database to use the plan from the opposite database to see if this produces any clues about the difference – it’s a good idea when doing this test to use one of the more verbose formatting options for the call to dbms_xplan.

In this case, though, the OP discovered a note on MoS reporting exactly the problem he was seeing:

Doc ID 2144428.1: Optimizer Picking Wrong ‘INDEX FAST FULL SCAN’ Plan vs Correct ‘INDEX FULL SCAN (MIN/MAX)’

which referred to

Bug 22662807: OPTIMIZER PICKING INDEX FFS CAN INSTEAD OF MIN/MAX

Conveniently the document suggested a few workarounds:

  • alter session set optimizer_features_enable = ‘11.2.0.3’;
  • alter session set “_fix_control” = ‘13430622:off’;
  • delete object stats [Ed: so that dynamic sampling takes place … maybe a /*+ dynamic_sampling(alias level) */ hint would suffice].

Of the three options my preference would (at least in the short term) be the _fix_control one. Specifically, from the v$system_fix_control view, we can see that it addresses the problem very precisely with the description: “index min/max cardinality estimate fix for filter predicates”.

The example in the bug note showed a very simple statement (even more simple than the OP’s query which was only a single table query anyway), so I thought I’d build a model and run a few tests to see what was going on. Luckily, before I’d started work, one of the other members of the Oak Table network sent an email to the list asking if anyone knew how the optimizer was costing an example he’d constructed – and I’ve finally got around to looking at his example, and here’s the model and answer(s), starting with the data set:


rem
rem     Script:         test_min_max.sql
rem     Dated:          March 2017
rem
rem     Last tested
rem             12.1.0.2
rem             11.2.0.4
rem             11.2.0.3
rem

create table min_max_test nologging
as
with ids as (
        select /*+ Materialize */ rownum  id from dual connect by rownum <= 50000 -- > comment to protect formatting
),
line_nrs as (
        select /*+ Materialize */  rownum line_nr from dual connect by rownum <= 20 -- > comment to protect formatting
)
select
        id, line_nr ,rpad(' ', 800, '*') data
from
        line_nrs, ids
order by
        line_nr, id
;

begin
        dbms_stats.gather_table_stats(
                ownname          => user,
                tabname          =>'min_max_test',
                method_opt       => 'for all columns size 1'
        );
end;
/

create index mmt_ln_id on min_max_test (line_nr, id) nologging;
create index mmt_id    on min_max_test (id)          nologging;

The table has two critical columns: each id has 20 line_nr values associated with it, but the way the data was generated means that the line numbers for a given id are scattered across 20 separate table blocks.

There are two indexes – one on the id which will allow us to find all the rows for a given id as efficiently as possible, and one (slightly odd-looking in this context) that would allow us to find a specific row for a given line_nr and id very efficiently. Two things about these indexes – in a live application they should both be compressed on the first (only, in the case of index mmt_id) column, and secondly the necessity of the mmt_id index is questionable and it might be an index you could drop if you reversed the order of the columns in mmt_ln_id. The thing about these indexes, though, is that they allow us to demonstrate a problem. So let’s query the data – twice, hinting each index in turn:


set serveroutput off

select
        /*+ index(t(id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

select
        /*+ index(t(line_nr, id)) */
        min(line_nr)
from
        min_max_test t
where
        id = :b1
;

select * from table(dbms_xplan.display_cursor);

It’s fairly safe to make a prediction about the execution plan and cost of the first query – it’s likely to be a range scan that accesses a couple of branch blocks, a leaf block and 20 separate table blocks followed by a “sort aggregate” – with a cost of about 23.

It’s a little harder to make a prediction about the second query. The optimizer could infer that the min(line_nr) has to be close to the left hand section of the index, and could note that the number of rows in the table is the same as the product of the number of distinct values of the two separate columns, and it might note that the id column is evenly distributed (no histogram) across the data, so it might “guess” that it need only range scan all the entries for the first line_nr to find the appropriate id. So perhaps the optimizer will use the index min/max range scan with a cost that is roughly 2 branch blocks plus total leaf blocks / 20 (since there are 20 distinct values for line_nr); maybe it would divide the leaf block estimate by two because “on average” – i.e. for repeated random selections of value for id – it would have to scan half the leaf blocks. There were 2,618 leaf blocks in my index, so the cost should be close to either 133 or 68.

Here are the two plans – range scan first, min/max second:


select  /*+ index(t(id)) */  min(line_nr) from  min_max_test t where id = :b1
-----------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |              |       |       |    23 (100)|          |
|   1 |  SORT AGGREGATE                      |              |     1 |     8 |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| MIN_MAX_TEST |    20 |   160 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | MMT_ID       |    20 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"=:B1)


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |    22 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |    22   (0)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |    22   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Spot on with the estimate for the simple range scan – but what did we do wrong with the estimate for the min/max scan ? You might notice in the first example the “table access by rowid batched” and realise that this is running on 12c. Here’s the plan if I get if I set the optimizer_features_enable back to 11.2.0.3 before running the second query again:


select  /*+ index(t(line_nr, id)) */  min(line_nr) from  min_max_test t where  id = :b1
-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |       |       |   136 (100)|          |
|   1 |  SORT AGGREGATE             |           |     1 |     8 |            |          |
|   2 |   FIRST ROW                 |           |     1 |     8 |   136   (1)| 00:00:01 |
|*  3 |    INDEX FULL SCAN (MIN/MAX)| MMT_LN_ID |     1 |     8 |   136   (1)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("ID"=:B1)

Using the 11.2.0.3 optimizer model the plan has a cost that’s very close to our prediction – we’ll see why there’s a slight difference in a moment. If we set the optimizer_features_enable to 11.2.0.4 the cost drops back to 22. So for our example 11.2.0.3 will use the simple “index range scan” and an upgrade to 11.2.0.4 (or higher) will switch to the “index full scan (min/max)”. If you look at the OTN posting the impact of the change in costing is exactly the other way around – 11.2.0.3 uses the min/max path, 11.2.0.4 uses the simple index range scan.

The techy bit

You really don’t need to know this – experimenting with the optimizer_features_enable (or _fix_control) will give you plans that show you all the numbers you need to see to check whether or not you’ve run into this particular problem – but if you’re interested here’s a little bit from the two 10053 trace files. We need only look at a few critical lines. From the 11.2.0.3 costing for the min/max scan:


Index Stats::
  Index: MMT_ID  Col#: 1
  LVLS: 2  #LB: 2202  #DK: 50000  LB/K: 1.00  DB/K: 20.00  CLUF: 1000000.00  NRW: 1000000.00
  Index: MMT_LN_ID  Col#: 2 1
  LVLS: 2  #LB: 2618  #DK: 1000000  LB/K: 1.00  DB/K: 1.00  CLUF: 125000.00  NRW: 1000000.00

SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 135.000000  resc_cpu: 961594
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 135.697679  Resp: 135.697679  Degree: 1

I was running 12.1.0.2 so there were a few extra bits and pieces that I’ve deleted (mostly about SQL Plan Directives and in-memory). Critically we can see that the stats collection has a small error for the ID column – 50,536 distinct values (NDV) instead of exactly 50,000. This seems to have given us a cost for the expected index range of: 2 (blevel) + ceiling(2618 (leaf blocks) * 50536 / 1000000) = 2 + ceil(132.3) = 135, to which we add a bit for the CPU and get to 136. (Q.E.D.)

Then we switch to costing for 11.2.0.4:


SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for MIN_MAX_TEST[T]
  Column (#1): ID(NUMBER)
    AvgLen: 5 NDV: 50536 Nulls: 0 Density: 0.000020 Min: 1.000000 Max: 50000.000000
  Table: MIN_MAX_TEST  Alias: T
    Card: Original: 1000000.000000  Rounded: 20  Computed: 19.787874  Non Adjusted: 19.787874

 ****** Costing Index MMT_LN_ID
  Access Path: index (Min/Max)
    Index: MMT_LN_ID
    resc_io: 21.787874  resc_cpu: 156872
    ix_sel: 1.000000  ix_sel_with_filters: 1.9788e-05
    Cost: 22.324608  Resp: 22.324608  Degree: 1

We still have the small error in the number of distinct values for id, so the estimated number of rows that we need to access from the table for a given id (before “aggregating” to find its minimum line_nr) is 19.787874 (Computed: / Non Adjusted:) rather than exactly 20. Notice, then, that the cost of using the index is 19.787874 + 2 which looks suspiciously like adding the blevel to the number of table blocks to get a cost and forgetting that we might have to kiss a lot of frogs before we find the prince. Basically, in this example at least, it looks like the costing algorithm has NOTHING to do with the mechanics of what actually has to happen at run-time.

Footnote

This is only an initial probe into what’s going on with the min/max scan; there are plenty more patterns of data that would need to be tested before we could have any confidence that we had produced a generic model of how the optimizer does its calculations – the only thing to note so far is that there IS a big change as  you move from 11.2.0.3 to later versions: the case on OTN showed the min/max scan disappearing on the upgrade, the example above shows the min/max disappearing on the downgrade – either change could be bad news for parts of a production system.

There are a couple of related bugs that might also be worth reviewing.

  • Bug 11834402 : CBO CHOOSES A SLOW INDEX FULL SCAN OVER A MUCH FASTER INDEX RANGE SCAN
  • Bug 13430622 : INDEX SCAN IN VERY SLOW FOR ONE PREDICATE AND FAST FOR OTHERS

There is a note, though that this last bug was fixed in 12.1

Footnote 2

When experimenting, one idea to pursue as the models get more complex and you’re using indexes with more than two columns is to test whether the presence of carefully chosen column group statistics might make a difference to the optimizer’s estimates of cardinality (hence cost) of the min/max scan.

 


Pages