DBA Blogs

Permissions to create triggers over several schemas

Tom Kyte - Thu, 2017-03-09 08:06
I have a 'normal' schema S, and a schema H that stores history (it contains all the tables of S, with some audit columns added, but stripped of any keys). Whenever a DML happens on S, it has to be stored within H (If S contains a table T, and we insert ...
Categories: DBA Blogs

Deterministic function

Tom Kyte - Thu, 2017-03-09 08:06
Hi Tom, Recently I created a function with the DETERMINISTIC keyword to improve the performance of my query, but I found that the function is called every time, even when it receives the same input. Here is the script that I used to check this function: ...
Categories: DBA Blogs

Transaction commit when exiting SQL*Plus

Tom Kyte - Thu, 2017-03-09 08:06
In the case of exiting SQL*Plus without an explicit commit, does the running transaction commit or roll back?
Categories: DBA Blogs

List out external databases using DB link

Tom Kyte - Thu, 2017-03-09 08:06
Hello Gurus, I have basic knowledge of Oracle DB administration and profiling. I am unable to do the profiling for the below requirement: "Want to identify the list of other databases which are using my database via DB links created against it." Suppose my company has 10 ...
Categories: DBA Blogs

Exadata questions

Tom Kyte - Thu, 2017-03-09 08:06
Hello Tom, We have acquired an Exadata Server and would like to clarify some issues that I believe are product myths. 1 - After removing indexes, can queries get faster? 2- I have always used indexes on the columns identified as FKs. Is th...
Categories: DBA Blogs

Array Size for pl/sql delete and db link

Tom Kyte - Wed, 2017-03-08 13:46
Dear Team, If I execute a delete statement within a procedure, what array size will it work with? create or replace procedure del_test as begin delete from big_table; commit; end; If I execute this procedure from a job is ...
Categories: DBA Blogs

Question on recreating a primary key on a partitioned table

Tom Kyte - Wed, 2017-03-08 13:46
What happens to the unique index that gets created when we recreate a composite primary key on a partitioned table? Please note that we are using the "enable novalidate" clause to recreate it so that existing rows are not checked for the validi...
Categories: DBA Blogs

12cR1 RAC Posts -- 8a : Setting up SingleInstance DG Standby for RAC

Hemant K Chitale - Wed, 2017-03-08 09:08
This is the first of a small series of subposts on setting up SingleInstance DataGuard Standby for my 12cR1 RAC environment.

Primary Database : 12cR1 MultiTenant RAC : 2 nodes, database on ASM
Standby Database : 12cR1 MultiTenant SingleInstance : 1 node, database on FileSystem

The "difficulties" are :
a.  The Primary Database is MultiTenant with 1 PDB besides the SEED
b.  The Primary Database is on ASM but the Standby is to be built on FileSystem
c.  The Primary is a 2-node Cluster while the Standby is a SingleInstance installation

This post is on the initial setup.

RAC Nodes
collabn1.racattack   192.168.78.51
collabn2.racattack   192.168.78.52

SingleInstance Node 
oem132.racattack     192.168.78.100
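
Name resolution for these hosts would typically be via DNS or /etc/hosts; a hypothetical /etc/hosts fragment (only the addresses listed above, short aliases are illustrative):
192.168.78.51    collabn1.racattack   collabn1
192.168.78.52    collabn2.racattack   collabn2
192.168.78.100   oem132.racattack     oem132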


Database Changes :
SQL> alter database force logging;

Database altered.

SQL> alter database flashback on;

Database altered.

SQL>
SQL> alter database add standby logfile size 50M;

Database altered.

SQL> alter database add standby logfile size 50M;

Database altered.

SQL> alter database add standby logfile size 50M;

Database altered.

SQL> alter database add standby logfile size 50M;

Database altered.

SQL> alter database add standby logfile size 50M;

Database altered.

SQL>
SQL> select group#, status, type, member from v$logfile order by 1;
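
A useful check at this point (output not shown here) is to confirm the standby log groups and their sizes on the primary; the usual Data Guard guideline is one more standby group per thread than online groups, each sized the same as the online redo logs:

SQL> select group#, thread#, bytes/1024/1024 as mb, status from v$standby_log;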


Listener on Standby Server :
LISTENER =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = oem132.racattack)(PORT = 1521))
      (ADDRESS = (PROTOCOL = IPC)(KEY = EXTPROC1521))
    )
  )

SID_LIST_LISTENER=
(SID_LIST =
  (SID_DESC =
    (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
    (SID_NAME = STBY)
   )
  (SID_DESC =
    (GLOBAL_NAME = STBY_DGMGRL)
    (ORACLE_HOME = /u01/app/oracle/product/12.1.0/dbhome_1)
    (SID_NAME = STBY)
   )
)


tnsnames.ora on first RAC Node :
STBY =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oem132)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STBY)
    )
  )

STBY_DGB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = oem132)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = STBY_DGMGRL)
    )
  )


STBY Instance Parameter File :
compatible=12.1.0.2.0
db_name=RAC
db_unique_name=STBY
enable_pluggable_database=TRUE
sga_target=800M
db_file_name_convert='+DATA/RAC','/u01/app/oracle/oradata/STBY','+FRA/RAC','/u01/app/oracle/fast_recovery_area/STBY'
log_file_name_convert='+DATA/RAC','/u01/app/oracle/oradata/STBY','+FRA/RAC','/u01/app/oracle/fast_recovery_area/STBY'
db_create_file_dest='/u01/app/oracle/oradata'
db_recovery_file_dest='/u01/app/oracle/fast_recovery_area'
db_recovery_file_dest_size=16G
remote_login_passwordfile=EXCLUSIVE


Copying the Password File :
ASMCMD> pwcopy +DATA/RAC/PASSWORD/pwdrac.277.931824933 /var/tmp/RACPasswordFile
copying +DATA/RAC/PASSWORD/pwdrac.277.931824933 -> /var/tmp/RACPasswordFile
ASMCMD> exit
scp /var/tmp/RACPasswordFile oracle@oem132:/u01/app/oracle/product/12.1.0/dbhome_1/dbs/orapwSTBY


to be continued in post 8b on the filesystem directories for the STBY instance and the execution of the DUPLICATE DATABASE command.
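
For orientation, a hypothetical sketch of what that next step usually looks like (these are not the actual 8b commands; the connect strings and pfile path are placeholders):

-- On the standby host, start the STBY instance in NOMOUNT with the pfile above:
-- SQL> startup nomount pfile='$ORACLE_HOME/dbs/initSTBY.ora'
-- Then, from RMAN, connect to the primary (target) and the STBY auxiliary and run:
-- RMAN> duplicate target database for standby from active database nofilenamecheck;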

Categories: DBA Blogs

shuttle that lists files in a specified directory

Tom Kyte - Tue, 2017-03-07 19:26
I need to make a shuttle that lists files in a specified directory. The user could choose which ones they wanted to process. When they press GO there would be a list of files to process. Is there an example of this that I could look at? Thanks. Apex...
Categories: DBA Blogs

How to insert multiple insert stmt to a table

Tom Kyte - Tue, 2017-03-07 19:26
Hi Team, I have a script with 100K INSERT INTO statements that I need to execute. Please suggest a faster way to insert. Insert insert insert insert ......so on, 100k records. Thanks, Dnesh.
Categories: DBA Blogs

Martian wants to import XML table from Access

Tom Kyte - Tue, 2017-03-07 19:26
I am so very lost. Thinking I might finally try to figure out how to use MySQL as a replacement for Access2007; I think I downloaded everything I needed, including VB support for C++. I'm pretty sure, but not certain. But for the sake of thi...
Categories: DBA Blogs

Daily Insert, Delete on table

Tom Kyte - Tue, 2017-03-07 19:26
I have a table tb_card_data CARD_NO TYPE EMAIL MODULE CREATED_ON SEQ_NO ------- ---- ----- ------ ---------- --------- 12345 C x@a.com MOD_1 06/03/2017 20170306000001 (create_on + sequence number) .. so on around 500k records for MOD...
Categories: DBA Blogs

Error in Import of Schema containing table(s) having DEFAULT Sequence.nextval values.

Tom Kyte - Tue, 2017-03-07 19:26
Hey TOM, Consider the following test scenario : 1. Create a sequence Connected to: Oracle Database 12c Standard Edition Release 12.1.0.2.0 - 64bit Production SQL> create sequence test_seq start with 100 increment by 10; 2. Use the...
Categories: DBA Blogs

Query runs faster with stats deleted, slower with up to date stats

Tom Kyte - Tue, 2017-03-07 19:26
Hi team AskTOM, Sorry, wasn't sure how to use livesql.oracle.com for my case. I have a situation where (no stats, as in dynamic sampling (11.2.0.4)) is producing a better plan (and faster elapsed time) than with gather_table_stats. Please s...
Categories: DBA Blogs

Ref Cursors - types.cursor_type vs sys_refcursor

Tom Kyte - Tue, 2017-03-07 19:26
There are a few stored procedures written which send OUT REF CURSORS to the calling event. They are using types.cursor_type as the datatype. Is there any difference between types.cursor_type and SYS_REFCURSOR?
Categories: DBA Blogs

JSON in PL/SQL Oracle 12c

Tom Kyte - Tue, 2017-03-07 19:26
Hello, I have a procedure with the below structure: procedure p_json (i_json clob) is l_firstname varchar2(10); begin --some code here end; The input parameter i_json will receive a JSON object. Below is the structure of the JSON object ...
Categories: DBA Blogs

The Side Effects of Drop ‘Unused’ Index

Pythian Group - Tue, 2017-03-07 10:54

In this blog post I’ll demonstrate a scenario where dropping a potentially ‘unused’ index can have a negative influence on the optimizer’s cardinality estimation. Having columns with logical correlation and skewed data can cause difficulties for the optimizer when calculating cardinality. This post will also address what we can do to help the optimizer with better cardinality estimates.

The inspiration for this post came from a recent index usage review. One of the requirements was to determine which unused indexes qualify to be dropped. I decided to run some tests on extended statistics and the effect of dropping a potentially 'unused' index, so you can see what can happen when an index that has never been used is dropped. It's important to remember that this does not apply in all cases. Conversely, even if an index is used, that doesn't mean it's needed.

This mostly concerns columns with skewed data that might have a logical relationship.
Hopefully, it can help you answer some of the following questions:

  1. Is the optimizer using the indexes behind the scenes?
  2. While there are methods to determine if an index has been used in an execution plan, can an index be dropped on this basis alone?
  3. If we drop a composite index (built on correlated columns), can we do anything to avoid performance degradation?

Before we start with the use case, let’s briefly review some concepts.

The basic formula for selectivity is 1/NDV. The cardinality (CDN) is calculated as selectivity * total number of rows.

The selectivity of a join is defined as the selectivity of the most selective join column adjusted by the proportion of not null values in each join column.

Join Selectivity:
Sel = 1/max[NDV(t1.c1),NDV(t2.c2)] *
           ( (Card t1 - # t1.c1 NULLs) / Card t1) *
           ( (Card t2 - # t2.c2 NULLs) / Card t2)

Join Cardinality:
Card(Pj) = Card(T1) * Card(T2) * Sel(Pj)

In Oracle's Doc ID 68992.1 you can find a more detailed explanation of the different selectivity calculations based on different predicates. For simplicity, I will use an equality predicate.

This blog post is divided into three sections.

  1. A use case demonstrating how dropping an "unused" index can mess up the optimizer's cardinality calculation.
  2. How to help the optimizer achieve better cardinality estimation with extended statistics.
  3. More explanation on column correlation (CorStregth).

Preparation

For the first and second part I’ll be using the following configuration.

drop table t1 purge;
drop table t2 purge;
drop table t3 purge;

create table t1
as
select rownum pk,
    mod(level,  20) n1,
    mod(level, 200) n2,
    mod(level,1000) n3,
    lpad('x',42,'x') c1
from dual
connect by level <= 10000
/

create table t2
as
select rownum pk,
    mod(level,  20) n1,
    mod(level, 200) n2,
    mod(level,1000) n3,
    lpad('x',42,'x') c1
from dual
connect by level <= 10 
/ 

CREATE TABLE t3
AS
SELECT rownum pk, lpad('x',rownum,'x') c1
FROM dual
CONNECT BY level <= 2000
/

exec dbms_stats.gather_table_stats(user,'T1',method_opt=>'for all columns size 1');
exec dbms_stats.gather_table_stats(user,'T2',method_opt=>'for all columns size 1');
exec dbms_stats.gather_table_stats(user,'T3',method_opt=>'for all columns size 1');


alter table t1 add constraint t1pk primary key(pk);
alter table t2 add constraint t2pk primary key(pk);
alter table t3 add constraint t3pk primary key(pk);

create index t1ix on t1(n1,n2);
create index t2ix on t2(n1,n2);

This creates three tables (T1, T2 and T3), with a composite index on the (N1,N2) columns of both T1 and T2.
The statistics are as follows:

--Column statistics
TABLE_NAME	     COLUMN_NAME	  NUM_DISTINCT	  COL_SEL
-------------------- -------------------- ------------ ----------
T1		     C1 			     1		1
T1		     N1 			    20	      .05
T1		     N2 			   200	     .005
T1		     N3 			  1000	     .001
T2		     C1 			     1		1
T2		     N1 			    10	       .1
T2		     N2 			    10	       .1
T2		     N3 			    10	       .1

--Index statistics
TABLE_NAME	     INDEX_NAME 	    NUM_ROWS DISTINCT_KEYS    IND_SEL
-------------------- -------------------- ---------- ------------- ----------
T1		     T1IX		       10000	       200	 .005
T2		     T2IX			  10		10	   .1

Table T1 has 10000 rows and T2 only 10 rows. I've created composite indexes on both T1 and T2 since I am using both (N1,N2) columns in the selection. As you can see (from the data dictionary) we have 200 distinct keys for the T1IX index and 10 for the T2IX index.
For example, based on the formula we can derive the selectivity as 1/NDV; for the T1IX index that is 1/200 = 0.005.
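
The index-level figures above come straight from the data dictionary; a quick way to confirm them (and the 1/DISTINCT_KEYS selectivity) for the two test indexes:

select table_name, index_name, num_rows, distinct_keys,
       round(1/distinct_keys, 4) as ind_sel
from   user_indexes
where  index_name in ('T1IX','T2IX')
order  by table_name;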

Drop of an “unused” index can mess up optimizer’s cardinality estimation

The query I’m going to use in this case is:

select t1.c1,t2.c1
from t1,t2
where t1.n1 = t2.n1
and t1.n2 = t2.n2
/

Let's first check the cardinality (CDN) before we make any changes. The column we're interested in is Rows; that's where the cardinality estimate is shown.
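
The execution plans shown below can be reproduced with SQL*Plus AUTOTRACE (I use "set autotrace trace exp" later in the post); for example:

set autotrace traceonly explain
select t1.c1, t2.c1
from   t1, t2
where  t1.n1 = t2.n1
and    t1.n2 = t2.n2;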

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |   500 | 49500 |    29   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	  |   500 | 49500 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T2   |    10 |   490 |	3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   488K|    26   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

Based on the formula provided, you may have expected to see a cardinality (CDN) of 25. Since we have no NULLs, we can do the calculation as 1/max(NDV(t1.n1), NDV(t2.n1)) * 1/max(NDV(t1.n2), NDV(t2.n2)) = 1/20 * 1/200 = 0.05 * 0.005 = 0.00025, giving a join selectivity of 0.00025.
If we then do the join cardinality calculation: 10 (T2 rows) * 10000 (T1 rows) * 0.00025 = 100000 * 0.00025 = 25.

But instead of 25, the optimizer calculated a cardinality (CDN) of 500. To better understand the calculation, let's review the 10053 event trace file.
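
In case you want to follow along, one standard way to generate a 10053 optimizer trace (the exact commands used are not shown in this post) is:

alter session set tracefile_identifier = 'cdn_test';
alter session set events '10053 trace name context forever, level 1';
explain plan for
select t1.c1, t2.c1 from t1, t2 where t1.n1 = t2.n1 and t1.n2 = t2.n2;
alter session set events '10053 trace name context off';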

  ColGroup (#1, Index) T1IX
    Col#: 1 2    CorStregth: 20.00
  ColGroup (#1, Index) T2IX
    Col#: 1 2    CorStregth: 10.00
Join ColGroups for T1[T1] and T2[T2] : (#1, #1)

What's interesting here is the Join ColGroups analysis. Since we have a full match on the join over the indexed columns, the optimizer considers Join ColGroups between T1 and T2 and will use the ColGroup statistics to calculate the join cardinality (CDN).

Col#: column position (self-explanatory)
CorStregth: correlation strength. I'll come back to correlation strength (CorStregth) later.

Within the trace file, in the Nested Loops (NL) section (remember, join cardinality is only listed in the NL section), you should see something like:

Join selectivity using 1 ColGroups: 0.005000 (sel1 = 0.000000, sel2 = 0.000000)
Join Card:  500.000000 = outer (10.000000) * inner (10000.000000) * sel (0.005000)
Join Card - Rounded: 500 Computed: 500.000000

Based on the join selectivity formula, the optimizer chooses the T1IX index (ColGroup) statistics for the selectivity calculation, hence 1/NDV (T1IX index) = 1/200 = 0.005.
Afterwards, the join cardinality is calculated as num_rows (T2) * num_rows (T1) * selectivity (0.005), or 10 * 10000 * 0.005 = 100000 * 0.005 = 500.

If you're not aware of such cases, you may decide to drop an index based solely on the execution plan information. Even worse, developers can easily decide to drop an index without knowing the possible side effects.
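
For context, the usual "is it used?" check looks something like the following; as this post shows, a USED = 'NO' here is not, by itself, a safe reason to drop the index:

alter index t1ix monitoring usage;
-- ... run the representative workload ...
select index_name, table_name, monitoring, used
from   v$object_usage
where  index_name = 'T1IX';
alter index t1ix nomonitoring usage;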

Now, let’s observe the cardinality estimation after T1IX index is dropped.

SQL> drop index t1ix;

Index dropped.

...

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |    25 |  2475 |    29   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	  |    25 |  2475 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T2   |    10 |   490 |	3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   488K|    26   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

After we dropped T1IX, the cardinality estimate dropped from 500 to 25! This can have a big impact on performance. If we add more tables to the join, the optimizer may re-order the tables, or switch from a Hash Join to Nested Loops or a Merge Join.

So, why did the cardinality (CDN) change after the "unused" index was dropped?

Even if the index was not used, it doesn’t mean that its statistics were also not used. That’s the case here. The optimizer used the NDV of the column group created by the T1IX index to calculate the join selectivity.

Let's again check the 10053 event trace file, this time from the query executed without the T1IX index.

  
ColGroup (#1, Index) T2IX
    Col#: 1 2    CorStregth: 10.00
Join ColGroups for T1[T1] and T2[T2] : Using cdn sanity check

...

ColGroup cardinality sanity check: ndv for  T1[T1] = 4000.000000  T2[T2] = 10.000000
Join selectivity using 1 ColGroups: 2.5000e-04 (sel1 = 0.000000, sel2 = 0.000000)
Join Card:  25.000000 = outer (10.000000) * inner (10000.000000) * sel (2.5000e-04)
Join Card - Rounded: 25 Computed: 25.000000

We don't have a full match for Join ColGroups (we're missing the statistics from the T1IX index). The optimizer decided to perform a cardinality (CDN) sanity check, after which it ended up with a number of distinct values (NDV) of 4000.

How did the optimizer calculate 4000?

Since we no longer have the column group, the optimizer did a basic calculation of N1 (sel) * N2 (sel) = 0.05 * 0.005 = 0.00025. Having the (N1,N2) selectivity, we can extract the NDV as 1/0.00025 = 4000.
The representation in the trace file is slightly different: 2.5000e-04 = 2.5 * 10^-4 = 0.00025.

With an incorrect cardinality estimate, the optimizer can make a wrong decision about join methods, as we can see in the following example when we join the T3 table.

--query
select t1.c1,t2.c1,t3.c1
from t1,t2,t3
where t1.n1 = t2.n1
and t1.n2 = t2.n2
and t1.pk = t3.pk
/

Execution Plan
----------------------------------------------------------
Plan hash value: 1184213596

----------------------------------------------------------------------------
| Id  | Operation	    | Name | Rows  | Bytes | Cost (%CPU)| Time	   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |	   |   500 |   541K|   120   (0)| 00:00:01 |
|*  1 |  HASH JOIN	    |	   |   500 |   541K|   120   (0)| 00:00:01 |
|*  2 |   HASH JOIN	    |	   |   500 | 51500 |	30   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL| T2   |	10 |   490 |	 3   (0)| 00:00:01 |
|   4 |    TABLE ACCESS FULL| T1   | 10000 |   527K|	27   (0)| 00:00:01 |
|   5 |   TABLE ACCESS FULL | T3   |  2000 |  1964K|	90   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."PK"="T3"."PK")
   2 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

SQL> drop index t1ix; 

Index dropped.

Execution Plan
----------------------------------------------------------

Plan hash value: 142045272

-------------------------------------------------------------------------------------
| Id  | Operation		     | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT	     |	    |	 25 | 27725 |	 35   (0)| 00:00:01 |
|   1 |  NESTED LOOPS		     |	    |	 25 | 27725 |	 35   (0)| 00:00:01 |
|   2 |   NESTED LOOPS		     |	    |	 25 | 27725 |	 35   (0)| 00:00:01 |
|*  3 |    HASH JOIN		     |	    |	 25 |  2575 |	 30   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL	     | T2   |	 10 |	490 |	  3   (0)| 00:00:01 |
|   5 |     TABLE ACCESS FULL	     | T1   | 10000 |	527K|	 27   (0)| 00:00:01 |
|*  6 |    INDEX UNIQUE SCAN	     | T3PK |	  1 |	    |	  0   (0)| 00:00:01 |
|   7 |   TABLE ACCESS BY INDEX ROWID| T3   |	  1 |  1006 |	  1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")
   6 - access("T1"."PK"="T3"."PK")

Here is another example, with a different data set, where the row source used for building the hash table changes.

--query
SELECT t1.c1, t2.c1
FROM t1,t2,t3
WHERE t1.n1 = t2.n1
AND t1.n2 = t2.n2
AND t1.n1 = t3.pk
/

Execution Plan
----------------------------------------------------------
Plan hash value: 2855555999

-------------------------------------------------------------------------------
| Id  | Operation	       | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |      |  5000K|  1525M|  2766	(1)| 00:00:01 |
|*  1 |  HASH JOIN	       |      |  5000K|  1525M|  2766	(1)| 00:00:01 |
|   2 |   TABLE ACCESS FULL    | T2   |  1000 |   251K|    14	(0)| 00:00:01 |
|*  3 |   HASH JOIN	       |      |  1000K|    59M|  2739	(1)| 00:00:01 |
|   4 |    INDEX FAST FULL SCAN| T3PK |  2000 |  8000 |     3	(0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL   | T1   |  1000K|    55M|  2734	(1)| 00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")
   3 - access("T1"."N1"="T3"."PK")

SQL> drop index t1ix; 

Index dropped.

...

Execution Plan
----------------------------------------------------------
Plan hash value: 1993272941

------------------------------------------------------------------------------
| Id  | Operation	      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |      |	 250K|	  76M|	2754   (1)| 00:00:01 |
|*  1 |  HASH JOIN	      |      |	 250K|	  76M|	2754   (1)| 00:00:01 |
|   2 |   INDEX FAST FULL SCAN| T3PK |	2000 |	8000 |	   3   (0)| 00:00:01 |
|*  3 |   HASH JOIN	      |      |	 250K|	  75M|	2750   (1)| 00:00:01 |
|   4 |    TABLE ACCESS FULL  | T2   |	1000 |	 251K|	  14   (0)| 00:00:01 |
|   5 |    TABLE ACCESS FULL  | T1   |	1000K|	  55M|	2734   (1)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T3"."PK")
   3 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

Although this will not apply in all cases, be careful before deciding to drop an index. It may seem like a good idea initially, but it can have a big impact through wrong cardinality estimation, which can then lead to poor execution plan decisions, poor performance and unhappy customers.

So, what is the solution?
We don't want huge indexes to be maintained by DML if they are not used, right?

This brings me to the second part, Extended Statistics.

Extended Statistics

Oracle introduced extended statistics in Oracle 11g. Their main purpose is to capture the relationship within a column group. Extended statistics make the optimizer aware of the logical relationship between the columns in a group, information it would otherwise not have. Extended statistics are represented as virtual (hidden) columns on which statistics are gathered.

When we had the T1IX index, the optimizer used its NDV to calculate the correct cardinality; the index statistics served as column group statistics for the N1 and N2 columns. However, after we dropped the index (even though it was never used), the optimizer's cardinality estimates went wrong.

Extended statistics are maintained via the DBMS_STATS package, function CREATE_EXTENDED_STATS. Let's create extended statistics and check the cardinality estimation.

SQL> select dbms_stats.create_extended_stats(null,'T1','(N1,N2)') from dual
  2  /

DBMS_STATS.CREATE_EXTENDED_STATS(NULL,'T1','(N1,N2)')
------------------------------------------------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS

SQL> set autotrace trace exp
SQL> 
SQL> select t1.c1,t2.c1
  2  from t1,t2
  3  where t1.n1 = t2.n1
  4  and t1.n2 = t2.n2
  5  /

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |    25 |  2475 |    29   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	  |    25 |  2475 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T2   |    10 |   490 |	3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   488K|    26   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

I know, you're thinking "it didn't help", and you're right. The optimizer did not use the column group (extension). If you look a little deeper in USER_TAB_COL_STATISTICS, you will see that there are no statistics for the virtual column; only the extension definition was created.

SQL> SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T1'
/

COLUMN_NAME				 NUM_DISTINCT HISTOGRAM
---------------------------------------- ------------ ---------------
N1						   20 NONE
N2						  200 NONE
N3						 1000 NONE
C1						    1 NONE

SQL> col extension_name for a40
SQL> SELECT extension_name, extension
FROM   user_stat_extensions
WHERE  table_name = 'T1'
/

EXTENSION_NAME				 EXTENSION
---------------------------------------- --------------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS		 ("N1","N2")

This behavior is expected, because DBMS_STATS.CREATE_EXTENDED_STATS only creates the extension; it does not gather statistics for it. If you check the 10053 event trace file, you will see NO STATISTICS (using defaults).

Column (#5): SYS_STUBZH0IHA7K$KEBJVXO5LOHAS(NUMBER)  NO STATISTICS (using defaults)

In order to fix this, we need to re-gather statistics on the table.

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T1'
/

COLUMN_NAME				 NUM_DISTINCT HISTOGRAM
---------------------------------------- ------------ ---------------
N1						   20 NONE
N2						  200 NONE
N3						 1000 NONE
C1						    1 NONE
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS			  200 NONE


SQL> select t1.c1,t2.c1
from t1,t2
where t1.n1 = t2.n1
and t1.n2 = t2.n2
/

Execution Plan
----------------------------------------------------------
Plan hash value: 2959412835

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |   500 | 49500 |    29   (0)| 00:00:01 |
|*  1 |  HASH JOIN	   |	  |   500 | 49500 |    29   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T2   |    10 |   490 |	3   (0)| 00:00:01 |
|   3 |   TABLE ACCESS FULL| T1   | 10000 |   488K|    26   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."N1"="T2"."N1" AND "T1"."N2"="T2"."N2")

Now the optimizer calculated the correct cardinality. If you check the trace file more extensively, you should see join cardinality information comparable to when we had the index in place; the only difference is that this time it's using the extension statistics.

...
  ColGroup (#1, VC) SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
    Col#: 1 2    CorStregth: 20.00
  ColGroup (#1, Index) T2IX
    Col#: 1 2    CorStregth: 10.00

...

Join selectivity using 1 ColGroups: 0.005000 (sel1 = 0.000000, sel2 = 0.000000)
Join Card:  500.000000 = outer (10.000000) * inner (10000.000000) * sel (0.005000)
Join Card - Rounded: 500 Computed: 500.000000

With extended statistics we were able to help the optimizer calculate the cardinality (CDN) of 500. It used the extension's NDV in the same way it had used the T1IX index statistics.

However, what if T1 were a big table and you had already gathered statistics before creating the extension (column group)? Re-gathering can be time-consuming and can have a negative impact on performance. To address this, we can use the METHOD_OPT approach of DBMS_STATS.GATHER_TABLE_STATS.
Creating the extension via METHOD_OPT automatically gathers statistics on the column group.

SQL>
SQL> SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T1'
/  

COLUMN_NAME	     NUM_DISTINCT HISTOGRAM
-------------------- ------------ ---------------
C1				1 NONE
N3			     1000 NONE
N2			      200 NONE
N1			       20 NONE

SQL> exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all columns size 1 for columns size 1 (N1,N2)');

PL/SQL procedure successfully completed.

SQL> col column_name for a40
SQL> SELECT column_name, num_distinct, histogram
FROM   user_tab_col_statistics
WHERE  table_name = 'T1'
/

COLUMN_NAME				 NUM_DISTINCT HISTOGRAM
---------------------------------------- ------------ ---------------
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS			  200 NONE
C1						    1 NONE
N3						 1000 NONE
N2						  200 NONE
N1						   20 NONE

In this case the extension was created automatically, and statistics were gathered on it at the same time.

If you find yourself in a situation where you used DBMS_STATS.CREATE_EXTENDED_STATS after gathering statistics, there are multiple ways to gather statistics on the already existing extensions (a brief sketch of options 2 to 4 follows the list).

1) DBMS_STATS.GATHER_TABLE_STATS
(it will include all virtual columns)

2) METHOD_OPT=>’for columns size … (column group)’
(you can list the actual column group as you do for CREATE_EXTENDED_STATS)

3) METHOD_OPT=>’for columns size … extension_name’
(you can specify the extension name obtained from DBA|ALL|USER_STAT_EXTENSIONS dictionary view)

4) METHOD_OPT=>’for all hidden columns’
(hidden columns are by default included in default DBMS_STATS.GATHER_TABLE_STATS configuration)
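
A minimal sketch of options 2 to 4 (the extension name in option 3 is the one generated earlier in this post; yours will differ):

-- Option 2: list the column group itself
exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for columns size 1 (N1,N2)');

-- Option 3: reference the generated extension name
exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for columns size 1 SYS_STUBZH0IHA7K$KEBJVXO5LOHAS');

-- Option 4: all hidden (virtual) columns
exec dbms_stats.gather_table_stats(null,'T1',method_opt=>'for all hidden columns size 1');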

What’s the difference between METHOD_OPT and DBMS_STATS.CREATE_EXTENDED_STATS?

I believe the difference exists because METHOD_OPT is part of a procedure designed for statistics gathering. Extended statistics (column groups), introduced in 11g, are additional functionality exposed through METHOD_OPT, whose main purpose is to gather statistics; hence gathering statistics via METHOD_OPT creates the extension and gathers its statistics at the same time.
Perhaps Oracle could have used a different name, something like CREATE_COLUMN_GROUP instead of CREATE_EXTENDED_STATS (which is misleading).

Column Correlation (CorStregth)

Column correlation represents how strongly the columns within a group are related to each other. It's calculated as the product of the NDVs of all the columns in the group divided by the NDV of the column group. This is clearly explained in MultiColumn/Column Group Statistics - Additional Examples (Doc ID 872406.1).

Product of individual NDVs / NDV for the column group.

If the product of the individual columns' NDVs is smaller than the CG NDV, the optimizer will ignore the CG (extended) stats. Based on the tests I performed, CorStregth is taken into consideration only when the optimizer has to decide between multiple column groups for which a Full Match can be constructed. In the other cases, when a Partial Match is used, the optimizer picked the column group with the smaller NDV.
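
As a cross-check against the trace output shown earlier, CorStregth for the (N1,N2) group can be derived from the dictionary (the extension name below is the one generated earlier in this post; yours will differ):

-- product of the individual NDVs divided by the column group NDV: (20 * 200) / 200 = 20
select (select exp(sum(ln(num_distinct)))
        from   user_tab_col_statistics
        where  table_name = 'T1'
        and    column_name in ('N1','N2'))
       /
       (select num_distinct
        from   user_tab_col_statistics
        where  table_name = 'T1'
        and    column_name = 'SYS_STUBZH0IHA7K$KEBJVXO5LOHAS') as corstregth
from   dual;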

Here are two different cases; one where Full Match is constructed and another for Partial Match:

Column Group – Full Match

Let’s examine the first case where we can construct Full Match.

COLUMN_NAME		NUM_DISTINCT
---------------------------------------- ------------
N1						   20
N2						  200
N3						 1000
C1						    1
SYS_STUBZH0IHA7K$KEBJVXO5LOHAS			  200 ---> (N1,N2)
SYS_STUXQEJ8WOFTK5R5EYM4DF_60V			 1000 ---> (N2,N3)

select *
from t1
where t1.n1 = 10
and t1.n2 = 40
and t1.n3 = 80
/

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |     5 |   270 |   228   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1	 |     5 |   270 |   228   (1)| 00:00:01 |
--------------------------------------------------------------------------

Trace file
...
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): N1(NUMBER)
    AvgLen: 3 NDV: 20 Nulls: 0 Density: 0.050000 Min: 0.000000 Max: 19.000000
  Column (#2): N2(NUMBER)
    AvgLen: 4 NDV: 200 Nulls: 0 Density: 0.005000 Min: 0.000000 Max: 199.000000
  Column (#3): N3(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 0.000000 Max: 999.000000
  Column (#6): SYS_STUXQEJ8WOFTK5R5EYM4DF_60V(NUMBER)
    AvgLen: 12 NDV: 1000 Nulls: 0 Density: 0.001000
  Column (#5): SYS_STUBZH0IHA7K$KEBJVXO5LOHAS(NUMBER)
    AvgLen: 12 NDV: 200 Nulls: 0 Density: 0.005000
  ColGroup (#1, VC) SYS_STUXQEJ8WOFTK5R5EYM4DF_60V
    Col#: 2 3    CorStregth: 200.00
  ColGroup (#2, VC) SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
    Col#: 1 2    CorStregth: 20.00
  ColGroup Usage:: PredCnt: 3  Matches Full: #1  Partial:  Sel: 0.001000

We have the T1 table with two column groups, on (N1,N2) and (N2,N3).
In this case the product of the individual N1, N2, N3 NDVs is much higher than the column group NDVs, so the optimizer will calculate the cardinality (CDN) based on the extensions. With a predicate on (N1,N2,N3), the optimizer can perform a Full Match for both column groups. In this scenario, the cardinality can be expressed as:

1) Using CG #1 (N2,N3): 1/CG(NDV) * 1/N1(NDV) = 1/1000 * 1/20 = 0.001 * 0.05 = 0.00005 * 100000 = 5

2) Using CG #2 (N1,N2): 1/CG(NDV) * 1/N3(NDV) = 1/200 * 1/1000 = 0.005 * 0.001 = 0.000005 * 100000 = 0.5 (rounded to 1)

The trace above shows that the optimizer decided to use CG (N2,N3), i.e. SYS_STUXQEJ8WOFTK5R5EYM4DF_60V, because it has the higher CorStregth of 200 (vs. 20 for CG (N1,N2)).
To show how CorStregth can influence the optimizer, I've changed the N1 distinct value to 210 to increase the CorStregth of CG (N1,N2).
This time, column group (N1,N2) is used in the cardinality estimate.

exec dbms_stats.set_column_stats(null,'T1','N1',distcnt=>210);

Execution Plan
----------------------------------------------------------

Plan hash value: 3724264953

---------------------------------------------------------------------------
| Id  | Operation	   | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |	1 |    11 |   228   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE    |	  |	1 |    11 |	       |	  |
|*  2 |   TABLE ACCESS FULL| T1   |	1 |    11 |   228   (1)| 00:00:01 |
---------------------------------------------------------------------------



--Trace file
...
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): N1(NUMBER)
    AvgLen: 3 NDV: 210 Nulls: 0 Density: 0.004762 Min: 0.000000 Max: 19.000000
  Column (#2): N2(NUMBER)
    AvgLen: 4 NDV: 200 Nulls: 0 Density: 0.005000 Min: 0.000000 Max: 199.000000
  Column (#3): N3(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 0.000000 Max: 999.000000
  Column (#6): SYS_STUXQEJ8WOFTK5R5EYM4DF_60V(NUMBER)
    AvgLen: 12 NDV: 1000 Nulls: 0 Density: 0.001000
  Column (#5): SYS_STUBZH0IHA7K$KEBJVXO5LOHAS(NUMBER)
    AvgLen: 12 NDV: 200 Nulls: 0 Density: 0.005000
  ColGroup (#2, VC) SYS_STUBZH0IHA7K$KEBJVXO5LOHAS
    Col#: 1 2    CorStregth: 210.00
  ColGroup (#1, VC) SYS_STUXQEJ8WOFTK5R5EYM4DF_60V
    Col#: 2 3    CorStregth: 200.00
  ColGroup Usage:: PredCnt: 3  Matches Full: #2  Partial:  Sel: 0.005000

We can see that the optimizer chose CG #2 based on CorStregth.

Column Group – Partial Match

This second case shows what happens when Full Match can’t be constructed and the optimizer chooses the column group with smaller NDV. I’ve built the same T1 table, but now with Column Groups (N1,N2,N3) and (N1,N2,C1).

COLUMN_NAME		 NUM_DISTINCT
---------------------------------------- ------------
N1						   20
N2						  200
N3						 1000
C1						    1
SYS_STU2NGW2O#$ZX2CDZYOMNMFP64			  200
SYS_STUOYQUEIAZ7FI9DV53VLN$$$0			 1000


--Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |   500 | 27000 |   228   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1	 |   500 | 27000 |   228   (1)| 00:00:01 |
--------------------------------------------------------------------------



--Trace file
...
Access path analysis for T1
***************************************
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): N1(NUMBER)
    AvgLen: 3 NDV: 20 Nulls: 0 Density: 0.050000 Min: 0.000000 Max: 19.000000
  Column (#2): N2(NUMBER)
    AvgLen: 4 NDV: 200 Nulls: 0 Density: 0.005000 Min: 0.000000 Max: 199.000000
  Column (#6): SYS_STUOYQUEIAZ7FI9DV53VLN$$$0(NUMBER)
    AvgLen: 12 NDV: 1000 Nulls: 0 Density: 0.001000
  Column (#5): SYS_STU2NGW2O#$ZX2CDZYOMNMFP64(NUMBER)
    AvgLen: 12 NDV: 200 Nulls: 0 Density: 0.005000
  Column (#3): N3(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 0.000000 Max: 999.000000
  Column (#4): C1(VARCHAR2)
    AvgLen: 43 NDV: 1 Nulls: 0 Density: 1.000000
  ColGroup (#1, VC) SYS_STUOYQUEIAZ7FI9DV53VLN$$$0
    Col#: 1 2 3    CorStregth: 4000.00
  ColGroup (#2, VC) SYS_STU2NGW2O#$ZX2CDZYOMNMFP64
    Col#: 1 2 4    CorStregth: 20.00
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: #2 (1 2 )  Sel: 0.005000
  Table: T1  Alias: T1

Even though CG #1 has a much higher CorStregth of 4000 (vs. 20), the optimizer used CG #2, the column group with the smaller NDV. Let's decrease the NDV of CG #1 to 100 to see whether that will influence the optimizer to use CG #1 instead.

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation	  | Name | Rows  | Bytes | Cost (%CPU)| Time	 |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |	 |  1000 | 50000 |   228   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T1	 |  1000 | 50000 |   228   (1)| 00:00:01 |
--------------------------------------------------------------------------



--Trace file
...
SINGLE TABLE ACCESS PATH
  Single Table Cardinality Estimation for T1[T1]
  SPD: Return code in qosdDSDirSetup: NOCTX, estType = TABLE
  Column (#1): N1(NUMBER)
    AvgLen: 3 NDV: 20 Nulls: 0 Density: 0.050000 Min: 0.000000 Max: 19.000000
  Column (#2): N2(NUMBER)
    AvgLen: 4 NDV: 200 Nulls: 0 Density: 0.005000 Min: 0.000000 Max: 199.000000
  Column (#6): SYS_STUOYQUEIAZ7FI9DV53VLN$$$0(NUMBER)
    AvgLen: 12 NDV: 100 Nulls: 0 Density: 0.010000
  Column (#5): SYS_STU2NGW2O#$ZX2CDZYOMNMFP64(NUMBER)
    AvgLen: 12 NDV: 200 Nulls: 0 Density: 0.005000
  Column (#3): N3(NUMBER)
    AvgLen: 4 NDV: 1000 Nulls: 0 Density: 0.001000 Min: 0.000000 Max: 999.000000
  Column (#4): C1(VARCHAR2)
    AvgLen: 43 NDV: 1 Nulls: 0 Density: 1.000000
  ColGroup (#1, VC) SYS_STUOYQUEIAZ7FI9DV53VLN$$$0
    Col#: 1 2 3    CorStregth: 40000.00
  ColGroup (#2, VC) SYS_STU2NGW2O#$ZX2CDZYOMNMFP64
    Col#: 1 2 4    CorStregth: 20.00
  ColGroup Usage:: PredCnt: 2  Matches Full:  Partial: #1 (1 2 ) Sel: 0.010000
  Table: T1  Alias: T1

Since CG #1 now has the smaller NDV (100, vs. 200 for CG #2), the optimizer calculated the cardinality (CDN) based on CG #1 as:
1/NDV (CG) * original cardinality = 1/100 * 100000 = 0.01 * 100000 = 1000

Summary

There are situations where dropping a potentially 'unused' index can have a negative influence on the optimizer's cardinality estimation. Consider using extended statistics to make the optimizer aware of the logical relationship between columns; extended statistics are represented as virtual (hidden) columns. Regarding statistics gathering, note that DBMS_STATS.CREATE_EXTENDED_STATS won't gather statistics automatically; use METHOD_OPT instead. If you have skewed data, you may want to consider creating proper histograms. When histograms exist on the base columns, you also need to create a histogram on the column group for it to be taken into consideration, because Oracle gives priority to histograms. You can use column group extended statistics to preserve cardinality stability after dropping an index. While it might not be appropriate in all cases, it's a viable option for helping the optimizer towards good cardinality estimates.

Lastly, you can use the DBMS_STATS.SEED_COL_USAGE procedure to monitor column usage in filter predicates, join predicates and GROUP BY clauses. A report can then be generated with DBMS_STATS.REPORT_COL_USAGE, and its information can be used to identify appropriate column groups. I look forward to exploring more on that in a future post.
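
A minimal sketch of that workflow (the 300-second monitoring window is an arbitrary example value):

-- start recording column usage for 300 seconds, then run the representative workload
exec dbms_stats.seed_col_usage(null, null, 300);

-- afterwards, report the recorded column usage for table T1
set long 100000
select dbms_stats.report_col_usage(user, 'T1') from dual;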

Categories: DBA Blogs

Partner Webcast – Enabling Oracle Database High Availability and Disaster Recovery with Oracle Cloud

The Database High Availability and Disaster Recovery needs of customers traditionally have required significant capital investment in the infrastructure that provides the redundant capabilities that...

We share our skills to maximize your revenue!
Categories: DBA Blogs

Statistics In-Database Archiving

Tom Kyte - Tue, 2017-03-07 01:06
Hi, I am curious about the table statistics if I use In-Database Archiving. Are the statistics calculated including or excluding invisible rows? Best regards, Mathias
Categories: DBA Blogs
