Re: extended statistics and non-existent combined values

From: <l.flatz_at_bluewin.ch>
Date: Wed, 1 Aug 2018 13:37:26 +0200 (CEST)
Message-ID: <740477377.8287.1533123446418.JavaMail.webmail_at_bluewin.ch>





Hi,
I think the issue is that 2 potenial buckets of the histogram do not shop up because there is no data for it. In theory we have 4 combinations YY, NN, NY, YN. When we looks at stats we see only tow buckets of course:

COLUMN_NAME                      NUM_DISTINCT    DENSITY      NUM_BUCKETS HISTOGRAM     

-------------------------------- ------------ ---------- ----------- ---------------
C1 222988 ,00000448455128 256 HEIGHT BALANCED C2 2 ,00000112113908 2 FREQUENCY C3 2 ,00000112113908 2 FREQUENCY SYS_STUOXVZ1C2WGW4DRVBD89VDEO_ 2 ,00000112113908 2 FREQUENCY
select num_rows from user_tables where table_name='T20'; 445975
The estimate looks like roughly 10% of the rows..
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |

----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 160 (100)| |
|* 1 | TABLE ACCESS STORAGE FULL| T20 | 44332 | 389K| 160 (5)| 00:00:01 |
 

Predicate Information (identified by operation id):


 

   1 - storage(("C2"='N' AND "C3"='N'))

       filter(("C2"='N' AND "C3"='N'))  

Actually the estimate for a missing bucket is not supposed to be zero, but 10% seems far to high. I wonder if extended stats are used at all for this estimate. When we check for an existing combination the estimate is spot on.  Regards
Lothar
----Ursprüngliche Nachricht----

Von : exriscer_at_gmail.com
Datum : 31/07/2018 - 18:03 (GMT)
An : oracle-l_at_freelists.org
Betreff : extended statistics and non-existent combined values Hi
I have a problematic query which is not doing index scan. I tried to fix the issue using extended statistics but without success. I have following test case, the idea is the estimated cardinality combining column C2 and C3 should return as few rows as possible because the predicate C2 = N and C3 = N does not return any rows but with extended statistics it is actually estimating 915 rows. Anyone can think of an workaround? C2 has only two distinct values, Y and N C3 has only two distinct values, Y and N C2 and C3 with both values N returns no rows. Thanks
create table t20
(
 c1 number,
 c2 varchar2(5),
 c3 varchar2(5)
);
insert into t20
select rownum,

       case when object_type = 'TABLE' then 'N' else 'Y' end c2,
       case when object_type = 'TABLE' then 'Y' else 'N' end c3
  from dba_objects;
create index t20_i1 on t20(c2, c3);
select c2, c3, count(*) from t20 group by c2, c3; C2 C3 COUNT(*)
----- ----- ----------
N     Y           1994
Y     N          71482

exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2') -- c2 = 'N' and c3 = 'N' returns cero rows select * from t20 where c2 = 'N' and c3 = 'N'; no rows selected
Execution Plan

Plan hash value: 287249393
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 2121 | 19089 | 11 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 2121 | 19089 | 11 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T20_I1 | 2121 | | 5 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

   2 - access("C2"='N' AND "C3"='N')
-- create extended statistics for C2 and C3 select
dbms_stats.create_extended_stats('LSC', 'T20','(C2, C3)') from dual;
exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2 FOR COLUMNS (C2, C3) SIZE 2') -- c2 = 'N' and c3 = 'N' returns cero rows but even with extended statistics it estimates 915 rows select * from t20 where c2 = 'N' and c3 = 'N'; no rows selected
Execution Plan



Plan hash value: 287249393
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |

--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 915 | 8235 | 5 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 915 | 8235 | 5 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | T20_I1 | 915 | | 2 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
Predicate Information (identified by operation id):

   2 - access("C2"='N' AND "C3"='N')

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Aug 01 2018 - 13:37:26 CEST

Original text of this message