Re: extended statistics and non-existent combined values

From: Ls Cheng <exriscer_at_gmail.com>
Date: Tue, 31 Jul 2018 20:29:43 +0200
Message-ID: <CAJ2-Qb-bpGrxELd_NCYhi6UBUer6xYcZxZnHe5WgX5GWv5_T=g_at_mail.gmail.com>



Hi

I think my problem is raleted to out of range values. I will look into store low selectivity and faking histograms to solve this.

I wonder why even with column statistics Oracle cannot determine that the rows that satisfies non-existent values (or out of range) is simply 0?

Thanks

On Tue, Jul 31, 2018 at 6:36 PM, Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk
> wrote:

>
> That's Oracle doing what it does.
> Your example 1: If you have a histogram on any of the individual columns
> of a column group but NOT on the column group Oracle uses the individual
> selectivities and multiplies (standard mechanism).
>
> Your example 2: If you have a frequency histogram and request a value that
> does appear in the histogram Oracle uses half the selectivity of the least
> popular item in the histogram. In your case 1994/2 = 915 (approximately).
>
> I'm guessing your on 11g since the histogram figures appear to be sampled
> rather than 100%.
>
>
> You may find that if you use dbms_stats.get_column_stats();
> dbms_stats.set_column_stats() you can set the stored selectivity of the
> column group to something very small so that the estimated cardinality is
> 1. (Depending on version this may not have any effect, or it may only work
> if you also create a stat table then export and re-import the column stats).
>
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Ls Cheng <exriscer_at_gmail.com>
> Sent: 31 July 2018 17:03:26
> To: Oracle Mailinglist
> Subject: extended statistics and non-existent combined values
>
> Hi
>
> I have a problematic query which is not doing index scan.
>
> I tried to fix the issue using extended statistics but without success. I
> have following test case, the idea is the estimated cardinality combining
> column C2 and C3 should return as few rows as possible because the
> predicate C2 = N and C3 = N does not return any rows but with extended
> statistics it is actually estimating 915 rows. Anyone can think of an
> workaround?
>
> C2 has only two distinct values, Y and N
> C3 has only two distinct values, Y and N
>
> C2 and C3 with both values N returns no rows.
>
> Thanks
>
>
>
>
> create table t20
> (
> c1 number,
> c2 varchar2(5),
> c3 varchar2(5)
> );
>
> insert into t20
> select rownum,
> case when object_type = 'TABLE' then 'N' else 'Y' end c2,
> case when object_type = 'TABLE' then 'Y' else 'N' end c3
> from dba_objects;
>
> create index t20_i1 on t20(c2, c3);
>
> select c2, c3, count(*) from t20 group by c2, c3;
>
> C2 C3 COUNT(*)
> ----- ----- ----------
> N Y 1994
> Y N 71482
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL
> COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2')
>
> -- c2 = 'N' and c3 = 'N' returns cero rows
> select * from t20 where c2 = 'N' and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> ------------------------------------------------------------
> --------------------------
> | Id | Operation | Name | Rows | Bytes | Cost
> (%CPU)| Time |
> ------------------------------------------------------------
> --------------------------
> | 0 | SELECT STATEMENT | | 2121 | 19089 | 11
> (0)| 00:00:01 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 2121 | 19089 | 11
> (0)| 00:00:01 |
> |* 2 | INDEX RANGE SCAN | T20_I1 | 2121 | | 5
> (0)| 00:00:01 |
> ------------------------------------------------------------
> --------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - access("C2"='N' AND "C3"='N')
>
> -- create extended statistics for C2 and C3
> select
> dbms_stats.create_extended_stats('LSC', 'T20','(C2, C3)')
> from dual;
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL
> COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2 FOR COLUMNS
> (C2, C3) SIZE 2')
>
>
> -- c2 = 'N' and c3 = 'N' returns cero rows but even with extended
> statistics it estimates 915 rows
> select * from t20 where c2 = 'N' and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> ------------------------------------------------------------
> --------------------------
> | Id | Operation | Name | Rows | Bytes | Cost
> (%CPU)| Time |
> ------------------------------------------------------------
> --------------------------
> | 0 | SELECT STATEMENT | | 915 | 8235 | 5
> (0)| 00:00:01 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 915 | 8235 | 5
> (0)| 00:00:01 |
> |* 2 | INDEX RANGE SCAN | T20_I1 | 915 | | 2
> (0)| 00:00:01 |
> ------------------------------------------------------------
> --------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - access("C2"='N' AND "C3"='N')
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue Jul 31 2018 - 20:29:43 CEST

Original text of this message