Re: extended statistics and non-existent combined values

From: Mladen Gogala <gogala.mladen_at_gmail.com>
Date: Wed, 1 Aug 2018 15:16:13 -0400
Message-ID: <67070696-2f56-8809-0f32-4737929d595e_at_gmail.com>



A slightly lighter sledgehammer would be using /*+ DYNAMIC_SAMPLING(table,11) */.

On 7/31/2018 12:39 PM, Mark W. Farnham wrote:
>
> The sledge hammer approach is to select count(*) from t20 where c2 =
> 'N' and c3 = 'N';
>
> and then generate your query with a cardinality hint.
>
> You might want to hint the use of the index for the count(*) query.
>
> Of course you already know this is the sort of thing that the CBO is
> supposed to get right routinely and I believe you’ve already done
> everything correctly to give it the best possible chance. Sigh.
>
> Now I do take it from your one is Y, one is N values that this is
> actually an either or in your database. IF I’m correct about that and
> C2=Y literally implies that C3=N, they you might want to code it up
> that way, leaving C3 out of the database entirely and out of queries
> as a predicate. If someone wants it instantiated for them in a query,
> I supposed you could make C3 a virtual column. I’m presuming a **lot**
> for that to be true, and the CBO should be getting this right as
> you’ve done it. (C2 could actually be N for N and null for Y if you
> really want to get the index as small as possible.)
>
> mwf
>
> *From:*oracle-l-bounce_at_freelists.org
> [mailto:oracle-l-bounce_at_freelists.org] *On Behalf Of *Ls Cheng
> *Sent:* Tuesday, July 31, 2018 12:03 PM
> *To:* Oracle Mailinglist
> *Subject:* extended statistics and non-existent combined values
>
> Hi
>
> I have a problematic query which is not doing index scan.
>
> I tried to fix the issue using extended statistics but without
> success. I have following test case, the idea is the estimated
> cardinality combining column C2 and C3 should return as few rows as
> possible because the predicate C2 = N and C3 = N does not return any
> rows but with extended statistics it is actually estimating 915 rows.
> Anyone can think of an workaround?
>
> C2 has only two distinct values, Y and N
>
> C3 has only two distinct values, Y and N
>
> C2 and C3 with both values N returns no rows.
>
> Thanks
>
>
>
> create table t20
> (
>  c1 number,
>  c2 varchar2(5),
>  c3 varchar2(5)
> );
>
> insert into t20
> select rownum,
>        case when object_type = 'TABLE' then 'N' else 'Y' end c2,
>        case when object_type = 'TABLE' then 'Y' else 'N' end c3
>   from dba_objects;
>
> create index t20_i1 on t20(c2, c3);
>
> select c2, c3, count(*) from t20 group by c2, c3;
>
> C2    C3      COUNT(*)
> ----- ----- ----------
> N     Y           1994
> Y     N          71482
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR
> ALL COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2')
>
> *-- c2 = 'N' and c3 = 'N' returns cero rows*
> select * from t20 where c2 = 'N' and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> --------------------------------------------------------------------------------------
> | Id  | Operation                   | Name   | Rows  | Bytes | Cost
> (%CPU)| Time     |
> --------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT            |        |  2121 | 19089 |    11  
> (0)| 00:00:01 |
> |   1 |  TABLE ACCESS BY INDEX ROWID| T20    |  2121 | 19089 |    11  
> (0)| 00:00:01 |
> |*  2 |   INDEX RANGE SCAN          | T20_I1 |  2121 |       |     5  
> (0)| 00:00:01 |
> --------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    2 - access("C2"='N' AND "C3"='N')
>
> -- create extended statistics for C2 and C3
>
> select
> dbms_stats.create_extended_stats('LSC', 'T20','(C2, C3)')
> from dual;
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR
> ALL COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2 FOR
> COLUMNS (C2, C3) SIZE 2')
>
>
> *-- c2 = 'N' and c3 = 'N' returns cero rows but even with extended
> statistics it estimates 915 rows*
>
> select * from t20 where c2 = 'N' and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> --------------------------------------------------------------------------------------
> | Id  | Operation                   | Name   | Rows  | Bytes | Cost
> (%CPU)| Time     |
> --------------------------------------------------------------------------------------
> |   0 | SELECT STATEMENT            |        |   915 | 8235 |     5  
> (0)| 00:00:01 |
> |   1 |  TABLE ACCESS BY INDEX ROWID| T20    |   915 | 8235 |     5  
> (0)| 00:00:01 |
> |*  2 |   INDEX RANGE SCAN          | T20_I1 |   915 |       |     2  
> (0)| 00:00:01 |
> --------------------------------------------------------------------------------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
>    2 - access("C2"='N' AND "C3"='N')
>
>

-- 
Mladen Gogala Database Consultant Tel: (347) 321-1217

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Aug 01 2018 - 21:16:13 CEST

Original text of this message