Re: extended statistics and non-existent combined values

From: Ls Cheng <exriscer_at_gmail.com>
Date: Thu, 2 Aug 2018 18:22:56 +0200
Message-ID: <CAJ2-Qb99jJ_br6ZEK-D601=h5zmPM4M_e4+h51oMeTfcsuu8wQ_at_mail.gmail.com>

I think I will opt the solution of adding a dummy row which satisfies N-N condition.

Thanks all

On Thu, Aug 2, 2018 at 6:08 PM, Mark W. Farnham <mwf_at_rsiz.com> wrote:

> Of course the test for more buckets disambiguating is simple: Pick some
> power of 2 like 512. If that gets you the correct estimate, cut it in half
> until it goes wrong. Since Y+Y is another possible error search, you
> probably want to test the estimates for that also.
>
> If there is no joy just using more buckets, then insert a dummy error row
> and do enough buckets so you find 1 each for N,N and Y,Y as estimates.
>
> Or it could work differently than I think it should. JL might know in his
> head.
>
> mwf
>
> -----Original Message-----
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Mark W. Farnham
> Sent: Thursday, August 02, 2018 11:20 AM
> To: 'Jonathan Lewis'; exriscer_at_gmail.com; 'Oracle Mailinglist'
> Subject: RE: extended statistics and non-existent combined values
>
> Now if LS were to add those constraints, then the need for the N and N
> query
> to look for errors would evaporate, the ultimate optimization of not doing
> it at all.
>
> I *think* the more buckets issue also might have legs to disambiguate the
> *possibility* of N and N rows. With only 2 buckets, they get populated
> (presumably) one of the buckets with rows presumably is consulted for the
> quantity of the non-existant row combinations.
>
> But I really like the constraint idea, even if the plan generation idea
> doesn't yet work, for the no need to run result. I wonder if the aggregate
> work to maintain the constraint exceeds even a bad plan occasional check
> for
> errors, and also whether adding the constraint would require "n" points in
> the application of change to absorb the (correct) error produced by
> attempts
> to insert N+N rows or update to the N+N result. (Or Y+Y, similarly, left
> out
> for brevity.)
>
> mwf
>
> -----Original Message-----
> From: Jonathan Lewis [mailto:jonathan_at_jlcomp.demon.co.uk]
> Sent: Wednesday, August 01, 2018 4:00 PM
> To: exriscer_at_gmail.com; 'Oracle Mailinglist'; mwf_at_rsiz.com
> Subject: Re: extended statistics and non-existent combined values
>
>
> Mark,
>
> Interesting point - if it's legal to have the constraints:
>
> alter table t1 modify c2 not null;
> alter table t1 modify c3 not null;
> alter table t1 add constraint c_check check((c2 = 'Y' and c3 = 'N') or (c2
> =
> 'N' and c3 = 'Y'));
>
> Then you might hope that the optimizer would take
>
> explain plan for
> select * from t1 where c2 = 'N' and c3 = 'N'
> ;
>
> select * from t1 where c2 = 'N' and c3 = 'N' and
> ((c2 = 'Y' and c3 = 'N') or (c2 = 'N' and c3 = 'Y')) ;
>
> And recognise the contradiction and produce the plan:
> ------------------------------------------------------------
> ---------------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
> ------------------------------------------------------------
> ---------------
> | 0 | SELECT STATEMENT | | 1 | 4 | 0 (0)| |
> |* 1 | FILTER | | | | | |
> |* 2 | TABLE ACCESS FULL| T1 | 18369 | 73476 | 21 (20)| 00:00:01 |
> ------------------------------------------------------------
> ---------------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
> 1 - filter(NULL IS NOT NULL)
> 2 - filter("C2"='N' AND "C3"='N')
>
>
> i.e. not actually do anything at all.
> Unfortunately this doesn't work - even in 18.1 (LiveSQL).
>
> Regards
> Jonathan Lewis
>
> ________________________________________
> From: oracle-l-bounce_at_freelists.org <oracle-l-bounce_at_freelists.org> on
> behalf of Mark W. Farnham <mwf_at_rsiz.com>
> Sent: 31 July 2018 17:39:05
> To: exriscer_at_gmail.com; 'Oracle Mailinglist'
> Subject: RE: extended statistics and non-existent combined values
>
> The sledge hammer approach is to select count(*) from t20 where c2 = 'N'
> and
> c3 = 'N'; and then generate your query with a cardinality hint.
>
> You might want to hint the use of the index for the count(*) query.
>
> Of course you already know this is the sort of thing that the CBO is
> supposed to get right routinely and I believe you've already done
> everything
> correctly to give it the best possible chance. Sigh.
>
> Now I do take it from your one is Y, one is N values that this is actually
> an either or in your database. IF I'm correct about that and C2=Y literally
> implies that C3=N, they you might want to code it up that way, leaving C3
> out of the database entirely and out of queries as a predicate. If someone
> wants it instantiated for them in a query, I supposed you could make C3 a
> virtual column. I'm presuming a *lot* for that to be true, and the CBO
> should be getting this right as you've done it. (C2 could actually be N for
> N and null for Y if you really want to get the index as small as possible.)
>
> mwf
>
> From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org]
> On Behalf Of Ls Cheng
> Sent: Tuesday, July 31, 2018 12:03 PM
> To: Oracle Mailinglist
> Subject: extended statistics and non-existent combined values
>
> Hi
> I have a problematic query which is not doing index scan.
>
> I tried to fix the issue using extended statistics but without success. I
> have following test case, the idea is the estimated cardinality combining
> column C2 and C3 should return as few rows as possible because the
> predicate
> C2 = N and C3 = N does not return any rows but with extended statistics it
> is actually estimating 915 rows. Anyone can think of an workaround?
>
> C2 has only two distinct values, Y and N
> C3 has only two distinct values, Y and N
>
> C2 and C3 with both values N returns no rows.
>
> Thanks
>
>
>
>
> create table t20
> (
> c1 number,
> c2 varchar2(5),
> c3 varchar2(5)
> );
>
> insert into t20
> select rownum,
> case when object_type = 'TABLE' then 'N' else 'Y' end c2,
> case when object_type = 'TABLE' then 'Y' else 'N' end c3
> from dba_objects;
>
> create index t20_i1 on t20(c2, c3);
>
> select c2, c3, count(*) from t20 group by c2, c3;
>
> C2 C3 COUNT(*)
> ----- ----- ----------
> N Y 1994
> Y N 71482
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL
> COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2')
>
> -- c2 = 'N' and c3 = 'N' returns cero rows select * from t20 where c2 = 'N'
> and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> ------------------------------------------------------------
> ----------------
> ----------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
> Time |
> ------------------------------------------------------------
> ----------------
> ----------
> | 0 | SELECT STATEMENT | | 2121 | 19089 | 11 (0)|
> 00:00:01 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 2121 | 19089 | 11 (0)|
> 00:00:01 |
> |* 2 | INDEX RANGE SCAN | T20_I1 | 2121 | | 5 (0)|
> 00:00:01 |
> ------------------------------------------------------------
> ----------------
> ----------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - access("C2"='N' AND "C3"='N')
> -- create extended statistics for C2 and C3 select
> dbms_stats.create_extended_stats('LSC', 'T20','(C2, C3)') from dual;
>
> exec dbms_stats.gather_table_stats('LSC', 'T20', method_opt => 'FOR ALL
> COLUMNS SIZE 1, FOR COLUMNS C2 SIZE 2, FOR COLUMNS C3 SIZE 2 FOR COLUMNS
> (C2, C3) SIZE 2')
>
>
> -- c2 = 'N' and c3 = 'N' returns cero rows but even with extended
> statistics
> it estimates 915 rows select * from t20 where c2 = 'N' and c3 = 'N';
>
> no rows selected
>
> Execution Plan
> ----------------------------------------------------------
> Plan hash value: 287249393
>
> ------------------------------------------------------------
> ----------------
> ----------
> | Id | Operation | Name | Rows | Bytes | Cost (%CPU)|
> Time |
> ------------------------------------------------------------
> ----------------
> ----------
> | 0 | SELECT STATEMENT | | 915 | 8235 | 5 (0)|
> 00:00:01 |
> | 1 | TABLE ACCESS BY INDEX ROWID| T20 | 915 | 8235 | 5 (0)|
> 00:00:01 |
> |* 2 | INDEX RANGE SCAN | T20_I1 | 915 | | 2 (0)|
> 00:00:01 |
> ------------------------------------------------------------
> ----------------
> ----------
>
> Predicate Information (identified by operation id):
> ---------------------------------------------------
>
> 2 - access("C2"='N' AND "C3"='N')
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Aug 02 2018 - 18:22:56 CEST

This message: [ Message body ]
Next message: Chris Taylor: "Timezone on Exadata set to UTC - How about a login trigger to set session timezone?"
Previous message: niall.litchfield_at_gmail.com: "Re: state vs status"
In reply to Mark W. Farnham: "RE: extended statistics and non-existent combined values"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message