Re: Optimizer estimated cardinality very low

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Wed, 28 Oct 2020 13:02:57 +0000
Message-ID: <CAGtsp8nSrOE8d0RDK6Ry1zE-kr9Jzfdod2zAd-N9xoKEWKNsBA_at_mail.gmail.com>



I've got to say that I think there's a logic error in the way this whole thing works.

A very small change in statistics shouldn't result in a massive change in estimated outcome. (This is different from the nlj/hj switch which can look as if it's an analogy, but the nlj/hj switch happens when the statistics say that both should cost the same).

For this existence to go (effectively) from 100% to 0% on the basis of a tiny difference really does suggest that someone hasn't thought through the implications (or, possibly, two independent pieces of code have been hijacked as the A/B options without any consideration of consistency).

Regards
Jonathan Lewis

On Wed, 28 Oct 2020 at 12:40, Jonathan Lewis <jlewisoracle_at_gmail.com> wrote:

>
>
> I've modelled this problem on a pair of non-partitioned tables. (Since the
> join is exactly across an exact matching pair of partitions I think this is
> probably valid).
> I've got 100,000 rows in claim_header and 110,000 rows in claim_line.
> The actual number of distinct trnsct_cntl_nbr in both tables is exactly
> 100,000, with 4 dates
> The basic stats collection for tables reports 4 dates and 99,848 distinct
> values for trnsctl_ctl_nbr.
> Any set of tests below starts with no indexes, no column groups and no
> histograms
>
> a) Baseline test - this gets a "suitable" cardinality estimate (100K)
> a') create a histogram on either (or both) date columns - still gets the
> "proper" cardinality
>
>
> b) Adding indexes on (trnsctl_cntl_nbr, clm_hdr_pd_dt) at both ends I get
> cardinality 1.
> b') Adding hdr_sts_cd to either of the indexes gets "proper" cardinality
> b'') With 2-column indexes adding a histogram on date to either table gets
> cardinality 1
>
>
> c) Add a column group to just one table, ndv = 98,832, in both cases
> proper cardinality
> c') Add a column group to BOTH tables at once get cardinality 1
> c'') Add histogram on date to ONE table (when both have column groups) get
> cardinality 1
> c''') Add histograms on date to BOTH tables (when both have column groups)
> get proper cardinality
> c'''') As for c''', but increase the NDV (in stages) on the claim_header
> column groups - get cardinality 1 at 99,001
> c''''') As for c''', but increase the NDV (in stages) on the claim_line
> column groups - get cardinality 1 at 108,901
>
> From c''' and c''''' it appears that there's a break point for the sanity
> check at 99% of the rows in the table.
> Trying to untangle this to get something useful is going to be a pain -
> and it's still not a complete set of tests.
>
>
> Question - For the two tables, for the two columns in the join/existence,
> what indexes, column groups and histograms do you have.
>
>
> Regards
> Jonathan Lewis
>
>
>
>
>
>
>
>
> On Tue, 27 Oct 2020 at 13:05, Noveljic Nenad <nenad.noveljic_at_vontobel.com>
> wrote:
>
>> Hi Daniel,
>>
>>
>>
>> *“NUM_DISTINCT for the column_group in July are lower than
>> ORDR_RQST_NUMBER instead of being equal or higher.”*
>>
>>
>>
>> This is exactly the reason that sanity check kicks in.
>>
>>
>>
>>
>>
>> *“ This happens across other partitions but the estimation doesn't seem
>> impacted by this alone.”*
>>
>>
>>
>> There is some tolerance range. Set num_distinct for SYS_STU4S$4GK1UDMC2M8$MI#W6QS_
>> of the June partition to a sufficiently low value for sanity check to kick
>> in. Then you should get sel = 1.0 and a “good” cardinality estimate for
>> your case.
>>
>>
>>
>>
>>
>> “*but the estimation doesn't seem impacted by this alone.”*
>>
>>
>>
>> Histogram on ORDR_RQST_DATE is apparently another prerequisite for the
>> sanity check.
>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 28 2020 - 14:02:57 CET

Original text of this message