Re: improve cardinality estimate

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Thu, 18 Mar 2021 14:30:57 +0000
Message-ID: <CAGtsp8=qx0+hmcq3_vG_qmCKoaJ1fizoiBbfEUUD0xhk=9c6pg_at_mail.gmail.com>



Are all these dates date-only, or is there a time component allowed ? Is this one partition per day.

From what you've said, and assuming YES as the answer to the questions above, creating a virtual column on (order_processing_date - business_processing_date) looks like an obvious thing to do. And then create a (frequency) histogram on the virtual column so that the optimizer can see there are (usually?) only 3 distinct values and the one you're interested in is rare.

Regards
Jonathan Lewis

On Thu, 18 Mar 2021 at 13:52, Laurentiu Oprea <laurentiu.oprea06_at_gmail.com> wrote:

> Regarding parallel I`m relaying my affirmation of the fact that I tested
> with hint /*+ dynamic_sampling(8) */ (the DS level with parallel hint) and
> gives the same estimate.
>
> Yes there is a relation order_procesing_date between
> business_procesing_date -1 and business_procesing_date + 1 and there is no
> fooling like null values or year 1900/9999
>
> The pattern is constant , grouping by partition_date for the last 7 days
> it ranges from 0 to 130k (out of a total of 24M) rows filtered by
> condition order_procesing_date > = business_procesing_date .
>
> Also the columns are not virtual columns (strange expressions)
>
> În joi, 18 mar. 2021 la 15:14, Jonathan Lewis <jlewisoracle_at_gmail.com> a
> scris:
>
>> The change for parallel processing is one I'd have to think about - it
>> may be something to do with the way the predicates are rewritten for the
>> parallel query slaves - so I'll postpone thinking about that bit.
>>
>> The questions to ask first of all are:
>> a) what's the business logic. is there an approximate relationship
>> between order_processing_date and business_processing date (e.g. the former
>> is usually about 3 days before the latter).
>> b) has the optimizer been fooled by a "silly null" effect - like "not yet
>> processed" = "31-Dec-9999"
>> c) how much does the pattern change over time - e.g. if the partition
>> date is sysdate - 1 is this a partition where a lot of rows have not yet
>> had business_processing_date set, but if partition date = sysdate - 7 then
>> maybe lots of rows now have a business_processing_date set.
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>> On Thu, 18 Mar 2021 at 12:26, Laurentiu Oprea <
>> laurentiu.oprea06_at_gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I recently saw a query with a wrong execution plan and looks to be
>>> caused by an overestimate of a particular where clause. Ex:
>>>
>>> select
>>> count(*)
>>> from
>>> orders
>>> where
>>> partition_date = sysdate-1 and
>>> order_procesing_date > = business_procesing_date
>>>
>>> the count itself is = 42K
>>> the count itself without second condition ~ 24M
>>> the cardinality estimate ~ 24M
>>> the cardinality estimate if I add parallel hint: 1M (this is an
>>> interesting one and looks to be related with the fact that parallel queries
>>> get DS level 8)
>>> the cardinality estimate if I rewrite the condition :
>>> order_procesing_date - business_procesing_date >= interval '0' minute
>>> :1.4M
>>>
>>> I haven't tried yet but I suspect that if I create extended stats on
>>> expression ( order_procesing_date - business_procesing_date) can get me
>>> better results
>>>
>>> The question is what is the logic behind beter cardinality estimates if
>>> I rewrite the where clause?
>>> What will be the best approach for this query?
>>>
>>> Thank you.
>>>
>>

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Mar 18 2021 - 15:30:57 CET

Original text of this message