Re: Improving query performance further

From: Jonathan Lewis <jlewisoracle_at_gmail.com>
Date: Mon, 17 Oct 2022 11:07:58 +0100
Message-ID: <CAGtsp8ktZGvAxZYojoKvC7Cu7JZpsJ8vXQYbdBBzKAem4AeAgg_at_mail.gmail.com>



I should have predicted that problem with the bitmap AND before you posted the result. Given the pattern of the data, the indexes were going to be roughly the same size, and the query would have to access a total number of index blocks roughly the same as the size of one complete index - and then do a big, CPU-intensive merge. I forgot to allow for the large number of block accesses and was only thinking about the CPU cost of the merge.

Essentially it's very similar to optimizing "see if there's a town anywhere south and east of here" - but I think you said you'd already tried modelling it through spatial techniques and couldn't get it to run any faster.
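For context, the predicate under discussion has the shape of a two-sided "point falls in range" lookup; the sketch below is an assumption reconstructed from the thread (table and bind names are placeholders, not the OP's actual SQL):

```sql
-- Assumed query shape: find the row(s) whose [min_val, max_val] range
-- contains the bind value. Each one-sided predicate on its own matches
-- roughly half of its index, which is why a BITMAP AND of two index
-- range scans ends up visiting about one full index's worth of blocks
-- before the merge begins.
SELECT *
FROM   my_table
WHERE  part_col = :part_bind
AND    min_val <= :lookup_bind
AND    max_val >= :lookup_bind;
```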

Regards
Jonathan Lewis

On Sun, 16 Oct 2022 at 20:14, yudhi s <learnerdatabase99_at_gmail.com> wrote:

> Yes, it's a truncate + load table, with no updates or deletes happening on
> it. We tried it with a sample table and a similar data pattern in a lower
> environment. The bitmap index with pct_free "0" performance appears to be
> worse than that of the composite B-tree index on (part_col, min_val,
> max_val). Tried it both with index_join and without it.
>
> INDEX_TYPE  LEAF_BLOCKS  DISTINCT_KEYS  CLUSTERING_FACTOR  NUM_ROWS
> NORMAL             1116         150001               4929    150001
> BITMAP              754         126350             149999    149999
> BITMAP              754         126352             150001    150001
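The figures above look like they came from the index statistics views; a query of roughly this shape would reproduce them (the table name is a placeholder, not from the thread):

```sql
-- Compare the B-tree and bitmap index statistics for the test table.
SELECT index_name, index_type, leaf_blocks, distinct_keys,
       clustering_factor, num_rows
FROM   user_indexes
WHERE  table_name = 'MY_TABLE'
ORDER  BY index_name;
```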
>
> Below is the sql monitor comparison of both the plans.
>
> https://gist.github.com/databasetech0073/b576a301943ecf6f04bc2ab3249e316a
>
>
>
> On Sun, Oct 16, 2022 at 6:08 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
> wrote:
>
>>
>> Another idea that occurs to me following on from your comment about the
>> daily truncate and reload.
>>
>> Do you update the min_val or max_val columns of the table once you've
>> loaded the data, or do you insert or delete rows? (Maybe you've answered
>> that last question already, but I haven't gone back to re-read all the
>> posts.)
>>
>> If the key components of the data are effectively read-only as soon as
>> you've loaded it then you could create two (local) bitmap indexes, one on
>> (min_val), one on (max_val). Since you average about 2 rows per value, the
>> size of the indexes would be about the same whether they're B-tree
>> (compressed) or bitmap, but the bitmap might be a little smaller.
>>
>> With those indexes in place the reduction in block visits for range scans
>> might be more than enough to offset the CPU time needed to handle the
>> bitmap merges and ORs.
>>
>> (The indexes should also be built with pctfree zero if they don't change
>> during the day.)
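A minimal sketch of the suggestion above (the table and index names are assumptions; the real DDL would need the correct partitioning details):

```sql
-- Two single-column local bitmap indexes, built with PCTFREE 0
-- because the table is truncate-and-reload and the key columns
-- are never updated in place after the load.
CREATE BITMAP INDEX my_table_min_bix ON my_table (min_val)
  LOCAL PCTFREE 0;

CREATE BITMAP INDEX my_table_max_bix ON my_table (max_val)
  LOCAL PCTFREE 0;
```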
>>
>> Regards
>> Jonathan Lewis
>>
>>
>>
>>
>>
>>
>> On Sun, 16 Oct 2022 at 10:41, yudhi s <learnerdatabase99_at_gmail.com>
>> wrote:
>>
>>> Thank you so much for the suggestions.
>>> Not sure how it will impact histograms, but yes, as Lok mentioned, all
>>> the data stored in the min_val and max_val columns appears to be numbers
>>> only, and the bind values I saw were not holding any characters either;
>>> however, the input binds coming from the application to the query are of
>>> datatype varchar2, and the columns are also defined as varchar2 in the
>>> table. Also, checking the captured input bind values against the data
>>> pattern: almost 60% of the input binds are close to the min value and
>>> ~40% are close to the max value, but most of the binds still result in
>>> >50K rows from either of those filters.
>>>
>>> And if I understand it correctly, individual histograms on these two
>>> columns should suffice for the optimizer (e.g. 'for all columns size
>>> repeat, for columns size 254/2048 min_val, max_val') rather than a
>>> column-group histogram. With regard to caching: as the majority of the
>>> I/O time here is spent accessing the index partition, putting the table
>>> in cache may not help much, but caching the index should. Please correct
>>> me if my understanding is wrong. I was also looking for a quick fix, i.e.
>>> hinting the query to cache the objects, but it looks like cache hints are
>>> available for tables but not for indexes or index partitions; the index
>>> would have to be put in the "keep" cache, which is not possible through
>>> hints.
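A minimal sketch of gathering those histograms (the owner and table names are placeholders, not from the thread):

```sql
-- Histograms on min_val and max_val only; 'size repeat' preserves the
-- existing histogram decisions on all other columns. 254 buckets is
-- the pre-12c maximum; on 19c the limit is 2048, as noted below.
BEGIN
  dbms_stats.gather_table_stats(
    ownname    => 'APP_OWNER',
    tabname    => 'MY_TABLE',
    method_opt => 'for all columns size repeat, ' ||
                  'for columns size 254 min_val, ' ||
                  'for columns size 254 max_val'
  );
END;
/
```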
>>>
>>> On Sat, Oct 15, 2022 at 2:04 PM Jonathan Lewis <jlewisoracle_at_gmail.com>
>>> wrote:
>>>
>>>>
>>>> I did point out that the strategy would be fairly fragile, but there
>>>> are a couple of points in its favour if the optimizer decides to use
>>>> adaptive cursor sharing.
>>>>
>>>> a) This is 19c so the histogram could have 2,048 buckets (though the
>>>> default of 254 might be sufficient anyway)
>>>>
>>>> b) When the low_value and high_value both look like numbers and the
>>>> input is numeric, the optimizer behaves as if it's dealing with numbers,
>>>> and that will help with the arithmetic. I've got a note about that on my
>>>> blog, I think, but can't find it at present.
>>>>
>>>> c) Histograms model skew - and in this case the skew is in the size of
>>>> range for a given number of rows; histograms don't have to be about
>>>> capturing "rows per value" (see:
>>>> https://jonathanlewis.wordpress.com/2016/09/26/pk-histogram/ ). In
>>>> this case I think the OP has indicated that the data is more dense round
>>>> the extreme values and sparse in the middle.
>>>>
>>>>
>>>> Regards
>>>> Jonathan Lewis
>>>>
>>>>
>>>> On Fri, 14 Oct 2022 at 22:33, Lok P <loknath.73_at_gmail.com> wrote:
>>>>
>>>>> Also Jonathan, as the OP mentioned, there exist ~120K distinct values
>>>>> in both the min and max columns, and on top of that those are varchar2
>>>>> columns holding numbers. So the histograms on each of those, with a max
>>>>> of ~254 buckets, may not be able to give 100% correct estimates for
>>>>> these range predicates for all the binds, and may thus end up impacting
>>>>> the query response.
>>>>>
>>>>>
>>>>>

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Oct 17 2022 - 12:07:58 CEST
