Re: Periodic Stats Collection -- CBO Stats Myth?
Date: Wed, 1 Oct 2008 09:48:21 -0400
Thanks for your valuable inputs.
I totally agree with you, William, on the potential impact of outdated low_value and high_value on date columns and on columns that are generated from sequences.
I have a follow-up question on how this data is populated. If I gather stats with an estimate percent of, say, 10%, how does the CBO arrive at the high value?
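Just to make the question concrete, this is roughly how I have been checking what the dictionary records (a minimal sketch only; SALES and SALE_DATE are made-up names, and the RAW endpoints would need decoding, e.g. with DBMS_STATS.CONVERT_RAW_VALUE):

  -- What does the dictionary currently believe about the column endpoints?
  SELECT column_name, num_distinct, low_value, high_value, last_analyzed
  FROM   user_tab_col_statistics
  WHERE  table_name  = 'SALES'
  AND    column_name = 'SALE_DATE';

  -- Re-gather with a 10% sample; low_value/high_value then reflect the sample
  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname          => USER,
      tabname          => 'SALES',
      estimate_percent => 10,
      cascade          => TRUE);
  END;
  /

The part I am unsure of is whether a 10% sample can miss the true maximum and record a lower high_value than actually exists.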
On Wed, Oct 1, 2008 at 6:54 AM, William Robertson <william_at_williamrobertson.net> wrote:
> I've found that dates are more likely to present a problem because
> sequentially generated keys are usually highly selective anyway. Say a
> financial application loads three million trades a day. If the stats are not
> updated, the optimizer will think a given TRADE_ID has 0 occurrences (since
> it's beyond the column's high endpoint) and round that up to 1, which may
> well be correct anyway. However, when you run a query for this week's trades
> on a Friday and it thinks there are none when in fact there are 15 million,
> you have a much bigger problem. That's my experience anyway.
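> To see the effect, something along these lines works (just a sketch; TRADES
> and TRADE_DATE are made-up names here):
>
>   -- Ask the optimizer for its estimate of "this week's trades"
>   EXPLAIN PLAN FOR
>   SELECT *
>   FROM   trades
>   WHERE  trade_date >= TRUNC(SYSDATE) - 7;
>
>   SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
>
> If the stats were last gathered weeks ago, the Rows column in the plan shows
> a near-zero estimate for a predicate that actually matches millions of rows.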
> 2008/10/1 Niall Litchfield <niall.litchfield_at_gmail.com>
>> Dates and sequence-driven key columns both exhibit this behaviour.
>> On 30/09/2008, William Robertson <william_at_williamrobertson.net> wrote:
>> > I'm not sure how using or not using histograms makes any fundamental
>> > difference to your approach.
>> > One thing that comes to mind is the use of date predicates. If the
>> > optimizer knows the highest sale date you have (for example) is 30 Sept
>> > 2008, you are fine today and maybe some of next week, but in a few
>> > months' time any queries for "last week's sales" may get unrealistically
>> > low cardinality estimates, leading to inappropriate index/nested-loop
>> > access paths, Cartesian joins and so on.
>> > If you use partitioning and you have date columns as
>> > partition/subpartition keys you could have even more fun in store.
>> > -----Original message-----
>> > From: Sriram Kumar
>> > Date: 30/9/08 22:41
>> >> Hi Folks,
>> >> We migrated an OLTP/batch hybrid 9i RBO application to 10g, and
>> >> after rounds of tuning the application is now working at its best.
>> >> We experimented with various sampling rates for statistics on various
>> >> tables, and I now feel we are at an optimal point of performance
>> >> where the CBO is picking up the right indexes.
>> >> 1) We do not use histograms.
>> >> 2) From here on, the application volumes are bound to increase.
>> >> 3) No new tables will be added in production.
>> >> 4) As of now, we have locked the schema stats for the application
>> >> schemas; the default scheduler job collects stats for all other
>> >> schemas (see the sketch below).
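>> >> The locking itself is along these lines (a sketch only; APPUSER is a
>> >> placeholder for our application schema name):
>> >>
>> >>   BEGIN
>> >>     DBMS_STATS.LOCK_SCHEMA_STATS(ownname => 'APPUSER');
>> >>   END;
>> >>   /
>> >>
>> >> The default 10g GATHER_STATS_JOB then skips the locked tables, so
>> >> only the other schemas get fresh stats.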
>> >> Since the increase in data volume is roughly proportional to the
>> >> existing data, what would be a compelling reason to periodically
>> >> re-gather statistics for tables that already have good execution
>> >> plans?
>> >> e.g., let's say there is a table with a million records and a few
>> >> indexes. We have collected the stats for this table and its indexes
>> >> at one million records, and all the access paths to this table have
>> >> been validated: the CBO picks the correct, optimal index in every
>> >> case.
>> >> Now the volume of the table grows to 2 million and, from the
>> >> application's point of view, the access paths that were valid at 1
>> >> million rows would be valid at 2 million as well.
>> >> Given this scenario, is it necessary to collect statistics at 2
>> >> million rows as well? My view is that it is not, but some of my
>> >> colleagues feel we should re-gather statistics periodically, and I
>> >> have not heard a convincing reason for it. Is this one of those
>> >> myths?
>> >> I would agree with periodically re-gathering stats if we were using
>> >> histograms, but we are not. Is there any other reason that would
>> >> require periodic statistics gathering?