Re: Periodic Stats Collection -- CBO Stats Myth?

From: William Robertson <william_at_williamrobertson.net>
Date: Wed, 1 Oct 2008 11:54:22 +0100
Message-ID: <a3edf8ae0810010354lb48f27cif78dd3019940884a@mail.gmail.com>


I've found that dates are more likely to present a problem because sequentially generated keys are usually highly selective anyway. Say a financial application loads three million trades a day. If the stats are not updated the optimizer will think a given TRADE_ID has 0 occurrences (since it's after the column's high endpoint) and round that up to 1, which may well be correct anyway. However when you run a query for this week's trades on a Friday and it thinks there are none when in fact there are 15 million then you have a much bigger problem. That's my experience anyway.

2008/10/1 Niall Litchfield <niall.litchfield_at_gmail.com>

> dates and sequence driven key columns both exhibit this behaviour.
>
> On 30/09/2008, William Robertson <william_at_williamrobertson.net> wrote:
> > I'm not sure how using or not using histograms makes any fundamental
> > difference to your approach.
> >
> > One thing that comes to mind is the use of date predicates. If the
> > optimizer knows the highest sale date you have (for example) is 30 Sept
> > 2008, you are fine today and maybe some of next week, but in a few
> > months' time any queries for "last week's sales" may get unrealistically
> > low cardinality estimates leading to inappropriate index/nested loop
> > access paths, Cartesian joins and so on.
> >
> > If you use partitioning and you have date columns as
> > partition/subpartition keys you could have even more fun in store.
> >
> >
> > -----Original message-----
> > From: Sriram Kumar
> > Date: 30/9/08 22:41
> >> Hi Folks,
> >>
> >> We had migrated a OLTP/Batch hybrid 9i RBO application to 10g
> >> and after rounds of tuning the application now is working at its best.
> >> We had experimented with various sampling rates for statistics for
> >> various tables and now I feel we are in a optimal point of performance
> >> where CBO is picking up the right Indexes.
> >>
> >> 1) We do not use histograms
> >> 2) From here on the application volumes are bound increase
> proportionately
> >> 3) No new tables would be added in production.
> >> 4) As of now, We have locked the schema stats for the application
> >> schemas. The default scheduler job collects stats for all other schemas
> >>
> >> Since the increase in volume of data is quite proportional to the
> >> existing data, what would be the compelling reason to periodically
> >> recollect statistics for the tables that already have good execution
> >> plans?
> >>
> >> e.g lets say that there is a table with million records and with few
> >> indexes. We have collected the stats for this table and indexes with
> >> million records and all the access paths to this table has been
> >> validated that the CBO is picking up the correct optimal index in
> >> every case.
> >>
> >> Now the volume of the table increases to 2 million and from
> >> application point of view, the access paths that were valid for 1
> >> million would be the valid for 2 million as well.
> >>
> >> Given this scenario, is it required to collect statistics for 2
> >> million as well?. I opine that it would not be required but some of my
> >> colleagues feel that we should collect statistics periodically but I
> >> am not able to get a valid reason for periodically recollecting the
> >> stats?. Is this one of the myths?
> >>
> >> I would agree to periodically recollect stats if we are using
> >> histograms but we are not using histograms. Any other reason that
> >> would need a periodic statistic gathering?
>

--
http://www.freelists.org/webpage/oracle-l
Received on Wed Oct 01 2008 - 05:54:22 CDT

Original text of this message