Re: Performance impact of MONITORING and GATHER

From: Stephen Barr <ascaroth969_at_yahoo.co.uk>
Date: Wed, 16 Mar 2005 11:25:44 +0000 (GMT)
Message-ID: <20050316112545.4092.qmail@web86902.mail.ukl.yahoo.com>

We've actually gone down the route of nor gathering statistics at all on the majority of our tables once they have been "baselined".

This is a DSS environment and our structures mainly fall in to three categories -

Partitioned, time series
Non-partitioned, time series
Non-partitioned, non-time series

For the partitioned tables we do analyze the most recent partition on a nightly basis post ETL. The global statistics on the table are updated through a package we've put together to update specific stats such as num_rows, num_blks, null_cnt, max/min values, etc.

For the other tables we have a few different requirements, but the process is similar - we manually update the stats using this package and the data from the ETL process.

All of our stats are also under change control, so we backup the statistics each night, just in case, and keep a months worth of statistics off-line in a change management program.

Histograms are collected and refreshed only when required. The profile of our skewed data doesn't usually change, so we create the histogram once and then leave it alone until it is identified as a problem through pro-active monitoring.

Our analyze run now completes in ~10 minutes on our 4TB DSS. I suppose the main question may be WHY take this approach? Well, we have a small subset of our data being queried by the majority of our users...this is recent data (last 6 months). However, we have 8 years of data in the warehouse. Most of the data was migrated from an older source system and is therefore has a very different profile from the data being loaded from the new source systems.

Essentially, we need to skew the statistics towards the data we will actually be querying...something we couldn't do with traditional gather_stale options. We completed this during a "baselining" period.

Of course there are disadvantages to this approach - probably a little extra maintenace, much more proactive monitoring is required and the statistics will undoubtedly slowly get out of synch and require manual intervention...but the benefits in our situation I believe out-weigh the disadvantages.

I would be interested to hear from anyone else who has tried this approach - or anyone who thoroughly disagrees!

Many thanks,

Steve.

Niall Litchfield <niall.litchfield_at_gmail.com> wrote:
> On Wed, 16 Mar 2005 12:32:55 +1100, Leng Kaing
> <Leng.Kaing_at_hsntech.com> wrote:
> > Hi Mladen,
> >
> > Sorry, I don't quite understand. What do you mean
> when you say "It also
> > takes away any purpose from gathering statistics
> based on STALE status."
> >
> > I was going to turn on schema MONITORING and then
> use GATHER_STALE to
> > speed up the analyse command. Is this not the
> correct approach?
>
> I think it rather depends what your goal is.
> Certainly using
> GATHER_STALE will speed up the length of time stats
> gathering takes,
> I'm not convinced that this is a good goal.
>
> I think that what Mladen was on about was that the
> idea of monitoring
> and GATHER_STALE is that you only need to collect
> statistics on
> objects that have been subject to a significant
> amount of DML. The
> trouble with not subjecting the modification
> tracking to be part of
> the transaction is that you can radically overstate
> the amount of DML
> that has actually occurred leading to unnecessary
> analysis.
>
> My view is that whilst gather_stale is a step in the
> right direction,
> it is predicated upon a bad assumption (or rather a
> rule of thumb)
> that changes of 10% are significant for all tables
> and changes of less
> than 10% aren't. If you could do ALTER TABLE T
> MONITORING THRESHOLD x%
> then I'd be more inclined to go along with it.
>
> Incidentally I'm wondering how a strategy of never
> gathering stats at
> all, but relying on dynamic sampling at a high level
> (probably 4)
> would work, my guess is it would break down in
> high-load systems, but
> it might help more normal systems with specific
> problem sqls because
> of the differences between dbms_stats and dynamic
> sampling. If anyone
> has looked at this idea it would be interesting to
> know what you
> found.
>
>
> --
> Niall Litchfield
> Oracle DBA
> http://www.niall.litchfield.dial.pipex.com
> --
> http://www.freelists.org/webpage/oracle-l
>

Send instant messages to your online friends http://uk.messenger.yahoo.com

--
http://www.freelists.org/webpage/oracle-l

Received on Wed Mar 16 2005 - 06:29:34 CST

Re: Performance impact of MONITORING and GATHER_STALE