Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


RE: DBMS_STATS [resend chomped version]

From: Mercadante, Thomas F <thomas.mercadante_at_labor.state.ny.us>
Date: Mon, 21 Jun 2004 07:53:31 -0400
Message-ID: <DE8A21F8F1D0254EA4A9214D52AB2FEDAD5C4A@exchsen0a1ma>


I use the AUTO variable in all my stats gathering processes. Call me naive, but if the Oracle docs say to use AUTO, then I will use it. I get the feeling that AUTO means complete, because the SAMPLE_SIZE column in the USER_TABLES view always shows the total number of rows. Maybe a future release will be different, but I'm happy with how it's working.

Tom Mercadante
Oracle Certified Professional

-----Original Message-----
From: Niall Litchfield [mailto:niall.litchfield_at_gmail.com]
Sent: Monday, June 21, 2004 6:58 AM
To: oracle-l_at_freelists.org
Subject: Re: DBMS_STATS [resend chomped version]

Comments in-line and thinking aloud as well

On Sun, 20 Jun 2004 21:20:41 -0500, DENNIS WILLIAMS <dwilliams_at_lifetouch.com> wrote:
>
> Jonathan
> Very provocative ideas.
> 1. I'm assuming you are referring to the AUTO sample size when you say
> don't let the database work out a sample size? Does anybody know how
> that works? I hate to trust something if I have no idea how it works.

I haven't done this (I tend to get a blind spot with the word AUTO -- I like automating things, but only if I can do it manually :) )

> 2. The study of statistics has a branch related to sample size.
> Pollsters use that to figure out how many random samples can provide a
> valid estimate. I'm searching for my old college statistics textbook
> right now.

There are a number of factors that affect the ideal random sample size. One is the size of the population; one is the margin of error that you want (the +- bit that says table SALES_HISTORY has 7 million rows +-5%); and a third is the confidence level -- how confident you wish to be in the accuracy of your stats (typically 95% or 99%).
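As a back-of-the-envelope illustration of how those three factors combine (this is the standard survey-sampling formula, nothing that DBMS_STATS is documented to do -- the function and defaults below are my own for illustration):

```python
import math

def sample_size(population, margin=0.05, confidence=0.95, p=0.5):
    """Cochran's formula with finite-population correction.

    population -- total number of rows (N)
    margin     -- acceptable error, e.g. 0.05 for +-5%
    confidence -- desired confidence level (0.95 or 0.99)
    p          -- assumed proportion; 0.5 is the worst case
    """
    # z-scores for the common confidence levels
    z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}[confidence]
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

# 7 million rows, +-5% at 95% confidence: only ~385 rows needed
print(sample_size(7_000_000))
```

Note that once the population is large, the required sample size barely depends on it at all -- which is one reason a fixed percentage like "30% of rows" has no statistical basis.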

> 3. I like your idea of creating a table to hold the sample size for
> each table in the schema. Other columns could hold the reanalyze
> interval, date of next analyze, etc. This would ensure each table was
> analyzed appropriately.

I like the idea as well, with the following caveats.

I'd be more than a bit worried that it might turn out like Mike Hordilla's package for index reorgs that was on OTN a while back: a highly complicated, flexible and efficient way of doing something that might well be best left undone :(. Oh, and it might suffer in a similar way from a lack of understanding by end-users as to what it was doing and why.

For example, what my comments in 1 above mean -- if correct, and it's a while since I was at college -- is that the package would need to answer the following question about each analyzed segment: do the current stats meet my accuracy requirements? Are the stats 'good enough', in other words? This is because we wouldn't wish to calculate an 'interval' for running stats for each table, but rather calculate the degree of confidence we have that the stats are 'good enough'.

Unfortunately there is rather a 'gotcha' with this approach (unless there is a smart stats way around it), which is that in order to answer the 'are my stats good enough?' question we need to know how large our population is (to at least a good degree of accuracy) -- and to do this we need to collect stats :(. We can of course project population deltas for any given segment from past history, but frequently the past is no good guide to the future -- in particular, changes in business process, application software or economic environment can and do affect the rate of change of data in our commercial databases.

I think what I am saying is that if we buy the argument that one should not sample every x days (hours, years), because we care about the appropriateness of the stats and not their 'freshness', then building an automated system to gather appropriate stats becomes a non-trivial task.
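To make the chicken-and-egg problem above concrete, here is a rough sketch of the per-segment decision such a package would have to make (the function, names and thresholds are hypothetical, not any actual DBMS_STATS interface):

```python
def stats_good_enough(analyzed_rows, current_rows, tolerance=0.10):
    """Return True if the stored statistics are still within tolerance.

    analyzed_rows -- row count recorded when stats were last gathered
    current_rows  -- row count now, which we can only know by counting,
                     i.e. by doing (part of) the work we hoped to avoid
    tolerance     -- acceptable relative drift, e.g. 0.10 for 10%
    """
    if analyzed_rows == 0:
        return current_rows == 0
    drift = abs(current_rows - analyzed_rows) / analyzed_rows
    return drift <= tolerance

# Projecting current_rows from past deltas avoids the count, but as noted,
# the past is often a poor guide to the future.
print(stats_good_enough(1_000_000, 1_050_000))  # True: 5% drift
print(stats_good_enough(1_000_000, 1_300_000))  # False: 30% drift
```

The hard part is not this comparison but supplying `current_rows` cheaply and reliably -- which is exactly the gotcha described above.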

Some googling turned up http://www.surveysystem.com/sscalc.htm which gives good coverage of the relevant statistics.

> 4. My boss heard some Oracle expert say that 30% was the best sample
> size. Does anyone have an idea of the source?

someone who wasn't a statistician :)

-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request_at_freelists.org put
'unsubscribe' in the subject line.
--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
Received on Mon Jun 21 2004 - 06:50:43 CDT

