Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Re: DBMS_STATS [resend chomped version]

Re: DBMS_STATS [resend chomped version]

From: Niall Litchfield <niall.litchfield_at_gmail.com>
Date: Mon, 21 Jun 2004 11:57:52 +0100
Message-ID: <7765c89704062103574c288c77@mail.gmail.com>


Comments in-line and thinking aloud as well

On Sun, 20 Jun 2004 21:20:41 -0500, DENNIS WILLIAMS <dwilliams_at_lifetouch.com> wrote:
>
> Jonathan
> Very provocative ideas.
> 1. I'm assuming you are referring to the AUTO sample size when you say don't
> let the database work out a sample size? Does anybody know how that works? I
> hate to trust something if I have no idea how it works.

I haven't done this (I tend to get a blind spot with the word AUTO I like automating things, but only if I can do it manually :) )

> 2. The study of statistics has a branch related to sample size. Pollsters
> use that to figure out how many random samples can provide a valid estimate.
> I'm searching for my old college statistics textbook right now.

There are a number of factors that affect the ideal random sample size, 1 is the size of the population, 1 is the confidence interval that you want (the +- bit that says table SALES_HISTORY has 7million rows <b>+-</b> 5% ). and a thrid is how confident you wish to be about the accuracy of your stats (typically 95% or 99% accurate).

> 3. I like your idea of creating a table to hold the sample size for each
> table in the schema. Other columns could hold the reanalyze interval, date
> of next analyze, etc. This would ensure each table was analyzed
> appropriately.

I like the idea as well, with the following caveats.

I'd be more than a bit worried that it might turn out like Mike Hordilla's package for index reorgs that was on OTN a while back to be a highly complicated, flexible and efficient way of doing something that might well be best left undone :(. Oh and it might suffer in a similar way from a lack of understanding by end-users as to what it was doing and why.

For example what my comments in 1 above mean - if correct and its a while since I was at college - is that the package would need to answer the following question about each analyzed segment. Do the current stats meet my accuracy requirements? Are the stats 'Good Enough' in other words. This is because we wouldn't wish to calculate an 'interval' for running stats for each table, but calculate the degree of confidence that we have that the stats are 'good enough'. Unfortunately there is rather a 'gotcha' with this approach (unless there is a smart stats way around this) which is that in order to answer the 'are my stats' good enough?' question we need to know how large our population is (to at least a good degree of accuracy) - in order to to this we need to collect stats :(. We can of course project population deltas for any given segment from past history, but frequently the past is no good guide to the future - in particular changes in busines process, application software or economic environment can and do affect the rate of change of data in our commercial databases.

I think what I am saying that if we buy the argument that one should not sample every x days (hours, years), because we care about the appropriateness of the stats and not their 'freshness' then building an automated system to gather appropriate stats becomes a non-trivial task.

Some googling discovered http://www.surveysystem.com/sscalc.htm which gives a good coverage of the relevant stats.

> 4. My boss heard some Oracle expert say that 30% was the best sample size.
> Does anyone have an idea of the source?

someone who wasn't a statistician :)

-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com
----------------------------------------------------------------
Please see the official ORACLE-L FAQ: http://www.orafaq.com
----------------------------------------------------------------
To unsubscribe send email to:  oracle-l-request_at_freelists.org
put 'unsubscribe' in the subject line.
--
Archives are at http://www.freelists.org/archives/oracle-l/
FAQ is at http://www.freelists.org/help/fom-serve/cache/1.html
-----------------------------------------------------------------
Received on Mon Jun 21 2004 - 05:54:35 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US