Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> Statistical sampling and representative stats collection

Statistical sampling and representative stats collection

From: Jack Silvey <jack_silvey_at_YAHOO.COM>
Date: Tue, 21 May 2002 13:43:33 -0800
Message-ID: <F001.00466B6C.20020521134333@fatcity.com>


Hi all,

Did some investigation about statistical sampling this weekend since we are going to optimize our analyze process soon, and would like some input from all you orabrains on this one.

I opened a TAR with Oracle asking about the sampling algorithm of stats collection, and they assured me it was random.

The goal of analyze...estimate is to collect stats that are representative of the data population as a whole using a given sample set. Since analyzing tables takes up resources (does sorts to order the data for investigation) the fewer rows you use in estimate, the less system resources you use and the faster the analyze will go.

Since our goal is to get as small a sample as possible and still have stats that are representative, my contention is that we could start by finding what the margin of error will be for each sample size and gauge our tolerance for it.

One standard way to calculate margin of error for a given sample is by using this formula:

M = 1/SQRT(N) where:
M = margin of error
N=sample size

So, if we can tolerate stats that have a 1% a margin of error (will deviate from representative of the whole population by 1%), our sample size should be 10,000 rows.

Also, a corollary (not a toyota corollary, though) to this would be that the more rows you add to your sample, the closer to representative your sample will be. So, in order to test whether your sample is representative enough, you could analyze using either estimate 49% or compute, take a snapshot of the stats, and then compare the stats from a 10,000 row estimate to those. Then, add rows to your estimate until you are satisfied with the stats.

This of course is a pie in the sky mathematical model, but seems like a reasonable place to start with testing.

Input? Input? Buhler? Buhler?

/jack silvey



Do You Yahoo!?
LAUNCH - Your Yahoo! Music Experience
http://launch.yahoo.com
--

Please see the official ORACLE-L FAQ: http://www.orafaq.com
--

Author: Jack Silvey
  INET: jack_silvey_at_yahoo.com
Fat City Network Services    -- (858) 538-5051  FAX: (858) 538-5051
San Diego, California        -- Public Internet access / Mailing Lists
--------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). Received on Tue May 21 2002 - 16:43:33 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US