Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.misc -> Re: analyzing tables

Re: analyzing tables

From: Tad Davis <davist_at_isc.upenn.edu>
Date: 1997/12/27
Message-ID: <davist-2712971640020001@ts15-62.upenn.edu>#1/1

In article <67mbua$osh$1_at_Radon.Stanford.EDU>, hasan_at_DB.Stanford.EDU (Waqar Hasan) wrote:

> (d) What sampling percentage is reasonable when data is skewed. 

> I am really intersted in hearing about what worked in practice rather
> than claims based on statistical theory (I suspect that assumptions needed
> by the theory are not true).
We are using Oracle 7.1 for a Data Warehouse application. (We'll be moving soon to 7.3, but not just yet.) I understand that the optimizer in 7.3
works quite a bit differently, so our rules of thumb may not apply.

We refresh the Warehouse once a day, at night, from various transactional system; each batch load concludes with an analyze step. There are some time constraints, because we have to make sure we finish all loads by a certain time in the early morning. We have tended to view this as a tradeoff between getting the best possible query plans and finishing within a given batch window. For reasonably small tables (less than 100,000 rows) we will typically analyze by computing statistics -- i.e. by doing the whole thing. For larger tables we will analyze by estimating statistics, sampling a percentage of rows. Oracle 7.1 will second-guess this to some extent -- if it thinks it can do a "compute statistics" nearly as fast as the percentage you specify, it will go ahead and do it. If you specify a percentage close to 50%, it will probably do a "compute" rather than an "estimate". Most of our larger tables are analyzed by sampling 25% of the rows. For some of the largest tables (with rows in the millions), a sample that large would take too long, so we have reduced the percentages to single digits in some cases. There is probably a point of diminishing returns -- at a certain point the improvement in query plans will be so small that it will not justify the extra time spent analyzing.

My understanding from the Oracle 7.1 manual is that if you say "ANALYZE TABLE," it will automatically analyze the associated indexes. All of our analyze steps consist of "analyze table" statements only. When we made this switch (from the more elaborate scheme of analyzing tables and each individual index separately), we did not see a degradation in query performance. The "explain plan" routine continues to choose indexes over full-table scans with about the same frequency. I don't know if analyzing indexes separately will produce fuller stats or compute categories of stats that are not included in the "table" version.


Tad Davis                  Lead Programmer Analyst
davist_at_isc.upenn.edu       Information Systems and Computing
voice 215-898-7864         Administrative Information Technologies
fax   215-898-0386         University of Pennsylvania
------------------------------------------------------------------
Received on Sat Dec 27 1997 - 00:00:00 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US