Home -> Community -> Usenet -> c.d.o.misc -> Re: analyzing tables
In article <67mbua$osh$1_at_Radon.Stanford.EDU>, hasan_at_DB.Stanford.EDU (Waqar Hasan) wrote:
> (d) What sampling percentage is reasonable when data is skewed?
> I am really interested in hearing about what worked in practice rather
> than claims based on statistical theory (I suspect that assumptions needed
> by the theory are not true).
We are using Oracle 7.1 for a Data Warehouse application. (We'll be moving soon to 7.3, but not just yet.) I understand that the optimizer in 7.3 works quite a bit differently, so our rules of thumb may not apply.
We refresh the Warehouse once a day, at night, from various transactional systems; each batch load concludes with an analyze step. There are some time constraints, because we have to make sure we finish all loads by a certain time in the early morning. We have tended to view this as a tradeoff between getting the best possible query plans and finishing within a given batch window.

For reasonably small tables (less than 100,000 rows) we will typically analyze by computing statistics -- i.e., by doing the whole thing. For larger tables we will analyze by estimating statistics, sampling a percentage of rows. Oracle 7.1 will second-guess this to some extent -- if it thinks it can do a "compute statistics" nearly as fast as the percentage you specify, it will go ahead and do it. If you specify a percentage close to 50%, it will probably do a "compute" rather than an "estimate".

Most of our larger tables are analyzed by sampling 25% of the rows. For some of the largest tables (with rows in the millions), a sample that large would take too long, so we have reduced the percentages to single digits in some cases. There is probably a point of diminishing returns -- at a certain point the improvement in query plans will be so small that it will not justify the extra time spent analyzing.
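For the record, a sketch of what those nightly analyze steps look like in Oracle 7.1 syntax (the table names here are made up for illustration; substitute your own):

```sql
-- Small table (under ~100,000 rows): compute exact statistics.
ANALYZE TABLE dim_customer COMPUTE STATISTICS;

-- Larger table: estimate from a 25% sample. Note that 7.1 may
-- silently promote an estimate to a full compute if the sample
-- percentage you specify is close to 50%.
ANALYZE TABLE fact_sales ESTIMATE STATISTICS SAMPLE 25 PERCENT;

-- Multi-million-row table: single-digit sample to fit the batch window.
ANALYZE TABLE fact_weblog ESTIMATE STATISTICS SAMPLE 5 PERCENT;
```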
My understanding from the Oracle 7.1 manual is that if you say "ANALYZE TABLE," it will automatically analyze the associated indexes. All of our analyze steps consist of "analyze table" statements only. When we made this switch (from the more elaborate scheme of analyzing tables and each individual index separately), we did not see a degradation in query performance. The "explain plan" routine continues to choose indexes over full-table scans with about the same frequency. I don't know if analyzing indexes separately will produce fuller stats or compute categories of stats that are not included in the "table" version.
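To make the switch concrete (again, the table and index names are hypothetical), the difference is between one table-level statement and the older per-index scheme:

```sql
-- What we do now: one statement per table. Per the 7.1 manual,
-- the associated indexes are analyzed automatically.
ANALYZE TABLE fact_sales ESTIMATE STATISTICS SAMPLE 25 PERCENT;

-- The more elaborate scheme we retired: analyzing each index separately
-- in addition to the table.
ANALYZE INDEX fact_sales_pk COMPUTE STATISTICS;
ANALYZE INDEX fact_sales_date_idx COMPUTE STATISTICS;
```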
Tad Davis
Lead Programmer Analyst
Information Systems and Computing
Administrative Information Technologies
University of Pennsylvania
davist_at_isc.upenn.edu
voice 215-898-7864 / fax 215-898-0386

Received on Sat Dec 27 1997 - 00:00:00 CST