Re: Cost Based Analysis

From: MarkP28665 <markp28665_at_aol.com>
Date: 1997/02/27
Message-ID: <19970227155701.KAA28159@ladder02.news.aol.com>#1/1

Question was >>
analyze table XXXX ESTIMATE statistics; analyze table XXXX COMPUTE statistics; <<

The answer is performance! The base estimate option only looks at around 1064 rows or some such number while compute reads every row in the table. Now as the table in question grows in size the statistics generated by each command will start to drift apart. The main statistics to look at are num_rows and avg_row_len in all_tables to see this, but even more importantly to the otimizer are the stats in all_tab_columns, say, num_distinct which is used in determining which if any index is used to solve a query. Estimate does not do a very good job of generating these values or the all_indexes, distinct_keys column.

If you have Oracle 7.1.6 or higher then I recommend you use the 'sample n rows' option. If you have 7.1.3 Oracle will accept the syntax, but only does the base estimate done. I have compared the stats generated in 7.1.6 for the base estimate and for a row sample and they vary significiantly.

I like to use 'analyze table xxxx estimate statistics sample 50000 rows' as I find that I get very good results from this on tables ranging in size from a dozen to 14 million rows. I have seen the base estimate miss by a million rows on a seven million row table, and the resulting plan was a very poor performer.

Remember also that the optimizer makes some basic assumptions about data distribution that a developer should be able to determine certain cases where optimizer will be wrong. Say when an index has only three distinct values so the cost based optimizer wants to do a full table scan, but the developer knows that 95% of the data has one value while the query will be retrieving the 5% value. In this case it will be wise to code a hint in the statement to use the index.

I estimate that the cost based optimizer is right 60% of the time, and that 20% of the time it makes no real difference in performance between the rule or cost based access path. But, that other 20% of the time can be a real killer. Being able to increase the sample size has really helped our shop.

Final point - analyzing takes a temp area around four times larger than the table

Mark Powell -- The only advise that counts is the advise that you follow so follow your own advise Received on Thu Feb 27 1997 - 00:00:00 CST