Re: FOR ALL INDEXED COLUMNS SIZE 254

From: Randolf Geist <mahrah_at_web.de>
Date: Tue, 13 Sep 2011 02:03:14 -0700 (PDT)
Message-ID: <921e9307-8fd4-47f3-a406-a188e359a2dd_at_h6g2000yqe.googlegroups.com>



On Sep 12, 5:15 pm, Mladen Gogala <gogala.mla..._at_gmail.com> wrote:
> I have frequently used the method_opt from the title to analyze my tables
> and have been castigated for it with amazing regularity. I still don't
> quite understand why is that method bad. My logic is quite simple:
> The main decision that an optimizer has to make about the particular
> column condition is whether to use index to resolve it or not. That
> decision is based on the column statistics, as well as the the index
> statistics.  The things that influence the decision are the condition
> itself, the existence of the histograms which helps to estimate the
> number of rows that need to be read, the clustering factor which will
> help estimate the number of blocks to read, the values from SYS.AUX_STAT$
> to estimate the I/O cost and the statistics from the table itself. There
> are two possible decisions that optimizer can make with respect to the
> column: whether to use an index or not.
> There is no decision to be made if the index is not there, except with
> respect to join method, whether to use sort or hash. That means that
> using "for all indexed columns size 254" actually makes a lot of sense
> and it doesn't waste as much space as "FOR ALL COLUMNS SIZE 254". There
> is an option "FOR ALL COLUMNS SIZE SKEWONLY" which would be nice, except
> that I have no idea of the histogram size.
> This method_opt was called "a bane":http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-
> indexed-columns/
>
> So, why is this a bane? The explanation on the page is less than clear.
>
> --http://mgogala.byethost5.com

There are several things to consider:

  1. If you use the INDEXED keyword in the METHOD_OPT clause then you effectively do not gather any statistics for columns that are not indexed. If you know that all columns that are used as filter / join predicates are indexed, then this is fine. If you however have columns that lack basic column statistics but are relevant then the CBO is in trouble. Dynamic sampling won't be used for such cases and Oracle falls back to hard-coded built-in defaults for selectivity/cardinality estimates based on those columns. Hence cardinality estimates are potentially way off leading to bad join and access orders. This is what Charles referred to as common misunderstanding - bad cardinality estimates are your enemy no. 1 for good performing execution plans.

Bottom line here is: You should ensure to have at least basic column statistics on all columns that are relevant as join / filter predicates. Note that I'm not talking about histograms here but merely basic column statistics

2. Using the clause SIZE 254 Oracle will unconditionally generate histograms on those columns. You don't need histograms on all (indexed) columns, for most cases basic column statistics are more than sufficient.

The point here is that histograms can be counter-productive for many cases. If you end up with height-balanced histograms (more than 254 distinct column values) the usefulness of the histogram is quite questionable and only helpful in rare cases.

But even if you have a frequency histogram there are little implementation details and side-effects that can introduce surprising problems, joins included.

Bottom line here is: Only generate histograms where necessary - and I believe that this means even less histograms than are generated when using the SIZE AUTO option. Histograms should rarely be applied, and in some cases you would actually need to craft a histogram manually in order to be beneficial rather than gather the histogram.

For more details I recommend reading Richard Foote's recent blog post on the SIZE AUTO option including the comments:

http://richardfoote.wordpress.com/2011/09/01/method_opt-size-auto-quiz-solution-the-trickster

Some of the reasons why histograms can be bad I've summarized in a comment to the post:

http://richardfoote.wordpress.com/2011/09/01/method_opt-size-auto-quiz-solution-the-trickster/#comment-31285

I would summarize a general recommendation like this:

  • Use FOR ALL COLUMNS SIZE 1 as default. This way you ensure that you end up with basic column statistics on all columns.
  • Generate histograms only on columns where you can prove that it will be beneficial - do this explicitly or you could use an approach of SIZE REPEAT instead of SIZE 1 and generate the histogram once explicitly
  • There are cases where you will only benefit from a histogram when crafting it manually

Hope this helps,
Randolf Received on Tue Sep 13 2011 - 04:03:14 CDT

Original text of this message