Re: FOR ALL INDEXED COLUMNS SIZE 254

From: Charles Hooper <hooperc2001_at_gmail.com>
Date: Mon, 12 Sep 2011 15:45:27 -0700 (PDT)
Message-ID: <5c846933-1e7e-43f7-b414-6bd252cdb2c8_at_e8g2000vbc.googlegroups.com>



On Sep 12, 5:15 pm, Mladen Gogala <gogala.mla..._at_gmail.com> wrote: (snip)
> So, why is this a bane? The explanation on the page is less than clear.

Mladen,

I took a look at the link that you provided and found the following comment:
http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/#comment-304 "Wolfgang Breitling on December 3, 2008 at 12:57 am In response to Mladen:
 “The columns that really do need histograms, where it really does matter, are the indexed columns. Those are the only columns where the optimizer can choose the access path.”
 That is a common misconception: that histograms only matter for indexed columns. Choosing an access method ( index or tablescan ) for a row source is only part of the task of the CBO when putting together an access plan. Other, at least equally important, tasks are the types and especially order of joins – unless none of your sql contain joins. And it is there where column selectivities, derived from column statistics, play a crucial role. If you deprive the optimizer of this vital information it is bound to produce suboptimal, even very bad plans. If you have not yet encountered that count yourself lucky."

I do not have the full answer for your question, but I think that there is at least one more criteria that must be determined - table join order. Consider this, you have 3 tables (T1, T2, T3). In which of the 6 join orders makes the most sense for the query optimizer:

T1 -> T2 -> T3
T1 -> T3 -> T2
T2 -> T1 -> T3
T2 -> T3 -> T1
T3 -> T1 -> T2
T3 -> T2 -> T1

Assume that you only create histograms on the indexed columns, and the current statistics indicate that each table is estimated to return the following number of rows:

T1: 1,000,000
T2: 500,000
T3: 100,000

What would be the join order selected in the above case? Very likely T3 -> T2 -> T1 (smallest cardinality estimate to largest)

Now consider that table T1 has an unindexed column named STATUS - the status is one of RELEASED, CLOSED, and CANCELLED. Your particular query is looking for the few rows in table T1 with a STATUS of CANCELLED, with just 200 of the 1,000,000 rows matching that criteria. If a histogram were created on that column, the optimizer might change the cardinality estimates like this (note that it likely will not be exactly correct due to the selectivity estimated being multiplied together when multiple predicates are present with AND specified between the predicates:

T1: 5
T2: 500,000
T3: 100,000

In the above case, the selected join order might be T1 -> T3 -> T2

I think that the above is a case where just collecting histograms on indexed columns might be limiting.

In comment 38 in the following blog article: http://richardfoote.wordpress.com/2008/01/04/dbms_stats-method_opt-default-behaviour-changed-in-10g-be-careful/ Richard Foote stated:
"I wouldn’t recommend ALL INDEXED COLUMNS. Non-indexed columns can benefit from stats and histograms as well as indexed columns as the stats can provide important cardinality estimates of steps within an execution plan that can influence join orders and when steps are performed"

Why not index the STATUS column in my example? You could, but then what if you are looking for something different, like sales in the state of Alaska (probably much fewer than for California) - would you necessarily index a column that indicates the state of a customer? (On second thought, I probably would not have thought to manually build a histogram on this column either, but the statistics collection process might).

Charles Hooper
Co-author of "Expert Oracle Practices: Oracle Database Administration from the Oak Table"
http://hoopercharles.wordpress.com/
IT Manager/Oracle DBA
K&M Machine-Fabricating, Inc. Received on Mon Sep 12 2011 - 17:45:27 CDT

Original text of this message