Re: FOR ALL INDEXED COLUMNS SIZE 254

From: Noons <wizofoz2k_at_gmail.com>
Date: Mon, 12 Sep 2011 22:09:28 -0700 (PDT)
Message-ID: <eb7d5542-b85a-44b7-b5b1-af9204512829_at_m4g2000pri.googlegroups.com>



On Sep 13, 8:45 am, Charles Hooper <hooperc2..._at_gmail.com> wrote:

>  “The columns that really do need histograms, where it really does
> matter, are the indexed columns. Those are the only columns where the
> optimizer can choose the access path.”
>  That is a common misconception: that histograms only matter for
> indexed columns. Choosing an access method ( index or tablescan ) for
> a row source is only part of the task of the CBO when putting together
> an access plan. Other, at least equally important, tasks are the types
> and especially order of joins – unless none of your sql contain joins.

Which, strangely enough, is the precise case of the examples provided by Greg in that entry.
Sorry. but trying to convince anyone that joins can be affected with an example that doesn't use joins is, IMHO, a bit of a dream.

Yes, we can see the difference in cardinality. Yes, the query in the examples uses no indexes in ALL cases exposed. As it should, given there are no indexes in those columns.
What exactly is the problem?

What, you telling me that in a join somehow magically the CBO would create an index on the fly and use it if there were histograms for all columns?

You don't use Peoplesoft HR much, do you? I can provide some concrete examples where using histograms in all columns, indexed or not, can be deadly in 10gr2.
In fact, there is now (FINALLY!!!) a documented process that specifically eliminates histograms from certain tables used by PS for paycalc runs and as a result gets reasonable performance instead of the horrible pot-luck that is histograms combined with bind variable peeking in 10gr2.

So, please: before we start crucifying those who by choice wouldn't give a fig about being an ACE or OCP, wouldn't it be better to actually provide proper examples if at all possible? Particularly where relevant to a specific release? It's not like the CBO behaves consistently across releases or has EVER done so in recent history!...

> And it is there where column selectivities, derived from column
> statistics, play a crucial role. If you deprive the optimizer of this
> vital information it is bound to produce suboptimal, even very bad
> plans. If you have not yet encountered that count yourself lucky."

I've never encountered that situation and I am not particularly lucky.
Given lack of indexes and partition pruning, the CBO can only produce full table scans to get data, end of story and period. That may or may not be bad, depending on a huge set of factors and conditions.
Nothing to do with histograms on indexed or non-indexed columns.

> snippage...
> Now consider that table T1 has an unindexed column named STATUS - the
> status is one of RELEASED, CLOSED, and CANCELLED.  Your particular
> query is looking for the few rows in table T1 with a STATUS of
> CANCELLED, with just 200 of the 1,000,000 rows matching that
> criteria.  If a histogram were created on that column, the optimizer
> might change the cardinality estimates like this (note that it likely
> will not be exactly correct due to the selectivity estimated being
> multiplied together when multiple predicates are present with AND
> specified between the predicates:
> T1: 5
> T2: 500,000
> T3: 100,000
>
> In the above case, the selected join order might be T1 -> T3 -> T2
>
> I think that the above is a case where just collecting histograms on
> indexed columns might be limiting.

"might", "likely to change", and so on. See the problem? Yes, there may be a particular case, once in a blue moon, when the wind is blowing in the right direction, where this might be necessary. In the other 99.999999% of the cases, it will never happen. Production dbas deal with the 0.000001% case when/if needed, otherwise they just set one size fits all and go with it. Nothing wrong with that and certainly not an appropriate subject for "crucification".

> Richard Foote stated:
> "I wouldn’t recommend ALL INDEXED COLUMNS. Non-indexed columns can
> benefit from stats and histograms as well as indexed columns as the
> stats can provide important cardinality estimates of steps within an
> execution plan that can influence join orders and when steps are
> performed"

Like I said: "can" is not a synonym for "must" or "will"

> Why not index the STATUS column in my example?  You could, but then
> what if you are looking for something different, like sales in the
> state of Alaska (probably much fewer than for California) - would you
> necessarily index a column that indicates the state of a customer?

Yes, of course I would! In the case where we are after a high cardinality value, the indexes would work perfectly and as desired. In the case of low cardinality, hopefully the histogram accompanying that index - the case of "FOR ALL INDEXED COLUMNS" - would allow the CBO to drop the index and FTS. And in a join the same would happen. Isn't that what one would hope for (abstracting the usual CBO bugs)?

I've said it many times before: a specific example is not proof of a much wider case's validity. I think it applies here more than anything else. Although I'm not sure I'd follow Mladen's idea of "256 always" without further scrutiny, I can't see what the "crucification" problem is with histograms on indexed columns.

> (On second thought, I probably would not have thought to manually
> build a histogram on this column either, but the statistics collection
> process might).

The lesser said about the "auto" stats gathering process, the better... Received on Tue Sep 13 2011 - 00:09:28 CDT

Original text of this message