RE: index columns

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Sat, 18 Apr 2015 08:24:40 +0000
Message-ID: <CE70217733273F49A8A162EE074F64D928288DAE_at_EXMBX01.thus.corp>


I think an example like this is worth a couple of pages on its own, particularly in light of my comment about "consider the full set of indexes on the table".

First - it is extremely likely that an index on just (account_id) would exist because it's "almost distinct" and looks like the sort of column which might commonly be used to find data (irrespective of whether the flag was Y or N). This "sole use" of column would (almost guaranteeably) over-ride any choice of column ordering.

Second - If you had a query that was concerned with the 99% YES rows then you could argue that the 1% overhead introduced by going to the table rather than resolving in the index was not sufficient to justify adding the column to the index anyway - presumably it's a column that can change value, so there's a cost in undo and redo to consider if you do add it.

Third - if you had a common requirement to pick account_ids from in the 1% NO rows then you could argue the case for creating a function-based index (and modifying the SQL to match) defined as something like: (case yes_or_no when 'NO' then account_id end)

Regarding the accidental use of the index - accidents will always happen with the optimizer, and the potential for side effects and anomalies is just part of the consideration that goes into the indexing set and planning for special queries. In this example, by the way, I'm not sure that column group stats would be likely to help: with a 99/1 split I imagine you'd have a histogram on yes_or_no as a general aid to cardinality calculations, and with "almost distinct" on the account_id and only two values on yes_or_no the column group "num_distinct" would be very close to the number of distinct values for account_id, and should echo the distinct_keys in the two-column index anyway: and you wouldn't get a useful histogram on the column group, then you'd have to worry about odd interference between the column group stats and the histogram on yes_or_no. (I don't know what the state of play is at present when individual columns have histograms and the column group doesn't, but I recall that Chris Antognini found at least one anomaly: http://antognini.ch/2014/02/extension-bypassed-because-of-missing-histogram/ )

Regards
Jonathan Lewis
http://jonathanlewis.wordpress.com
_at_jloracle



From: oracle-l-bounce_at_freelists.org [oracle-l-bounce_at_freelists.org] on behalf of Tim Gorman [tim_at_evdbt.com] Sent: 18 April 2015 00:34
To: oracle-l_at_freelists.org
Subject: Re: index columns

Consider the scenario where there are two columns, ACCOUNT_ID and YES_OR_NO, where ACCOUNT_ID is almost distinct and YES_OR_NO is populated with the "YES" in 99% of rows and "NO" in the remainder?

If both columns are used in a query, then clearly the ordering of the columns doesn't matter, as cited in Richard Foote's referenced blog posts.

But what if *only* the YES_OR_NO column is used in a query? If that column is leading the index, then it is quite possible that the optimizer might choose to perform an INDEX RANGE SCAN on that index rather than performing a FULL table scan, which would be disastrous for performance. However, if ACCOUNT_ID is the leading column, then that is far less likely to happen, although a rogue INDEX SKIP SCAN remains a remote possibility.

Column statistics, especially extended column statistics such as column groups (as described by Maria Colgan online here<https://blogs.oracle.com/optimizer/entry/how_do_i_know_what_extended_statistics_are_needed_for_a_given_workload>), can provide the optimizer enough information to prevent it from making an inappropriate decision in the scenario described above.

If column statistics or extended column statistics are not guaranteed to be in use and gathered correctly, then it is definitely worthwhile to consider cardinality when ordering columns in a concatenated index.

If your goal is that your trousers never fall down accidentally, then wear both a belt and suspenders.

On 4/17/15 16:44, Stefan Koehler wrote:

Hi Orlando,
oh a classic oldie and common myth :-))

I think the only source i need to point to is Richard Foote's blog post about this:
- https://richardfoote.wordpress.com/2008/02/13/its-less-efficient-to-have-low-cardinality-leading-columns-in-an-index-right/

Best Regards
Stefan Koehler

Freelance Oracle performance consultant and researcher Homepage: http://www.soocs.de
Twitter: _at_OracleSK

Orlando L <oralrnr_at_gmail.com><mailto:oralrnr_at_gmail.com> hat am 18. April 2015 um 00:02 geschrieben:

 All

 My colleague and I got into a discussion about indexes. I feel that putting the most selective column first while creating multi column indexes is the correct approach, followed by second most selective column as the second column in the index and so on. My colleague feels that the order does not matter. Can someone clarify.

 OL

--
http://www.freelists.org/webpage/oracle-l







--
http://www.freelists.org/webpage/oracle-l
Received on Sat Apr 18 2015 - 10:24:40 CEST

Original text of this message