RE: long, but not stupid Question about indexing

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 12 Jul 2007 10:29:58 -0400
Message-ID: <016201c7c491$20db3f50$1100a8c0@rsiz.com>

Too much to cover it all.

First, indexes absolutely can be used by queries containing inequalities in the predicate for the relevant columns. Whether they will be used depends on your statistics (possibly including histograms), what release of Oracle you're on, and either an heuristic clause-access path rating built into the RBO or a permutation limited evaluation of the "COST" that will be related to (but not exclusively determined by) what fraction of the blocks of the index and table the CBO expects (based on the statistics) to have to visit to produce the required result set.

Now there are various sophisticated numeric methods for grabbing the set of points within a box defined by four lat,lon pairs and you should probably employ one of those but let's just pretend someone is only interested in the data between zero and ten degrees for each. Then an index on lat, or an index on lon, or an index on lat and lon could each potentially help very much.

Likewise, if one of your columns is household income and you only want to see rows related to household incomes over 1000000 USD, then an index on that data value is likely to be very helpful except perhaps in Kuwait and Qatar.

Since you're reading up on 10.2, I'll presume you're at least on a version where lack of a predicate on the left most set of columns of an index does not rule out use of the index if the CBO determines skipping through the index is still less in expected cost than other means of accessing the column values of the rows you need. The calculation of whether various single column indexes or combinations of concatenated column indexes is highly dependent on the query pattern and the actual data. If a common leading edge set of columns gets you to a dramatic subset of the rows, then you do want to try to always include those columns in the predicate.

It is correct that if all the required columns for some table are in an index, then you might avoid actually visiting table blocks altogether.

Your description of possibly using a function based index is a case that often works out well in the balance of cost of maintenance versus cost of queries. Mileage varies.

To the extent you have knowledge of the query predicate patterns that will actually be used, you have a huge opportunity to match your indexes very well to that anticipated use.

Every index you add introduces extra overhead in the insert, update, and deletion of rows in the requisite table. I think Craig Shallahammer has a paper out there somewhere that addresses the detailed math of whether a particular index is good, and despite being an old paper it is certainly still relevant.

Finally (from me, this could be a long book to treat your questions comprehensively), if your data is very old and the relative rate of change is modest compared to the accumulated data, and if you do in fact know the likely patterns of access then physically ordering your data so that index row selectivity corresponds well to block selectivity has huge potential benefits. (Which are of course potentially perishable as rows are inserted, updated, and deleted). In your case you mention some columns that are always relevant. Especially if old job_id,lab_id rows are unlikely to be added to, then that might be an effective order in your case, and the rebuild could be the chance to copy all the rows with nulls in those columns to table_old while copying in job_id, lab_id order (possibly with additional ordering) to table_current. Various cluster based storage methods might also turn out well for you (I'd start with reading Steve Adams' stuff on clusters and cross reference that with what release of Oracle you'll be using, and do remember the tradeoff of using clusters versus long term maintenance in context of whether or not you'll be using partitioning).

Please notice I've used a lot of words like "might" and "potential" and that correspondence of the best indexing strategy to an actual case involves at least the rate of change of the data and the query pattern.

Regards,

mwf

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Bill Ferguson
Sent: Thursday, July 12, 2007 8:35 AM
To: oracle-l_at_freelists.org
Subject: (stupid) Question about indexing

<snip>

3. I don't believe that indexes are used when > or < symbols are used in sql, am I correct?

data_value

again, nobody has ever queried for a single value, only by a range of values. i.e. why query for ICP Ti=.014 instead of .015

At this point, I do not see any advantage for indexing these fields. Am I missing something?

So, starting with the numeric indexes, the application allows the user to specify a "box" (N, S, E, W) of latitude/longitude coordinates. I'm thinking that having these indexed will improve performance, as the where clause does a between comparison. With the data_value, the query will do either a ">=" or a "<=" depending on the user selection, and again I think an index helps here as well. I also asked for a couple function based indexes for a couple fields, where the data exists as upper, lower or mixed, so I don't have to do a case conversion during the query and negate the benefit of having the field indexed. The submitter field is one example. It can either be all upper case, all lower case, or both.

<snip>

--
http://www.freelists.org/webpage/oracle-l

Received on Thu Jul 12 2007 - 09:29:58 CDT