# Re: index skip scan

Date: Tue, 6 Jan 2009 23:36:06 -0000

Message-ID: <RPKdnZfjEvdDd_7UnZ2dnUVZ8tXinZ2d_at_bt.com>

"jernigam_at_nospam.kochind.com" <jernigam_at_kochind.com> wrote in message
news:19a8e76c-755b-45f5-a8f7-e17b7b1083cf_at_g3g2000pre.googlegroups.com...

*>I figured out how it is calculating the selectivity of the*

*> inv_trans_qty_ind2 index. First I transposed some numbers in the*

*> middle of the calculateion. It should have been 7305 instead of 7035*

*> for the high value - the low value. Second instead of 2/num distinct,*

*> it is using 1/num distinct.*

*>*

*> Instead of*

*> (val1-val2)/(high value-low value)+ 2/num distinct*

*> (Dec 5- Dec 1)/(Jan 1 2013 - Jan 1 1993) + 2/4022*

*> 4/7035+2/4022*

*> .000568586+.00497265 = .001065 instead of .0007962*

*>*

*> (val1-val2)/(high value-low value)+ 1/num distinct*

*> (Dec 5- Dec 1)/(Jan 1 2013 - Jan 1 1993) + 1/4022*

*> 4/7305+2/4022*

*> .00054757+.000248633 = 0.0007962*

Pulling the pieces together.

- There are some strange things going on in the arithmetic of skip scans, and
- things changed dramatically from 9i to 10g - and there are more changes (including some bug fixes) in 11g.

The index range scan on the date-only index behaves as expected. You have a "clopen" range - i.e "<" at one end (open) and "<=" at the other (closed).

Hence the index selectivity (and the index selectivityy with filtering) is the second formula you've produced. In the execution plan for the range scan I expect you see the cost on the index at roughly 45 and most of the cost - about 1390 - on the table access.

Obviously that 1390 should still appear as the marginal cost of visiting the table - you still have to visit those blocks - but some anomaly in the skip scan calculation has dropped this down to 427,

And now a bit of guesswork:

You'll see the ANDV figure of 560 in the skip-scan path. This is
the 'actual number of distinct values' - Oracle has decided that
doing the skip scan, it will pick up 560 distinct combinations of
(addr_id, effective_date). I think Oracle has done this through
some Bayesian-like arithmetic. Since we know that there are
4 distinct dates involved, this means that Oracle has effectively
decided that there are likely to be 140 distinct values of addr_id
appearing in selection. It's possible that the next piece of arithmetic
has taken that 140 values (rather than 560 values) and applied the
'Datablocks per key' value of 3 for that index to come up with the
incremental cost of 427 for visiting the table.

I haven't cross-checked this guess with the calculations for the rest of the indexes yet.

Mladen suggested dropping the clustering factor of the index on effective_date to bypass this problem. This will work, but you'll have to divide by something like 1.5 or greater. It's a small enough number that it may not be a major threat to many other queries - and I'd guess you have an oddity built into this column anyway, given that you've to a 20 year range of date, but only 4,400 distinct dates rather 7,300.

-- Regards Jonathan Lewis http://jonathanlewis.wordpress.com Author: Cost Based Oracle: Fundamentals http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html The Co-operative Oracle Users' FAQ http://www.jlcomp.demon.co.uk/faq/ind_faq.htmlReceived on Tue Jan 06 2009 - 17:36:06 CST