Re: index skip scan

From: Jonathan Lewis <jonathan_at_jlcomp.demon.co.uk>
Date: Tue, 6 Jan 2009 23:36:06 -0000
Message-ID: <RPKdnZfjEvdDd_7UnZ2dnUVZ8tXinZ2d_at_bt.com>


"jernigam_at_nospam.kochind.com" <jernigam_at_kochind.com> wrote in message news:19a8e76c-755b-45f5-a8f7-e17b7b1083cf_at_g3g2000pre.googlegroups.com...
>I figured out how it is calculating the selectivity of the
> inv_trans_qty_ind2 index. First I transposed some numbers in the
> middle of the calculateion. It should have been 7305 instead of 7035
> for the high value - the low value. Second instead of 2/num distinct,
> it is using 1/num distinct.
>
> Instead of
> (val1-val2)/(high value-low value)+ 2/num distinct
> (Dec 5- Dec 1)/(Jan 1 2013 - Jan 1 1993) + 2/4022
> 4/7035+2/4022
> .000568586+.00497265 = .001065 instead of .0007962
>
> (val1-val2)/(high value-low value)+ 1/num distinct
> (Dec 5- Dec 1)/(Jan 1 2013 - Jan 1 1993) + 1/4022
> 4/7305+2/4022
> .00054757+.000248633 = 0.0007962

Pulling the pieces together.

  1. There are some strange things going on in the arithmetic of skip scans, and
  2. things changed dramatically from 9i to 10g - and there are more changes (including some bug fixes) in 11g.

The index range scan on the date-only index behaves as expected. You have a "clopen" range - i.e "<" at one end (open) and "<=" at the other (closed).

Hence the index selectivity (and the index selectivityy with filtering) is the second formula you've produced. In the execution plan for the range scan I expect you see the cost on the index at roughly 45 and most of the cost - about 1390 - on the table access.

Obviously that 1390 should still appear as the marginal cost of visiting the table - you still have to visit those blocks - but some anomaly in the skip scan calculation has dropped this down to 427,

And now a bit of guesswork:
You'll see the ANDV figure of 560 in the skip-scan path. This is the 'actual number of distinct values' - Oracle has decided that doing the skip scan, it will pick up 560 distinct combinations of (addr_id, effective_date). I think Oracle has done this through some Bayesian-like arithmetic. Since we know that there are 4 distinct dates involved, this means that Oracle has effectively decided that there are likely to be 140 distinct values of addr_id appearing in selection. It's possible that the next piece of arithmetic has taken that 140 values (rather than 560 values) and applied the 'Datablocks per key' value of 3 for that index to come up with the incremental cost of 427 for visiting the table.

I haven't cross-checked this guess with the calculations for the rest of the indexes yet.

Mladen suggested dropping the clustering factor of the index on effective_date to bypass this problem. This will work, but you'll have to divide by something like 1.5 or greater. It's a small enough number that it may not be a major threat to many other queries - and I'd guess you have an oddity built into this column anyway, given that you've to a 20 year range of date, but only 4,400 distinct dates rather 7,300.

-- 
Regards

Jonathan Lewis
http://jonathanlewis.wordpress.com

Author: Cost Based Oracle: Fundamentals
http://www.jlcomp.demon.co.uk/cbo_book/ind_book.html

The Co-operative Oracle Users' FAQ
http://www.jlcomp.demon.co.uk/faq/ind_faq.html
Received on Tue Jan 06 2009 - 17:36:06 CST

Original text of this message