Re: artificial values vs null

From: Nuno Souto <nsouto_at_bizmail.com.au>
Date: Sat, 25 Sep 2004 01:24:50 +1000
Message-ID: <006601c4a24a$c9f2f900$3dfaf63c@DCS005>


Thanks Richard. Makes clear sense now.
Niall clarified it as well with his example, offline.

I still think that a high value is not "messing" with Oracle's perception of even data distribution. Quite frankly, in this case I reckon it's wishful thinking on Oracle's part to assume the distribution is even to start with. As Mark put it: it is an implementation detail, not a mandatory necessity of design. It's Oracle's perception that is wrong. Assuming an even distribution of any value is always dangerous, as is, for example, assuming an ordered distribution.

Does anyone recall when low-high value weighting became part of the optimiser? I don't remember it being in 7 or 8i; I probably missed it.

In my case the usual problem is someone slapping a NULL on the end date of a time series to mean "current row", when the most accessed single row is precisely that one. So instead of an FTS for a range scan (which is perfectly acceptable) I end up having to do an FTS to find one row: the current one. And given that replacing the NULL with a high value will make that single-row access use an index (no range scan involved), I'm relatively safe from this problem.
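To illustrate the pattern (a rough sketch only; the table and column names are made up for the example, not taken from any real schema):

  -- Hypothetical table for illustration:
  CREATE TABLE rate_history (
    rate_id    NUMBER PRIMARY KEY,
    rate_value NUMBER,
    start_date DATE NOT NULL,
    end_date   DATE            -- NULL means "current row": the problem
  );

  -- A single-column B-tree index does not store entirely-NULL keys,
  -- so finding the current row this way falls back to an FTS:
  CREATE INDEX rate_history_end_i ON rate_history (end_date);
  SELECT * FROM rate_history WHERE end_date IS NULL;

  -- Swap the NULL for a far-future sentinel and the same lookup
  -- becomes an indexable single-row equality, no range scan involved:
  UPDATE rate_history SET end_date = DATE '9999-12-31'
   WHERE end_date IS NULL;
  SELECT * FROM rate_history WHERE end_date = DATE '9999-12-31';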

We have to use histograms anyway: they're the only thing that will avoid problems with all the large joins we have, short of re-writing the app, which is what I wanted to do. But once again that got over-ruled in the name of "budgeting". I wonder if the long-term consequences of these decisions are EVER included in those budgets...
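For the record, the gathering itself is nothing exotic, something along these lines (again a sketch; the table name is the made-up one from above, and the bucket count would need tuning per column):

  BEGIN
    DBMS_STATS.GATHER_TABLE_STATS(
      ownname    => USER,
      tabname    => 'RATE_HISTORY',
      method_opt => 'FOR COLUMNS END_DATE SIZE 254', -- 254 = max buckets
      cascade    => TRUE);                           -- index stats as well
  END;
  /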

Cheers
Nuno Souto
nsouto_at_bizmail.com.au
----- Original Message -----
>
> Hi Nuno,
>
> It's because by default, Oracle assumes an even distribution of values
> between the low and high values of your (date) column data.
>
> Simplistically, if you have 1000 days' worth of data between your lowest
> and highest dates, and you perform a date search looking for a range of
> say 100 days (between 1 Jan 2004 and 10 April 2004 or so), then Oracle
> calculates the cardinality to be approx 10% of the data:
>
> (high search value - low search value) / (highest value - lowest value) * 100
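> Plugging the example's own numbers into that formula:
>
> (100 days / 1000 days) * 100 = approx 10%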
>
> However, if you insert an artificial high (or low) value that isn't
> representative of your data distribution, this "stuffs" up the above
> formula and Oracle incorrectly assumes a much smaller % of data to be
> returned than is valid. Setting a high date of say year 4000 means that
> any range scan of your "real" date distribution will be considered
> incredibly small (even potentially when searching for *all* your data),
> pushing Oracle to perhaps use indexes when FTSs are more appropriate.
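> To put numbers on it: from 2004 out to the year 4000 is roughly
> 730,000 days, so that same 100-day search now works out at
> (100 / 730000) * 100 = approx 0.014%, tiny enough that an index
> looks attractive even when the range covers most of the real data.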
>
> Sure, your accesses to the year-4000 data may be fine, but all other
> date-range-based scans will not be so great (unless you now look at
> using histograms, with the various overheads that implies).
>
> As soon as you start messing with Oracle's perception of the even
> distribution of the data, you potentially start introducing all sorts
> of new problems. I remember coming across a 3rd-party app that used
> America's Independence Day as its "default" date, and it caused all
> sorts of problems of this type (hence the same issue with dates way in
> the past).
>
> Hope it makes sense!!
>
> Cheers
>
> Richard

--
http://www.freelists.org/webpage/oracle-l
Received on Fri Sep 24 2004 - 10:21:39 CDT
