Re: artificial values vs null

From: Niall Litchfield <niall.litchfield_at_gmail.com>
Date: Fri, 24 Sep 2004 13:59:59 +0100
Message-ID: <7765c89704092405597ca0661e@mail.gmail.com>

On Fri, 24 Sep 2004 21:51:33 +1000, Nuno Souto <nsouto_at_bizmail.com.au> wrote:
> ----- Original Message -----
> From: "Niall Litchfield" <niall.litchfield_at_gmail.com>
> >> Whatever the optimiser may do with 31 dec 2099,
> >> it will be a darn long shot better than a NULL value
> >> that can't be indexed...
> >
> > you may regret making that statement.... see below.
>
> Not at all. My statement stands: it's not a throwaway comment.
> The use of a specific value as opposed to a NULL will
> cause the optimiser to pick an index if one exists and the
> query is a simple one.

For finding values that are 'CURRENT' I think we both agree that having a status field is exactly the right approach and that using NULL is incorrect. I was trying to deal with what happens to queries that operate on data where one has taken the decision to avoid nulls and use artificial values instead. This wasn't the focus of the original discussion, which is why I've renamed the subject line, but is an area where not using NULL when one should can seriously mess with the optimizer.

> Of course: if there isn't one or you use an index-stop form of
> the predicate, or you use a complex series of predicates where a
> FTS or hash makes more sense, then it might not.
>
> I did not say it was the perfect solution. However, it works when
> the perfect solution is just too expensive to implement. For
> whatever reasons.

I'd go down the histogram route then...

> What I fail to understand is what the char/number part of your
> script proves. That predicates involving mismatched data types
> don't use an index? Known since V4 and it has nothing to do
> with using artificial dates.

Not at all - there shouldn't be any *mismatched* datatypes in my queries (if there are I'll amend the script) - it was designed to deal with those apps/design decisions that use character fields or numerics for dates. Here the problem can get even worse since the range of valid values for the field in business terms (i.e of the form YYYYMMDD) is so much less than the range of allowable values for the field.

> > What I find is that *in the absence of histograms* then using an
> > artificial value - especially if you do the other thing app designers
> > do and use the wrong datatype
>
> Oh no they don't! Not *THIS* app designer. The correct way of
> implementing this artificial date is to create a stored function
> that returns the correct type and value. It is the function
> return value that we compare against in the predicate:
> guaranteed data type match. And guaranteed storage of the
> correct high value, with a couple of triggers.

Not sure if you are reacting personally, or to the idea of using wrong datatypes as a bad idea. I didn't intend to single any individual out with the post, and I apologize if it reads like I thought you personally did bad designs.

>
>
> > do will most likely not be sensible at all. histograms do fix this,
> > but at a non-zero cost.
>
> Agreed. I still think though that if one has the possibility of
> complex joins, histograms are a must (10g possibly excepted?)
> In this I must grant that DKB is correct, IME: use of histograms
> is mandatory to get the optimiser to pick the correct mix
> of indexes in a complex join. At least in 7 and 8i. With 9i,
> I've already found a few cases where it makes the correct
> choices, histograms or no histograms. It appears dynamic sampling
> works quite well.

I did think about including a note on this, I have observed under 10g - don't have any 9i databases with willfully missing stats - that dynamic sampling means calculate some key stats on a sample of the query you are about to execute. I.E. I *think* that it gets stats based on your query rather than just randomly sampling tables in the query with no stats on them [1]. Thus it may be that having no stats is better than having insufficient stats.

-- 
Niall Litchfield
Oracle DBA
http://www.niall.litchfield.dial.pipex.com

[1] It appears that actually it does block sampling on the tables
*and* a sample based on rownum on your query in case your query is
skewed but I don't have anything like a full understanding of dynamic
sampling.
--
http://www.freelists.org/webpage/oracle-l

Received on Fri Sep 24 2004 - 07:58:28 CDT