Re: How to cope with missing values - NULLS?

From: Peter Koch Larsen <pkl_at_mailme.dk>
Date: 22 Apr 2003 09:18:25 -0700
Message-ID: <61c84197.0304220818.3050fbdf_at_posting.google.com>


"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:<b82sro$1vfe$1_at_gazette.almaden.ibm.com>...
> "Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
> news:61c84197.0304210717.5005e971_at_posting.google.com...
> > "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> news:<b7ur3i$2abi$1_at_gazette.almaden.ibm.com>...
> > > "Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
> > > news:61c84197.0304181445.46842baa_at_posting.google.com...
> > > > [snip] Let us say that we have
> > > > defined a view as an outer join: columns col1 and col2 are "common,
> > > > but col3 and col4 might be missing. Now let's query that view - we are
> > > > only interested in tuples, where col4 is larger than 10: SQL gives us
> > > > something like
> > > >
> > > > SELECT * FROM ojview WHERE col4 > 10
> > > >
> > > > How would you phrase such a query in your (hypothetical) relational
> > > > system?
> > >
> > > Well col4 would not be of type say INTEGER, but of type say INTEGERuUNK
> and I
> > > would not expect the ">" operator to be defined for type INTEGERuUNK.
> >
> > Okay. But if you do so, are you not simply reintroducing the null-type
> > through the back-door, giving it a new name and some slightly new
> > semantics?
>
> It's not a matter of 'slightly new semantics', there are logical differences
> here and as you know, logical differences are BIG differences. However I will
> agree with you that what I suggested does have some superficial similar with
> NULLs. The big difference though, is that my UNK is a value, while a SQL NULL
> is not a value. This means that UNK = UNK, unlike NULLs that don't equal each
> other.

I am not sure I see the logical difference - except for the difference mentioned by you: that a UNK is a value whereas a SQL NULL is not. I do not know the significance of this:
If the user could define its own types (and operators on that type), it would be a simple matter to define a type where the comparison operator did return false if both "present" indicators were missing.

I am talking about a generic type UNK defined perhaps as (using pseudo-pascal - i do not really know what language to know apart from C++ which I shall refrain from using here):

TYPE(GENERIC T) UNK =
    RECORD

      bool present;
      T    value;

    END; OPERATOR = (UNK(T) left, UNK(T) right): BOOL

   if NOT left.present OR NOT right.present THEN RETURN FALSE;    RETURN left.value = right.value;
END; This is the SQL version of the operator, while your operator would look like:

OPERATOR = (UNK(T) left, UNK(T) right): BOOL

   if NOT left.present OR NOT right.present THEN

        RETURN left.present = right.present;    RETURN left.value = right.value;
END; There is not much of a difference the way I look at it and it really bugs down to how you feel the "=" operator should work. One of my thoughts have been that it might be a good idea to introduce an equivalence operator (we could call it "~") behaving the same way as your operator does (UNK ~UNK = TRUE); I've never heard this idea proposed anywhere else.

>
> BTW whenever I use outer joins in SQL (which is very often), I always COALESCE
> away the NULLS. Sometimes a value from the 'missing' column's datatype
> captures the meaning of an unmatched row, typical values being FALSE and 0.
> Othertimes I create a union type as best as SQL can allow. E.g. a union of
> Boolean and NOTYETOCCURED say.
If there is a reasonable COALESCE, this seems like a good idea, but your union type seems somewhat like a hack (which is not said to demean you - we ARE talking SQL ;-). But do you really believe that COALESCING and UNIONs are the real approach?

>
> Outer joins are not a good argument for NULLs.

But an UNK is?

>
> Regards
> Paul Vernon
> Business Intelligence, IBM Global Services
Received on Tue Apr 22 2003 - 18:18:25 CEST

Original text of this message