Re: How to cope with missing values - NULLS?
Date: 22 Apr 2003 09:18:25 -0700
Message-ID: <61c84197.0304220818.3050fbdf_at_posting.google.com>
"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:<b82sro$1vfe$1_at_gazette.almaden.ibm.com>...
> "Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
> news:61c84197.0304210717.5005e971_at_posting.google.com...
> > "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> news:<b7ur3i$2abi$1_at_gazette.almaden.ibm.com>...
> > > "Peter Koch Larsen" <pkl_at_mailme.dk> wrote in message
> > > news:61c84197.0304181445.46842baa_at_posting.google.com...
> > > > [snip] Let us say that we have
> > > > defined a view as an outer join: columns col1 and col2 are "common,
> > > > but col3 and col4 might be missing. Now let's query that view - we are
> > > > only interested in tuples, where col4 is larger than 10: SQL gives us
> > > > something like
> > > >
> > > > SELECT * FROM ojview WHERE col4 > 10
> > > >
> > > > How would you phrase such a query in your (hypothetical) relational
> > > > system?
> > >
> > > Well col4 would not be of type say INTEGER, but of type say INTEGERuUNK
> and I
> > > would not expect the ">" operator to be defined for type INTEGERuUNK.
> >
> > Okay. But if you do so, are you not simply reintroducing the null-type
> > through the back-door, giving it a new name and some slightly new
> > semantics?
>
> It's not a matter of 'slightly new semantics', there are logical differences
> here and as you know, logical differences are BIG differences. However I will
> agree with you that what I suggested does have some superficial similar with
> NULLs. The big difference though, is that my UNK is a value, while a SQL NULL
> is not a value. This means that UNK = UNK, unlike NULLs that don't equal each
> other.
I am not sure I see the logical difference - except for the difference
mentioned by you: that a UNK is a value whereas a SQL NULL is not. I
do not know the significance of this:
If the user could define its own types (and operators on that type),
it would be a simple matter to define a type where the comparison
operator did return false if both "present" indicators were missing.
I am talking about a generic type UNK defined perhaps as (using pseudo-pascal - i do not really know what language to know apart from C++ which I shall refrain from using here):
TYPE(GENERIC T) UNK =
RECORD
bool present; T value;
END; OPERATOR = (UNK(T) left, UNK(T) right): BOOL
if NOT left.present OR NOT right.present THEN RETURN FALSE;
RETURN left.value = right.value;
END;
This is the SQL version of the operator, while your operator would
look like:
OPERATOR = (UNK(T) left, UNK(T) right): BOOL
if NOT left.present OR NOT right.present THEN
RETURN left.present = right.present;
RETURN left.value = right.value;
END;
There is not much of a difference the way I look at it and it really
bugs down to how you feel the "=" operator should work.
One of my thoughts have been that it might be a good idea to introduce
an equivalence operator (we could call it "~") behaving the same way
as your operator does (UNK ~UNK = TRUE); I've never heard this idea
proposed anywhere else.
>
> BTW whenever I use outer joins in SQL (which is very often), I always COALESCE
> away the NULLS. Sometimes a value from the 'missing' column's datatype
> captures the meaning of an unmatched row, typical values being FALSE and 0.
> Othertimes I create a union type as best as SQL can allow. E.g. a union of
> Boolean and NOTYETOCCURED say.
If there is a reasonable COALESCE, this seems like a good idea, but
your union type seems somewhat like a hack (which is not said to
demean you - we ARE talking SQL ;-). But do you really believe that
COALESCING and UNIONs are the real approach?
>
> Outer joins are not a good argument for NULLs.
But an UNK is?
>
> Regards
> Paul Vernon
> Business Intelligence, IBM Global Services
Received on Tue Apr 22 2003 - 18:18:25 CEST