Re: How to cope with missing values - NULLS?

From: Peter Koch Larsen <pkl_at_mailme.dk>
Date: 18 Apr 2003 15:45:11 -0700
Message-ID: <61c84197.0304181445.46842baa_at_posting.google.com>


Costin Cozianu <c_cozianu_at_hotmail.com> wrote in message news:<b7paca$3b0o0$1_at_ID-152540.news.dfncis.de>...
> Peter Koch Larsen wrote:
> > How to cope with missing values - NULLS?
> > The to me fundamental problem is how to cope with missing values. I
> > agree that You have no need for them in base tables, but what happens
> > in the situation where you have an outer join? If outer joins are to
> > be provided by the DBMS - and I believe they are to useful to be let
> > out - there must be some means of denoting that a field has no value.
> > There are three ways to go:
> >
> > 1) Use a NULLABLE type. This could be an extension like the sum-type
> > mentioned (i have not read that thread yet so i can not comment). I
> > see no problems with that approach.
> > 2) Use a DEFAULT value. This could be feasible in some situations, but
> > in many situations it would not. The most obvious example would be if
> > the field in question is boolean. Which default could you possible
> > use?
> > 3) Use a marker that existed independently of the field. This approach
> > has so many flaws that I believe it to be infeasible. For one thing,
> > what value should we store in that field? If we have ADT's we must be
> > careful as the ADT might have some constraints, that if not set will
> > invalidate the type - a CIRCLE could as an example be required a
> > non-negative radius. Even non-ADT types might have some integrity
> > constraints: think about a floating point number stored in the IEEE
> > format.
> >
> > My personal conclusion is that we are stuck with NULLS - if not in the
> > SQL sense then at least in the sense that we are to have some kind of
> > sum type. And if we are stuck with such a type, then the only sensible
> > thing is to have it standardised by the DBMS - how else are we going
> > to let the DBMS perform the outer join by itself?
> >
> > Kind regards
> > Peter
>
>
> I think you are rushing to conclusions.
>
> If we can use sum types for individual columns, we surely can use sum
> types for tuple types, so here we have your OUTER JOIN without any NULL.
>
> Besides, coming from a relational theory perspective, I'm not sure if we
> should regard outer joins as a relation over the sum type of the
> corresponding tuples (no NULLs involved), or just a collection of relations.
>
> From an host language API point of view, I'd rather prefer the later.
>
> Another alternative is to regard the tuple type for the outer join as
>
> col1 * col2 * ... colk * restOfTheJoin
>
> where col1 to colk are the common columns and restOfTheJoin is the outer
> part of the outer join and has a sum type
> Nothing | col(k+1) * ... coln
>
> Best regards,
> Costin Cozianu

I am not sure I understand you completely. Let us say that we have defined a view as an outer join: columns col1 and col2 are "common, but col3 and col4 might be missing. Now let's query that view - we are only interested in tuples, where col4 is larger than 10: SQL gives us something like

   SELECT * FROM ojview WHERE col4 > 10

How would you phrase such a query in your (hypothetical) relational system?
The point I'm getting at is that it - from the users point of view - would be nice to be able to formulate a query in a style where she can refer to col4 directly. A sum type on the tuple as a whole would possibly complicate matters? As an example I imagine a query in the style of:

   SELECT * FROM ojview WHERE {is_extended} AND col4 > 10

Where {is_extended} somehow determines that col4 exists. If it is something like this You have in mind, this would have consequences not only for the user but also for the optimizer: In what order should it evaluate the restricting conditions? I am much in doubt if removing NULLs in this situation would be feasible.

Kind regards
Peter Koch Larsen Received on Sat Apr 19 2003 - 00:45:11 CEST

Original text of this message