Re: How to cope with missing values - NULLS?
Date: Sat, 19 Apr 2003 12:50:48 -0700
Message-ID: <b7s926$42jf5$1_at_ID-152540.news.dfncis.de>
Peter Koch Larsen wrote:
> Costin Cozianu <c_cozianu_at_hotmail.com> wrote in message news:<b7paca$3b0o0$1_at_ID-152540.news.dfncis.de>...
>
>>Peter Koch Larsen wrote:
>>
>>>How to cope with missing values - NULLS?
>>>The to me fundamental problem is how to cope with missing values. I
>>>agree that You have no need for them in base tables, but what happens
>>>in the situation where you have an outer join? If outer joins are to
>>>be provided by the DBMS - and I believe they are to useful to be let
>>>out - there must be some means of denoting that a field has no value.
>>>There are three ways to go:
>>>
>>>1) Use a NULLABLE type. This could be an extension like the sum-type
>>>mentioned (i have not read that thread yet so i can not comment). I
>>>see no problems with that approach.
>>>2) Use a DEFAULT value. This could be feasible in some situations, but
>>>in many situations it would not. The most obvious example would be if
>>>the field in question is boolean. Which default could you possible
>>>use?
>>>3) Use a marker that existed independently of the field. This approach
>>>has so many flaws that I believe it to be infeasible. For one thing,
>>>what value should we store in that field? If we have ADT's we must be
>>>careful as the ADT might have some constraints, that if not set will
>>>invalidate the type - a CIRCLE could as an example be required a
>>>non-negative radius. Even non-ADT types might have some integrity
>>>constraints: think about a floating point number stored in the IEEE
>>>format.
>>>
>>>My personal conclusion is that we are stuck with NULLS - if not in the
>>>SQL sense then at least in the sense that we are to have some kind of
>>>sum type. And if we are stuck with such a type, then the only sensible
>>>thing is to have it standardised by the DBMS - how else are we going
>>>to let the DBMS perform the outer join by itself?
>>>
>>>Kind regards
>>>Peter
>>
>>
>>I think you are rushing to conclusions.
>>
>>If we can use sum types for individual columns, we surely can use sum
>>types for tuple types, so here we have your OUTER JOIN without any NULL.
>>
>>Besides, coming from a relational theory perspective, I'm not sure if we
>> should regard outer joins as a relation over the sum type of the
>>corresponding tuples (no NULLs involved), or just a collection of relations.
>>
>> From an host language API point of view, I'd rather prefer the later.
>>
>>Another alternative is to regard the tuple type for the outer join as
>>
>> col1 * col2 * ... colk * restOfTheJoin
>>
>>where col1 to colk are the common columns and restOfTheJoin is the outer
>>part of the outer join and has a sum type
>> Nothing | col(k+1) * ... coln
>>
>>Best regards,
>>Costin Cozianu
> > > I am not sure I understand you completely. Let us say that we have > defined a view as an outer join: columns col1 and col2 are "common, > but col3 and col4 might be missing. Now let's query that view - we are > only interested in tuples, where col4 is larger than 10: SQL gives us > something like > > SELECT * FROM ojview WHERE col4 > 10 > > How would you phrase such a query in your (hypothetical) relational > system? > The point I'm getting at is that it - from the users point of view - > would be nice to be able to formulate a query in a style where she can > refer to col4 directly. A sum type on the tuple as a whole would > possibly complicate matters? As an example I imagine a query in the > style of: > > SELECT * FROM ojview WHERE {is_extended} AND col4 > 10 > > Where {is_extended} somehow determines that col4 exists. If it is > something like this You have in mind, this would have consequences not > only for the user but also for the optimizer: In what order should it > evaluate the restricting conditions? I am much in doubt if removing > NULLs in this situation would be feasible. > > Kind regards > Peter Koch Larsen
First of all, I think that optimizers will work much better and more important, *correctly* in a 2 valued logic.
With regards to the above query, I'd rather propose a syntax similar
with actual languages that use sum types
SELECT ... WHERE
MATCH restOfTheRecord WITH
Empty -> false
| Nonempty r -> (r.col4 > 10)
Try to SELECT WHERE col4 < 10 OR col4v >= 10
What SQL does with NULL is that it enforces a relatively broken default
where users really need the power and flexibility and logic to deal with
on a case by case basis semantically.
NULL is currently use to represent missing information, NOT APPLICABLE,
or other ad-hoc tricks like INFINITY for dates (as per Joe Celko's
recent examples). It is obvious that a broken three valued logic cannot
match all these semantics with default rules.
Let's take for example Joe Celko's representation of date time intervals:
(begin_date DATE, end_date DATE)
where NULL in the begin_date "means" from "beginning of time" and
end_date NULL means "to the end of time". With sum types the user can
easily define a proper data type for this information, and override the
default "<" order operator to properly account for the order relation.
What is even better, we can easily define begin_date INCLUSIVE/EXCLUSIVE:
type TIME_POINT=
And now all the conditions related to start_date, end_date using
TIME_POINT can be easily and logically expressed in terms of the '<'
operator.
We can also imagine that for example end_date may come from a join, with
the semantic that if no end_date is specified in the related table, then
we mean EWN_OF_TIME. Then it is easy for the administrator to create an
outer join where instead of NULL we'll have the corresponding
END_OF_TIME.
BEGIN_OF_TIME
| INCLUSIVE of DATE
| EXCLUSIVE of DATE
| END_OF_TIME
function '<' (point1, point2) = match point1, point2 with
BEGIN_OF_TIME, BEGIN_OF_TIME -> false
| BEGIN_OF_TIME , _ -> true
| _ , BEGIN_OF_TIME -> false
... equivalent rules for
| INCLUSIVE d1, EXCLUSIVE d2 -> d1 <= d2
| EXLCUSIVE d1, EXCLUSIVE d2 -> d1 < d2
| INCLUSIVE d1, INCLUSIVE d2 -> d1 < d2
| EXCLUSIVE d1, INCLUSIVE d2 -> d1 > d2
best regards,
Costin Cozianu
Received on Sat Apr 19 2003 - 21:50:48 CEST