Re: 3vl 2vl and NULL
Date: Fri, 16 Dec 2005 17:17:08 +0100
Message-ID: <MPG.1e0d06ec4c1f130a989752_at_news.ntnu.no>
In article <2jp3q1dtipgsm9vh54vvhocnin3vpp43eg_at_4ax.com>,
hugo_at_pe_NO_rFact.in_SPAM_fo says...
>
> And even if you remodel the DB to get rid of the "inapplicable NULLs",
> you'll have them right back when you start joining tables for your
> reports.
Not necessarily. You do have a choice of how to produce your reports.
> Anyway, inapplicable was just an example. There can be many more reasons
> for a NULL.
Exactly. Which makes the one-size-fits-all approach unsuitable. I have e.g. seen models using NULL for infinity/end-of-time, and in those cases, the operators behave wrongly. There cannot be consistent treatment for NULLs unless users use NULLs consistently.
> >> To avoid that misunderstanding as much as possible, I always make the
> >> distinction between the MEANING of NULL ("no value is here") and the
> >> RESULT of NULL (that the DB doesn't know which value to substitute for a
> >> variable or column name in an expression).
> >
> >If this is the RESULT (with capital letters) of NULL, then it seems
> >obvious that it means "unknown", and not "inapplicable". If it could
> >mean "inapplicable", there wouldn't *be* any value to substitute.
>
> I'll give you a table of values:
> Position | Value1 | Value2
> ----------+--------+--------
> first | 1 | 2
> second | | 7
> third | 3 |
> fourth | 12 | 118
>
> What is the sum of the first and second Value1?
There is apparently no second Value1.
> If the third Value2 less than the third Value1?
There is apparently no second Value1.
I think these answers (or error messages) are better than "I don't
> You, like any database, have only the values. You have no idea what
> real-world entities are described in this table. So you also have no way
> of knowing WHY the second Value1 and the third Value2 are missing. They
> may be unknown, they may be inapplicable, they may be classified, or
> whatever other reasons there might be.
If they are inapplicable, they are not missing: They are not supposed to be there. Do agree that the question "What is the sum of the first and second Value3 (sic)?" should be an error, not return "I don't know"?
IMO, referring to an inapplicable value should be an error. It is not an error to refer to a NULL in SQL; hence, SQL NULLs cannot mean inapplicable. Of course, you can ignore this and use them for 'inapplicable' anyway, but you'll get wrong answers if you're not careful. Just like if you use NULL for positive infinity, and ask for integers smaller than it.
> (Note: when I wrote "mapping to RM", I should have written "mapping to a
> SQL-relational DB" or whatever the politically correct term for
> not-truly-R-implementations-of-RM is).
Please don't call this distinction political correctness. SQL is *not* relational; it allows and produces tables that are not relations by *any* definition.
> Correct: you don't _need_ to. But common everyday practice in relational
> databases is that everybody does it. I've never seen a database with
> seperate table for EmployeeAge, EmployeeName, EmployeeSalary, etc. All
> databases combine these predicates into one table "Employees".
If you just use NULL for 'unknown', that is fine. If you use it for other things as well, I think you are headed for trouble.
> >Based on the reasonable assumption that all your family members have
> >ages, I guess. And in that case, I think it is more correct to interpret
> >the tuples as
> >
> >- My family member Aunt Marge has an age of 47 years.
> >and
> >- There exists an age X so that
> > My family member Uncle Vernon has an age of X.
> >
> >---i.e. that tuples with NULLs have the same interpretation as when the
> >NULLs are projected away.
>
> No, I have to disagree, for several reasons.
>
> First: in NIAM, all facts of the same fact type have to use the same
> pattern for reading the fact. This pattern would be:
>
> - <Family member> has an age of <number> years.
>
> Your reading for Uncle Vernon doesn;t fit in this pattern. *IF* we want
> to store this as a fact, we'll have to use a seperate fact type for it.
Well, in the RM, all the tuples in the same relation have to use the same predicate for reading the fact. Your interpretation breaks this. Which breakage you prefer may be a matter of taste, though. :)
> Second: The reading for Uncle Vernon is just a complicated way to say
>
> - Uncle Vernon has an age.
No, it also says that Uncle Vernon is a family member.
> But if all family members have an age, there's no reason to store this
> fact for individual family members, and hence no reason to include this
> in the model.
See above.
> The transition from NIAM to SQL-Relational is just a change in
> representation. No facts should be added or removed during this
> transition. That's why the corresponding SQL-relational model:
> Name | Age
> -------------+-----
> Uncle Vernon | NULL
> Aunt Marge | 47
>
> means nothing more and nothing less than
> - I have a family member who is uniquely identified (in the UoD of my
> family database) by the name "Aunt Marge";
> - I have a family member who is uniquely identified (in the UoD of my
> family database) by the name "Uncle Vernon".
> - My family member Aunt Marge has an age of 47 years.