Re: 3vl 2vl and NULL

From: Hugo Kornelis <hugo_at_pe_NO_rFact.in_SPAM_fo>
Date: Thu, 15 Dec 2005 23:32:07 +0100
Message-ID: <2jp3q1dtipgsm9vh54vvhocnin3vpp43eg_at_4ax.com>


On Thu, 15 Dec 2005 10:18:00 +0100, Jon Heggland wrote:

(snip)
>> Then they find examples
>> where there's other reasons to use NULL (inapplicable, e.g.), which
>> leads to the discussion that there should be two, three, or whatever
>> number of NULLs to denote the various reasons why data can be missing.
>
>I thought it was commonly accepted that "inapplicable NULLs" are an
>artifact of wrong database design. Inapplicable data are *not* missing.

Hi Jon,

What David says.

And even if you remodel the DB to get rid of the "inapplicable NULLs", you'll have them right back when you start joining tables for your reports.

Anyway, inapplicable was just an example. There can be many more reasons for a NULL.

(snip)
>> To avoid that misunderstanding as much as possible, I always make the
>> distinction between the MEANING of NULL ("no value is here") and the
>> RESULT of NULL (that the DB doesn't know which value to substitute for a
>> variable or column name in an expression).
>
>If this is the RESULT (with capital letters) of NULL, then it seems
>obvious that it means "unknown", and not "inapplicable". If it could
>mean "inapplicable", there wouldn't *be* any value to substitute.

I'll give you a table of values:

  Position | Value1 | Value2
 ----------+--------+--------

  first    |      1 |      2
  second   |        |      7
  third    |      3 |
  fourth   |     12 |    118

What is the sum of the first and second Value1? If the third Value2 less than the third Value1?

You, like any database, have only the values. You have no idea what real-world entities are described in this table. So you also have no way of knowing WHY the second Value1 and the third Value2 are missing. They may be unknown, they may be inapplicable, they may be classified, or whatever other reasons there might be.

Given this limitation of databases, the only way to ensure that NULL as a result of "inapplicable" is treated different from NULL as a result of "unknown" is to somehow materialize the reason for the missing data in your database. As long as you're only interested in the "inapplicable" vs "applicable" distinction, a schema change can handle this. If you need finer control, then there's no way to do it without adding an extra column to store the reason for the missing data.

(snip)
>> In ORM/NIAM, each predicate has it's own "table". As a result, there are
>> no NULLS in ORM/NIAM. For the example above, there simply would be no
>> fact at all for Uncle Vernon in the fact table for a person's age.
>
>I'd say that this is how the RM works too. In my opinion, ORM/NIAM is a
>graphical representation of the RM.

At least NIAM is more. You could call IGD (the graphical representation used in NIAM) a graphical representation of the RM, but IGD is only a part of NIAM. The M in NIAM stands for Method - and the strength of NIAM is that it has a complete set of rules that, when followed rigorously, will minimize the chance of error in the development process. But let's not get distracted too much <g>.

>
>> When mapping ORM/NIAM to RM, a fixed set of rules dictates which fact
>> types are combined. The resulting tables combine the data from the
>> combined fact tables.
>
>But you don't need to combine fact types.

(Note: when I wrote "mapping to RM", I should have written "mapping to a SQL-relational DB" or whatever the politically correct term for not-truly-R-implementations-of-RM is).

Correct: you don't _need_ to. But common everyday practice in relational databases is that everybody does it. I've never seen a database with seperate table for EmployeeAge, EmployeeName, EmployeeSalary, etc. All databases combine these predicates into one table "Employees".

> It's just as much RM (more, in
>fact, since you avoid NULLs:) if you don't. Do the rules ever produce
>NULLs that don't mean "unknown"?

If all the modeling rules are followed, subtypes should have been introduced in all situations where fact types/predicates might be inapplicable. The rules for mapping to SQL-relational leave you with a choice: use a seperate table for each subtype (in that case, there'll never be "inapplicable" NULLs), use one table for the supertype and collapse all subtypes in it (in that case, there WILL be "inapplicable" NULLs), or use a hybrid of the two extremes.

>> In the readings above, I assumed that the tuples (Aunt Marge, 47) and
>> (Uncle Vernon, NULL) are the result of combining a unary fact type that
>> enumerates the people in my family with a binary fact type that holds
>> the age for members of my family.
>
>Based on the reasonable assumption that all your family members have
>ages, I guess. And in that case, I think it is more correct to interpret
>the tuples as
>
>- My family member Aunt Marge has an age of 47 years.
>and
>- There exists an age X so that
> My family member Uncle Vernon has an age of X.
>
>---i.e. that tuples with NULLs have the same interpretation as when the
>NULLs are projected away.

No, I have to disagree, for several reasons.

First: in NIAM, all facts of the same fact type have to use the same pattern for reading the fact. This pattern would be:

  • <Family member> has an age of <number> years.

Your reading for Uncle Vernon doesn;t fit in this pattern. *IF* we want to store this as a fact, we'll have to use a seperate fact type for it.

Second: The reading for Uncle Vernon is just a complicated way to say

  • Uncle Vernon has an age.

But if all family members have an age, there's no reason to store this fact for individual family members, and hence no reason to include this in the model. (If not all family members have an age, then this fact does have to be modeled and hence stored - but as a different fact type).

Third: The corresponding fact tables in my NIAM model would look roughly like this (I have to work around the limitations imposed by ASCII here; the real model is a drawing):

FAMILY MEMBERS: 'I have a family member who is uniquely identified (in the UoD of my family database) by the name "<Name>".'

          Name
          ------------
          Uncle Vernon
          Aunt Marge

AGE OF FAMILIY MEMBER: '<Family member> has an age of <number> years.'
          Family member | number
          --------------+-------
          Aunt Marge    | 47

The transition from NIAM to SQL-Relational is just a change in representation. No facts should be added or removed during this transition. That's why the corresponding SQL-relational model:

          Name         | Age
          -------------+-----
          Uncle Vernon | NULL
          Aunt Marge   | 47

means nothing more and nothing less than - I have a family member who is uniquely identified (in the UoD of my family database) by the name "Aunt Marge"; - I have a family member who is uniquely identified (in the UoD of my family database) by the name "Uncle Vernon". - My family member Aunt Marge has an age of 47 years.

Best, Hugo

-- 

(Remove _NO_ and _SPAM_ to get my e-mail address)
Received on Thu Dec 15 2005 - 23:32:07 CET

Original text of this message