Re: NULLs

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Fri, 28 Dec 2007 23:59:50 +0100
Message-ID: <g2van3h12ilt0m8s0q1hv47sceei7e386r_at_4ax.com>


On Thu, 27 Dec 2007 16:41:51 -0800 (PST), JOG wrote:

>On Dec 27, 11:31 pm, Hugo Kornelis
><h..._at_perFact.REMOVETHIS.info.INVALID> wrote:
>> On Wed, 26 Dec 2007 23:17:52 -0400, Bob Badour wrote:
>> > Codd pointed out that a single NULL marker did not suffice and
>> >suggested 2 markers. Date pointed out that one can apply the same
>> >argument to 2 markers leading to an infinite progression once one heads
>> >down that path, which suggests the path was never a productive one to
>> >head down in the first place.
>>
>> Hi Bob,
>>
>> Unfortunately, both Codd and Date forgot that the NULL marker in (for
>> instance) the age column should represent only that the age is not on
>> file and not try to represent a reason for this as well.
>>
>> Codd's suggestions to use two markers (for "not applicable" and
>> "unknown", IIRC) assumes that we want to store both the age of a person
>> (if on file), and the reason why an age is not on file (if it isn't).
>> That can of course be necessary - but in that case, we have two
>> attributes that should be stored in two seperate columns.
>>
>> Attempting to store both the age and the reason why an age is unknown in
>> a single column violates first normal form.
>
>Is the Zaniolo approach you are favouring not doing exactly the same
>thing? Surely it is attempting to store both an age and the fact that
>an age is not on file in a single column, which also violates first
>normal form.

Hi J,

That would be the case if I were intentionally storing the fact that an age is not on file, and using (or rather: abusing) the age column for that. But I'm not. I'm just storing ages I have on file.

Yes, if there is no value (as represented by the NULL marker) in this column in a specific row, then you can infer from this that I have no age on file for the person described by that row - but this is an unintended though unavoidable by-effect. Just as the fact that you can infer "Jack is an adult" from "Jack is 43 years old" - surely you wouldn't use that to maintain that storing an age violated 1NF?

> It still appears to be a hack imo, and one still ends up
>with 3VL.

3VL is not a result of using NULL to represent missing information, but a result of allowing missing information. IMO, there are only two options: either you deal with missing information, and with the 3VL that results from it -- or you somehow alter reality so that information is never missing again, for any reason.

> There has to be a more elegant way....Regards, J.

I doubt it. Given this information:

  • "Employee Jack is male"
  • "Employee Mary is female"
  • "Employee Jack is 43 years old"
  • "Employee JJ is 32 years old"

how would you answer the below questions:

  • "List all employees aged 40 and above".
  • "What is the average age of our employees?"
  • "For each employee, how many years left until retirement (assume a country with laws for retirement age of 65)"
  • "List all employees that are female, under 35 years old, or both"
  • "Is JJ older than Mary?"
  • etc

Try to describe how you would answer those questions, in an elegant way and without using anything even remotely resembling 3VL. If you succeed at that, a database without 3VL is just around the corner - because the hardest part is not implementing, but finding out what to implement.

I can give answers to all questions above, but I have to use 3VL in all cases. For me, that signifies that 3VL is part of reality and hence can't be left out of a database that attempts to model (aspects of) reality.

Best, Hugo Received on Fri Dec 28 2007 - 23:59:50 CET

Original text of this message