Re: NULLs

From: David BL <davidbl_at_iinet.net.au>
Date: Fri, 28 Dec 2007 17:59:13 -0800 (PST)
Message-ID: <072e9722-eb81-42a4-8dba-bb115ca7944d_at_s19g2000prg.googlegroups.com>


On Dec 29, 7:59 am, Hugo Kornelis
<h..._at_perFact.REMOVETHIS.info.INVALID> wrote:
> On Thu, 27 Dec 2007 16:41:51 -0800 (PST), JOG wrote:
> >On Dec 27, 11:31 pm, Hugo Kornelis
> ><h..._at_perFact.REMOVETHIS.info.INVALID> wrote:
> >> On Wed, 26 Dec 2007 23:17:52 -0400, Bob Badour wrote:
> >> > Codd pointed out that a single NULL marker did not suffice and
> >> >suggested 2 markers. Date pointed out that one can apply the same
> >> >argument to 2 markers leading to an infinite progression once one heads
> >> >down that path, which suggests the path was never a productive one to
> >> >head down in the first place.
>
> >> Hi Bob,
>
> >> Unfortunately, both Codd and Date forgot that the NULL marker in (for
> >> instance) the age column should represent only that the age is not on
> >> file and not try to represent a reason for this as well.
>
> >> Codd's suggestions to use two markers (for "not applicable" and
> >> "unknown", IIRC) assumes that we want to store both the age of a person
> >> (if on file), and the reason why an age is not on file (if it isn't).
> >> That can of course be necessary - but in that case, we have two
> >> attributes that should be stored in two seperate columns.
>
> >> Attempting to store both the age and the reason why an age is unknown in
> >> a single column violates first normal form.
>
> >Is the Zaniolo approach you are favouring not doing exactly the same
> >thing? Surely it is attempting to store both an age and the fact that
> >an age is not on file in a single column, which also violates first
> >normal form.
>
> Hi J,
>
> That would be the case if I were intentionally storing the fact that an
> age is not on file, and using (or rather: abusing) the age column for
> that. But I'm not. I'm just storing ages I have on file.
>
> Yes, if there is no value (as represented by the NULL marker) in this
> column in a specific row, then you can infer from this that I have no
> age on file for the person described by that row - but this is an
> unintended though unavoidable by-effect. Just as the fact that you can
> infer "Jack is an adult" from "Jack is 43 years old" - surely you
> wouldn't use that to maintain that storing an age violated 1NF?
>
> > It still appears to be a hack imo, and one still ends up
> >with 3VL.
>
> 3VL is not a result of using NULL to represent missing information, but
> a result of allowing missing information. IMO, there are only two
> options: either you deal with missing information, and with the 3VL that
> results from it -- or you somehow alter reality so that information is
> never missing again, for any reason.
>
> > There has to be a more elegant way....Regards, J.
>
> I doubt it. Given this information:
>
> * "Employee Jack is male"
> * "Employee Mary is female"
> * "Employee Jack is 43 years old"
> * "Employee JJ is 32 years old"
>
> how would you answer the below questions:
>
> * "List all employees aged 40 and above".
> * "What is the average age of our employees?"
> * "For each employee, how many years left until retirement (assume a
> country with laws for retirement age of 65)"
> * "List all employees that are female, under 35 years old, or both"
> * "Is JJ older than Mary?"
> * etc
>
> Try to describe how you would answer those questions, in an elegant way
> and without using anything even remotely resembling 3VL. If you succeed
> at that, a database without 3VL is just around the corner - because the
> hardest part is not implementing, but finding out what to implement.
>
> I can give answers to all questions above, but I have to use 3VL in all
> cases. For me, that signifies that 3VL is part of reality and hence
> can't be left out of a database that attempts to model (aspects of)
> reality.

In the presence of missing information the query

    "List all employees aged 40 and above"

cannot be answered at all. Instead the query should be

    "List all employees known by the HR department to be aged 40 and above"

When intensional definitions of predicates are stated correctly there is no need for 3VL, because the extension matches the intension in the same way as for when there is no missing information. The RM/RA can apply in the normal way - an algebra on the extensions that makes no attempt to formalise the intensions. A formula in the RA corresponds to a derived predicate and it is possible to write down its intensional definition in terms of the base relvar intensions. For example, the above query can be obtained by selection then projection on the set of employees with known ages. There is no 3VL required.

I think 3VL invites us to think of a recorded extension as only being a subset of the associated intension, and that is a bad idea. Predicate definitions should always be "if and only if". It is only by being careful with intensional definitions that we correctly interpret the results from a given query. Furthermore it is very efficient compared to computation of "maybes" to yield more "maybes" in the result set. I would rather keep the RM/RA simple and efficient by only calculating what's known in the confines of 2VL (instead of what's possible). Computation of maybes can be computationally expensive and provide very little information to the user.

IMO we keep "if and only if" in the intension <--> extension association, and find one way implications pop up elsewhere. For example, the set of employees with a known age is a subset of the known employees. This falls out naturally when we correctly interpret projections.

In the case of the question

    "Is JJ older than Mary?"

This should be

    "Is it known to HR department that JJ is older than Mary?"

in which case the answer is no. See, no 3VL required.

We can also ask whether it is known to HR department that JJ is not older than Mary". The answer is no as well; so we have been able to tell (using only efficient 2VL) that the DB doesn't know who it older. Received on Sat Dec 29 2007 - 02:59:13 CET

Original text of this message