Re: NULLs: theoretical problems?

From: Hugo Kornelis <hugo_at_perFact.REMOVETHIS.info.INVALID>
Date: Sat, 11 Aug 2007 01:52:42 +0200
Message-ID: <fqspb3dqogo0ekhmc35meottdcgeg49v1q_at_4ax.com>


On Thu, 9 Aug 2007 22:25:51 +0100, David Portas wrote:

(snip)
>> I noticed in your blog you said that the table with a null-able birthday
>> column was in 2NF. If saying that is right, I suppose we must be careful
>> to regard functional dependencies as determining values sometimes and
>> non-values other times.
>
>But Hugo is not right. From the Alice Book, p163:
>
>"A relation I over U satisfies X -> Y, if for each pair s, t of tuples in I,
>s{X} = t{X} implies s{Y} = t{Y}."
>
>Since null = null is not true, no attribute with nulls satisfies any FD, not
>even the trivial one {A}->{A}!

Hi David,

I don't know "the Alice Book", but I googled for some definitions of FD in order to try and get a deeper understanding.

The definition at
http://www.cs.jcu.edu.au/Subjects/cp1500/1998/Lecture_Notes/normalisation/fd.html is closest to your quote from the Alice Book. I don't agree with your interpretation of this definition - I'll come to that in a minute.

The definitions at http://en.wikipedia.org/wiki/Functional_dependency and http://www.utexas.edu/its/windows/database/datamodeling/rm/rm7.html both state that each value of the determinant has to be associated with PRECISELY one value of the dependent attribute. Since NULL is not a value, this would back your claim that there can not be any FDs in a table that allows NULLs. However, I believe that the authors either forgot about NULL, or are in the same camp as the authors of the ANSI standard, who systematically call NULL "the null value".

Finally, the definition at http://cs.wwc.edu/~aabyan/415/FunDep.html and http://databases.about.com/cs/specificproducts/g/functdep.htm state that X -> Y if the values in X uniquely determine the values in Y; this is how I have always interpreted FDs. NULLs in Y do not break down the FD (though NULLs in X do, obviously). Note that this is similar to functions in arithmetics f(x) = 1/x is still a function, even though it has no value for x=0; f(x) = SQRT(x) is not a function because it has two values for each x > 0.

Back to Alice's def, and your interpretation of it. Please correct me if I'm wrong - as far as I know, s(X), s(Y), t(X), and t(Y) are themselves tuples, based on either subset X or subset Y of the columns in relation I. Is that correct?

On first glance, it appears logical to say that s(Y) can't be equal to t(Y) if NULLs are involved. But let's not forget that we're talking about tuples here, not about values. Even if one of the columns in X is NULL - or even if all columns in X are NULL! -, that doesn't make the tuple itself NULL. AFAIK, the tuple can't even be NULL, since NULL is defined only for datatypes that can be stored in a single cell in a relational table; tuples can't.

So s(Y) can't be NULL, though it can be {'ab', NULL, 3} or {NULL, NULL}, or even just {NULL} if Y happens to be a single column. But even in that case, we're not comparing NULL to NULL, but {NULL} to {NULL}. And *that* comparison results in true, not in unknown - just compare it to the other operators in SQL that deal with comparison of whole tuples as opposed to single values (UNIQUE constraints, DISTINCT, GROUP BY).

>Furthermore, no relation with nulls satisfies any join dependency since join
>dependency requires a natural join and a natural join involving nulls will
>exclude some tuples. Chris Date this time:
>
>"Let A1, A2,., An be subsets of the heading of relvar r. Then r satisfies
>the join dependency (JD) *{A1,A2,.,An} if and only if every relation that's
>a legal value for R is equal to the join of its projections on A1, A2,.,
>An."
>
>Naturally enough it follows that any relation hypothetically containing null
>values ought to be decomposed so as to eliminate them, ie. 5NF.

IIRC, you've sent me a similar argument in an email, some time ago, to which I have failed to reply (sorry, David...)

I've tried to understand what you're saying here, but there's too much jargon (both in terms and in notation) involved for me. Most of what I've learned comes from practice, and what theory I did learn was for the most part in Dutch. Is there any chance that you repeat this argument in layman's terms for me, so that I have at least a chance of understanding it?

Best, Hugo Received on Sat Aug 11 2007 - 01:52:42 CEST

Original text of this message