Re: Proposal: 6NF

From: Brian Selzer <brian_at_selzer-software.com>
Date: Wed, 11 Oct 2006 10:01:20 GMT
Message-ID: <Qh3Xg.15394$7I1.6347_at_newssvr27.news.prodigy.net>


"David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message news:1160133403.293728.245040_at_m73g2000cwd.googlegroups.com...
> Brian Selzer wrote:
>>
>> I agree. If the physical models are identical, then the performance
>> would
>> be similar. But common sense argues against the idea that the physical
>> models should be identical. If {i, a, b} and {i, a}, {i, b} are
>> equivalent,
>> then since i, a and b are of the same types, it would take at least 1/3
>> extra storage in separate tables because each i would occur twice.
>>
>
> That is pure speculation. When I said identical I meant precisely that.
> The only difference is in the metadata. There is no reason at all to
> store any values of i twice. Why should there be? If it is the same key
> value of the same type then it seems perfectly sensible to store it
> only once regardless of how many relations it appears in.
>

Regardless of whether you store the actual value or a pointer to a value, the storage required would be at least 1/3 more, because either you have three values or pointers per tuple or four values or pointers per pair of corresponding tuples. There is no speculation involved, just simple mathematics.

> If you want to speculate some more then consider that only two bits of
> metadata may need to be added per tuple, which is presumably exactly
> the same as would be used to support nullable columns.
>

I'm not sure I follow. By two bits of metadata, do you mean two pointers?

>
>> I agree that decomposition into separate relations still has its place.
>> I
>> just don't think that nulls should be dismissed arbitrarily.
>
> I have not dismissed nulls arbitrarily. There are good reasons to
> reject SQL's definition of a null. I'm still waiting for your
> definition. Apparently you don't have one for numerics so dare I ask if
> you have one for boolean? Do you have a null for a graphics image type
> or a date type or a cartesian co-ordinate? How does your null behave
> under the operators for those types? I think you have your work cut out
> here...
>

Sorry for the delay in responding. I think that the root of the problem with SQL's definition of a null is that it relies on the "possible worlds" assumption. From my perspective, that conflicts with the "closed world" assumption. The way I see it, either something is known or it isn't. Now, you can know that something isn't known, and that's where I see a place for null. The main difficulty with the separate relations only approach is that you can't determine whether a value is missing or that it does not apply. All you know is that it is absent. To me it makes more sense to use a nullable attribute for information that is always applicable but may not be known, and a separate relation for information that may be applicable. For information that may be applicable but may not be known, use a separate relation with a nullable attribute. For example, everybody has a date of birth, but it may not be known, so it makes sense to use a nullable attribute for date of birth. On the other hand, only married people have spouses, so it makes sense to record spouses' names in another relation; but the spouse's name may not be known, so it makes sense to use a nullable attribute in the separate relation. I guess my definition of null is a value that indicates that an applicable value has not been supplied.

I think I should clarify what I mean by "it makes sense." By choosing a specific subset of attributes for a relation schema, you're specifying the structure of the information that can be collected. In effect, you're specifying what can become known. In a perfect world, there is no missing information, so every attribute in every tuple in a perfect extension must be a known value (not null). This obviously precludes a design where an attribute applies only some of the time. Thus, a separate relation schema is needed when an attribute does not apply for every tuple (of course, this shouldn't occur if the database schema is normalized). Since universal applicability of each dependent attribute is implied by the choice of a specific subset of attributes, what do you do when the values of a dependent attribute can be missing? If you separate the attribute into another relation, you lose the information that the attribute applies all of the time. Here's where I think null makes sense, because that information is not lost.

There is only one empty set; there should only be one null. The only real difference I can see between them is that a null is always decorated with an attribute name, as are all values within any tuple, whereas the empty set can stand by itself. I still haven't figured out how a null should behave under the operators of a type other than those for sets. It's clear that for numbers, the introduction of the empty set throws a wrench into the distributive property. I briefly considered whether the presence of a null should in effect "nullify" the nearest operation, but I'm not sure if that would make any difference. (If you transform an expression into postfix form, and then during evaluation, eliminate the nulls and any unary operation where the operand is null or any binary operation where either operand is null....)

> --
> David Portas
>
Received on Wed Oct 11 2006 - 12:01:20 CEST

Original text of this message