Re: Sixth normal form

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 20 Aug 2007 08:16:52 GMT
Message-ID: <U5cyi.46995$Um6.17315_at_newssvr12.news.prodigy.net>


"Jan Hidders" <hidders_at_gmail.com> wrote in message news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...

> On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...
>>
>>
>>
>> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>
>> >> [... big snip ...]
>>
>> >> If the goal is a database schema that can represent exactly the same
>> >> information content, then the cyclical interrelational constraint is
>> >> required; if the goal is a schema that can represent additional
>> >> information
>> >> without contradicting the closure of the set of FDs and INDs for all
>> >> schemata that are equivalent to the less normalized schema, then the
>> >> cyclical interrelational constraint is not always required, except, of
>> >> course, when moving from 5NF to 6NF.
>>
>> > *sigh* You already said this, and I already explained that under the
>> > usual definitions of those terms they are *never* required, including
>> > when going from 5NF to 6NF. You replied that you are using other
>> > definitons but apart from some informal examples you never gave a good
>> > definition nor a good motivation why that should be the definition. I
>> > think the onus is on you here to show why you want to depart from
>> > rather well-established terminology.
>>
>> It all boils down to the domain closure assumption, which states that the
>> only individuals that exist are represented by values in the body of the
>> database, and the identity relation, =, which guarantees that no matter
>> how
>> many times a value appears, there is only one individual represented by
>> that
>> value. If you have a database schema consisting of a single relation
>> schema
>> that satisfies the functional dependency A --> B, then due to the domain
>> closure assumption, the existence of an individual that is represented by
>> a
>> value for A depends upon the existence of a specific individual that is
>> represented by a value for B. So if the values a1 and b1, for A and B
>> respectively, appear in the same tuple, then a denial of the existence
>> of
>> the individual represented by b1 denies the existence of the individual
>> represented by a1, but a denial of the existence of the individual
>> represented by a1 does not necessarily deny the existence of the
>> individual
>> represented by b1, since there could be another tuple that has the values
>> a2
>> and b1.
>>
>> Now suppose that the relation schema also satisfies the functional
>> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
>> respectively, appear in the same tuple, then a denial of the existence of
>> the individual represented by c1 denies the existence the individual
>> represented by b1 and transitively the existence of the individual
>> represented by a1. When the relation schema is decomposed into a family
>> of
>> relation schemata such that A and B appear in one relation schema and B
>> and
>> C appear in another, then the denial of the existence of the individual
>> represented by c1 no longer denies the existence of the individuals
>> represented by b1 and a1. This is the problem. This is why I think that
>> an
>> inclusion dependency is required.
>
> I am quite impressed. How can you make something so trivial sound so
> incredibly complicated? :-) Of course, if during normalization you
> split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
> dependencies you can have C's without associated B's and vice versa.
> If that is a problem, then add the INDs. That is basically all that
> you have shown in the above. But your claim was that this is (almost?)
> always a problem, and for that you have not provided any supporting
> argumentation at all.
>

Perhaps it is because this is something that appears trivial since the solution is so intuitively obvious that it is so incredibly complicated to formulate an argument. :-) The above was an attempt (apparently inadequate) to identify the root cause of the problem and to show that it can serve as a logical basis for determining when there should be an inclusion dependency (but not necessarily when there should not).

> Note btw. that your argument is symmetric so if you indeed don't want
> to change the nature of the relationships you need INDs in both
> directions.
>

I don't think it is, exactly. For a functional dependency X --> Y, there can be more than one value for X that determines a particular value for Y. So if x1 determines y1 then the statement, "x1 cannot appear in the relation unless y1 also appears." is true, but the statement "y1 cannot appear in the relation unless x1 also appears." is false. So since the first statement is true, it should remain true for the family of relations; hence the need for an IND in that direction. Since the second statement is already false there's no need for an IND in the other direction in order to ensure that it remains false.

I agree that if you want to preserve the fact the function from X to Y is onto, then the IND in the other direction would be necessary.

> -- Jan Hidders
> Received on Mon Aug 20 2007 - 10:16:52 CEST

Original text of this message