Re: Sixth normal form

From: Brian Selzer <brian_at_selzer-software.com>
Date: Mon, 20 Aug 2007 14:32:49 GMT
Message-ID: <lChyi.9370$3x.342_at_newssvr25.news.prodigy.net>


"Jan Hidders" <hidders_at_gmail.com> wrote in message news:1187603947.341232.24570_at_g4g2000hsf.googlegroups.com...

> On 20 aug, 10:16, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...
>>
>>
>>
>> > On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>> >> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> >>news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...
>>
>> >> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>>
>> >> >> [... big snip ...]
>>
>> >> >> If the goal is a database schema that can represent exactly the
>> >> >> same information content, then the cyclical interrelational
>> >> >> constraint is required; if the goal is a schema that can represent
>> >> >> additional information without contradicting the closure of the
>> >> >> set of FDs and INDs for all schemata that are equivalent to the
>> >> >> less normalized schema, then the cyclical interrelational
>> >> >> constraint is not always required, except, of course, when moving
>> >> >> from 5NF to 6NF.
>>
>> >> > *sigh* You already said this, and I already explained that under
>> >> > the usual definitions of those terms they are *never* required,
>> >> > including when going from 5NF to 6NF. You replied that you are
>> >> > using other definitions, but apart from some informal examples you
>> >> > never gave a good definition nor a good motivation why that should
>> >> > be the definition. I think the onus is on you here to show why you
>> >> > want to depart from rather well-established terminology.
>>
>> >> It all boils down to the domain closure assumption, which states
>> >> that the only individuals that exist are represented by values in
>> >> the body of the database, and the identity relation, =, which
>> >> guarantees that no matter how many times a value appears, there is
>> >> only one individual represented by that value. If you have a
>> >> database schema consisting of a single relation schema that
>> >> satisfies the functional dependency A --> B, then due to the domain
>> >> closure assumption, the existence of an individual that is
>> >> represented by a value for A depends upon the existence of a
>> >> specific individual that is represented by a value for B. So if the
>> >> values a1 and b1, for A and B respectively, appear in the same
>> >> tuple, then a denial of the existence of the individual represented
>> >> by b1 denies the existence of the individual represented by a1, but
>> >> a denial of the existence of the individual represented by a1 does
>> >> not necessarily deny the existence of the individual represented by
>> >> b1, since there could be another tuple that has the values a2 and
>> >> b1.
>>
>> >> Now suppose that the relation schema also satisfies the functional
>> >> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
>> >> respectively, appear in the same tuple, a denial of the existence of
>> >> the individual represented by c1 denies the existence of the
>> >> individual represented by b1 and transitively the existence of the
>> >> individual represented by a1. When the relation schema is decomposed
>> >> into a family of relation schemata such that A and B appear in one
>> >> relation schema and B and C appear in another, then the denial of
>> >> the existence of the individual represented by c1 no longer denies
>> >> the existence of the individuals represented by b1 and a1. This is
>> >> the problem. This is why I think that an inclusion dependency is
>> >> required.
>>
>> > I am quite impressed. How can you make something so trivial sound so
>> > incredibly complicated? :-) Of course, if during normalization you
>> > split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
>> > dependencies you can have C's without associated B's and vice versa.
>> > If that is a problem, then add the INDs. That is basically all that
>> > you have shown in the above. But your claim was that this is (almost?)
>> > always a problem, and for that you have not provided any supporting
>> > argumentation at all.
>>
>> Perhaps it is precisely because the solution appears so intuitively
>> obvious that it is so incredibly complicated to formulate an argument
>> for it. :-) The above was an attempt (apparently inadequate) to
>> identify the root cause of the problem and to show that it can serve
>> as a logical basis for determining when there should be an inclusion
>> dependency (but not necessarily when there should not).
>
> We already have that logical basis. You simply have to check if it is
> ok if there are any C's without B's and/or if there are any B's
> without C's. Invoking high-sounding principles like the domain closure
> assumption does not explain anything that was not clear already.
>

But not in the general case. See below.
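
To make the a1/b1/c1 argument above concrete before going on, here is a
quick sketch in plain Python, with a relation modelled as a set of tuples.
The names and helper are made up purely for illustration.

    # R satisfies A --> B and B --> C.  In the single relation, denying
    # the individual represented by c1 takes b1 and a1 with it; after the
    # decomposition into R1(A,B) and R2(B,C) it does not, unless the
    # inclusion dependency R1[B] in R2[B] is enforced.
    R  = {("a1", "b1", "c1"), ("a2", "b2", "c2")}
    R1 = {(a, b) for (a, b, c) in R}    # projection on A, B
    R2 = {(b, c) for (a, b, c) in R}    # projection on B, C

    # Deny the existence of the individual represented by c1:
    R2 = {(b, c) for (b, c) in R2 if c != "c1"}

    def ind_holds(r1, r2):
        # Does the IND R1[B] in R2[B] hold?
        return {b for (_, b) in r1} <= {b for (b, _) in r2}

    print(ind_holds(R1, R2))    # False: (a1, b1) survives with no C for b1

With the IND in place, the delete against R2 would have to be rejected or
cascaded to R1, which is exactly the propagation the single relation
schema gave for free.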

>> > Note btw. that your argument is symmetric so if you indeed don't want
>> > to change the nature of the relationships you need INDs in both
>> > directions.
>>
>> I don't think it is, exactly. For a functional dependency X --> Y,
>> there can be more than one value for X that determines a particular
>> value for Y. So if x1 determines y1, then the statement "x1 cannot
>> appear in the relation unless y1 also appears" is true, but the
>> statement "y1 cannot appear in the relation unless x1 also appears" is
>> false. Since the first statement is true, it should remain true for
>> the family of relations; hence the need for an IND in that direction.
>> Since the second statement is already false, there's no need for an
>> IND in the other direction in order to ensure that it remains false.

>
> You are looking at the wrong line of symmetry. If you split R(A,B,C)
> into R1(A,B) and R2(A,C) then there are two lower bounds you need to
> worry about:
> - for every C there is at least one B
> - for every B there is at least one C
> There is no reason to consider only one of them just because there is
> an FD associated with it. Especially since FDs don't really say
> anything about lower bounds.
>

For this particular case, you're right, but it doesn't hurt to look at the FDs, and in the general case, it is necessary. Here are some simple decompositions.

(1) R(A,B,C) such that A --> B and A --> C into R1(A,B) and R2(A,C): Since A --> C, there can't be a value for A without a value for C, so the IND R1[A] in R2[A] is needed. Since A --> B, there can't be a value for A without a value for B, so the IND R2[A] in R1[A] is needed.

(2) R(A,B,C) such that A --> B and B --> C into R1(A,B) and R2(B,C): Since A --> C is implied by the cover for R, there can't be a value for A without a value for C, so the IND R1[B] in R2[B] is needed.

(3) R(A,B,C) such that AB --> C and C --> B into R1(A,C) and R2(B,C): Since C --> B, there can't be a value for C without a value for B, so the IND R1[C] in R2[C] is needed.

(4) R(A,B,C) such that A --> B and B --> A into R1(A,B) and R2(A,C): Since A --> B, there can't be a value for A without a value for B, so the IND R2[A] in R1[A] is needed.
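
If it helps, the check in each case is mechanical. Here is a rough sketch,
again in plain Python and applied to decomposition (2); project and
ind_holds are illustrative helpers of my own, not any particular library.

    # A relation is a set of tuples, a schema is a tuple of attribute
    # names, and an IND r[X] in s[X] is a subset test on projections.
    def project(rel, schema, attrs):
        idx = [schema.index(a) for a in attrs]
        return {tuple(t[i] for i in idx) for t in rel}

    def ind_holds(r, r_schema, s, s_schema, attrs):
        return project(r, r_schema, attrs) <= project(s, s_schema, attrs)

    # Decomposition (2): R(A,B,C) with A --> B and B --> C, split into
    # R1(A,B) and R2(B,C).  The IND R1[B] in R2[B] is what keeps
    # "every A still has some C" true after the split.
    R1 = {("a1", "b1"), ("a2", "b1")}
    R2 = {("b1", "c1")}
    print(ind_holds(R1, ("A", "B"), R2, ("B", "C"), ("B",)))  # True

    # Delete the only tuple carrying c1 and the IND is violated -- the
    # same denial-of-existence propagation the single schema enforced.
    R2 = set()
    print(ind_holds(R1, ("A", "B"), R2, ("B", "C"), ("B",)))  # False

Note that the FDs pick which projection goes on the left of the subset
test; that is the asymmetry I was describing above.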

>
> -- Jan Hidders
> 
Received on Mon Aug 20 2007 - 16:32:49 CEST
