# Re: Sixth normal form

Date: Mon, 20 Aug 2007 14:32:49 GMT

Message-ID: <lChyi.9370$3x.342_at_newssvr25.news.prodigy.net>

"Jan Hidders" <hidders_at_gmail.com> wrote in message news:1187603947.341232.24570_at_g4g2000hsf.googlegroups.com...

> On 20 aug, 10:16, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message

*>>*

*>> news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...*

*>>*

*>>*

*>>*

*>> > On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:*

*>> >> "Jan Hidders" <hidd..._at_gmail.com> wrote in message*

*>>*

*>> >>news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...*

*>>*

*>> >> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:*

*>>*

*>> >> >> [... big snip ...]*

*>>*

*>> >> >> If the goal is a database schema that can represent exactly the*

*>> >> >> same*

*>> >> >> information content, then the cyclical interrelational constraint*

*>> >> >> is*

*>> >> >> required; if the goal is a schema that can represent additional*

*>> >> >> information*

*>> >> >> without contradicting the closure of the set of FDs and INDs for*

*>> >> >> all*

*>> >> >> schemata that are equivalent to the less normalized schema, then*

*>> >> >> the*

*>> >> >> cyclical interrelational constraint is not always required, except,*

*>> >> >> of*

*>> >> >> course, when moving from 5NF to 6NF.*

*>>*

*>> >> > *sigh* You already said this, and I already explained that under the*

*>> >> > usual definitions of those terms they are *never* required,*

*>> >> > including*

*>> >> > when going from 5NF to 6NF. You replied that you are using other*

*>> >> > definitons but apart from some informal examples you never gave a*

*>> >> > good*

*>> >> > definition nor a good motivation why that should be the definition.*

*>> >> > I*

*>> >> > think the onus is on you here to show why you want to depart from*

*>> >> > rather well-established terminology.*

*>>*

*>> >> It all boils down to the domain closure assumption, which states that*

*>> >> the*

*>> >> only individuals that exist are represented by values in the body of*

*>> >> the*

*>> >> database, and the identity relation, =, which guarantees that no*

*>> >> matter*

*>> >> how*

*>> >> many times a value appears, there is only one individual represented*

*>> >> by*

*>> >> that*

*>> >> value. If you have a database schema consisting of a single relation*

*>> >> schema*

*>> >> that satisfies the functional dependency A --> B, then due to the*

*>> >> domain*

*>> >> closure assumption, the existence of an individual that is represented*

*>> >> by*

*>> >> a*

*>> >> value for A depends upon the existence of a specific individual that*

*>> >> is*

*>> >> represented by a value for B. So if the values a1 and b1, for A and B*

*>> >> respectively, appear in the same tuple, then a denial of the*

*>> >> existence*

*>> >> of*

*>> >> the individual represented by b1 denies the existence of the*

*>> >> individual*

*>> >> represented by a1, but a denial of the existence of the individual*

*>> >> represented by a1 does not necessarily deny the existence of the*

*>> >> individual*

*>> >> represented by b1, since there could be another tuple that has the*

*>> >> values*

*>> >> a2*

*>> >> and b1.*

*>>*

*>> >> Now suppose that the relation schema also satisfies the functional*

*>> >> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C*

*>> >> respectively, appear in the same tuple, then a denial of the existence*

*>> >> of*

*>> >> the individual represented by c1 denies the existence the individual*

*>> >> represented by b1 and transitively the existence of the individual*

*>> >> represented by a1. When the relation schema is decomposed into a*

*>> >> family*

*>> >> of*

*>> >> relation schemata such that A and B appear in one relation schema and*

*>> >> B*

*>> >> and*

*>> >> C appear in another, then the denial of the existence of the*

*>> >> individual*

*>> >> represented by c1 no longer denies the existence of the individuals*

*>> >> represented by b1 and a1. This is the problem. This is why I think*

*>> >> that*

*>> >> an*

*>> >> inclusion dependency is required.*

*>>*

*>> > I am quite impressed. How can you make something so trivial sound so*

*>> > incredibly complicated? :-) Of course, if during normalization you*

*>> > split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion*

*>> > dependencies you can have C's without associated B's and vice versa.*

*>> > If that is a problem, then add the INDs. That is basically all that*

*>> > you have shown in the above. But your claim was that this is (almost?)*

*>> > always a problem, and for that you have not provided any supporting*

*>> > argumentation at all.*

*>>*

*>> Perhaps it is because this is something that appears trivial since the*

*>> solution is so intuitively obvious that it is so incredibly complicated*

*>> to*

*>> formulate an argument. :-) The above was an attempt (apparently*

*>> inadequate)*

*>> to identify the root cause of the problem and to show that it can serve*

*>> as a*

*>> logical basis for determining when there should be an inclusion*

*>> dependency*

*>> (but not necessarily when there should not).*

> > We already have that logical basis. You simply have to check if it is > ok if there are any C's without B's and/or if there are any B's > without C's. Invoking high-sounding principles like the domain closure > assumption does not explain anything that was not clear already. >

But not in the general case. See below.

>> > Note btw. that your argument is symmetric so if you indeed don't want

*>> > to change the nature of the relationships you need INDs in both
**>> > directions.
**>>
**>> I don't think it is, exactly. For a functional dependency X --> Y, there
**>> can be more than one value for X that determines a particular value for
**>> Y.
**>> So if x1 determines y1 then the statement, "x1 cannot appear in the
**>> relation
**>> unless y1 also appears." is true, but the statement "y1 cannot appear in
**>> the
**>> relation unless x1 also appears." is false. So since the first statement
**>> is
**>> true, it should remain true for the family of relations; hence the need
**>> for
**>> an IND in that direction. Since the second statement is already false
**>> there's no need for an IND in the other direction in order to ensure that
**>> it
**>> remains false.
*

> > You are looking at the wrong line of symmetry. If you split R(A,B,C) > into R1(A,B) and R2(A,C) then there are two lowerbounds you need to > worry about: > - for every C there is at least one B > - for every B there is at least one C > There is no reason to consider only one of them just because there is > an FD associated with it. Especialy since FDs don't really say > anything about lower bounds. >

> > -- Jan Hidders >Received on Mon Aug 20 2007 - 16:32:49 CEST