# Re: Sixth normal form

Date: Mon, 20 Aug 2007 09:59:07 -0000

Message-ID: <1187603947.341232.24570_at_g4g2000hsf.googlegroups.com>

On 20 aug, 10:16, "Brian Selzer" <br..._at_selzer-software.com> wrote:

> "Jan Hidders" <hidd..._at_gmail.com> wrote in message

*>
**> news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...
**>
**>
**>
**> > On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:
**> >> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
**>
**> >>news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...
**>
**> >> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:
**>
**> >> >> [... big snip ...]
**>
**> >> >> If the goal is a database schema that can represent exactly the same
**> >> >> information content, then the cyclical interrelational constraint is
**> >> >> required; if the goal is a schema that can represent additional
**> >> >> information
**> >> >> without contradicting the closure of the set of FDs and INDs for all
**> >> >> schemata that are equivalent to the less normalized schema, then the
**> >> >> cyclical interrelational constraint is not always required, except, of
**> >> >> course, when moving from 5NF to 6NF.
**>
**> >> > *sigh* You already said this, and I already explained that under the
**> >> > usual definitions of those terms they are *never* required, including
**> >> > when going from 5NF to 6NF. You replied that you are using other
**> >> > definitons but apart from some informal examples you never gave a good
**> >> > definition nor a good motivation why that should be the definition. I
**> >> > think the onus is on you here to show why you want to depart from
**> >> > rather well-established terminology.
**>
**> >> It all boils down to the domain closure assumption, which states that the
**> >> only individuals that exist are represented by values in the body of the
**> >> database, and the identity relation, =, which guarantees that no matter
**> >> how
**> >> many times a value appears, there is only one individual represented by
**> >> that
**> >> value. If you have a database schema consisting of a single relation
**> >> schema
**> >> that satisfies the functional dependency A --> B, then due to the domain
**> >> closure assumption, the existence of an individual that is represented by
**> >> a
**> >> value for A depends upon the existence of a specific individual that is
**> >> represented by a value for B. So if the values a1 and b1, for A and B
**> >> respectively, appear in the same tuple, then a denial of the existence
**> >> of
**> >> the individual represented by b1 denies the existence of the individual
**> >> represented by a1, but a denial of the existence of the individual
**> >> represented by a1 does not necessarily deny the existence of the
**> >> individual
**> >> represented by b1, since there could be another tuple that has the values
**> >> a2
**> >> and b1.
**>
**> >> Now suppose that the relation schema also satisfies the functional
**> >> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
**> >> respectively, appear in the same tuple, then a denial of the existence of
**> >> the individual represented by c1 denies the existence the individual
**> >> represented by b1 and transitively the existence of the individual
**> >> represented by a1. When the relation schema is decomposed into a family
**> >> of
**> >> relation schemata such that A and B appear in one relation schema and B
**> >> and
**> >> C appear in another, then the denial of the existence of the individual
**> >> represented by c1 no longer denies the existence of the individuals
**> >> represented by b1 and a1. This is the problem. This is why I think that
**> >> an
**> >> inclusion dependency is required.
**>
**> > I am quite impressed. How can you make something so trivial sound so
**> > incredibly complicated? :-) Of course, if during normalization you
**> > split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
**> > dependencies you can have C's without associated B's and vice versa.
**> > If that is a problem, then add the INDs. That is basically all that
**> > you have shown in the above. But your claim was that this is (almost?)
**> > always a problem, and for that you have not provided any supporting
**> > argumentation at all.
**>
**> Perhaps it is because this is something that appears trivial since the
**> solution is so intuitively obvious that it is so incredibly complicated to
**> formulate an argument. :-) The above was an attempt (apparently inadequate)
**> to identify the root cause of the problem and to show that it can serve as a
**> logical basis for determining when there should be an inclusion dependency
**> (but not necessarily when there should not).
*

We already have that logical basis. You simply have to check if it is ok if there are any C's without B's and/or if there are any B's without C's. Invoking high-sounding principles like the domain closure assumption does not explain anything that was not clear already.

> > Note btw. that your argument is symmetric so if you indeed don't want

*> > to change the nature of the relationships you need INDs in both
**> > directions.
**>
**> I don't think it is, exactly. For a functional dependency X --> Y, there
**> can be more than one value for X that determines a particular value for Y.
**> So if x1 determines y1 then the statement, "x1 cannot appear in the relation
**> unless y1 also appears." is true, but the statement "y1 cannot appear in the
**> relation unless x1 also appears." is false. So since the first statement is
**> true, it should remain true for the family of relations; hence the need for
**> an IND in that direction. Since the second statement is already false
**> there's no need for an IND in the other direction in order to ensure that it
**> remains false.
*

You are looking at the wrong line of symmetry. If you split R(A,B,C)
into R1(A,B) and R2(A,C) then there are two lowerbounds you need to
worry about:

- for every C there is at least one B

- for every B there is at least one C

There is no reason to consider only one of them just because there is
an FD associated with it. Especialy since FDs don't really say
anything about lower bounds.

- Jan Hidders