Re: Sixth normal form

From: Jan Hidders <hidders_at_gmail.com>
Date: Mon, 20 Aug 2007 09:59:07 -0000
Message-ID: <1187603947.341232.24570_at_g4g2000hsf.googlegroups.com>


On 20 aug, 10:16, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>
> news:1187520309.177299.208460_at_57g2000hsv.googlegroups.com...
>
>
>
> > On 18 aug, 09:26, "Brian Selzer" <br..._at_selzer-software.com> wrote:
> >> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>
> >>news:1187391299.353682.322830_at_w3g2000hsg.googlegroups.com...
>
> >> > On 17 aug, 19:15, "Brian Selzer" <br..._at_selzer-software.com> wrote:
>
> >> >> [... big snip ...]
>
> >> >> If the goal is a database schema that can represent exactly the same
> >> >> information content, then the cyclical interrelational constraint is
> >> >> required; if the goal is a schema that can represent additional
> >> >> information
> >> >> without contradicting the closure of the set of FDs and INDs for all
> >> >> schemata that are equivalent to the less normalized schema, then the
> >> >> cyclical interrelational constraint is not always required, except, of
> >> >> course, when moving from 5NF to 6NF.
>
> >> > *sigh* You already said this, and I already explained that under the
> >> > usual definitions of those terms they are *never* required, including
> >> > when going from 5NF to 6NF. You replied that you are using other
> >> > definitons but apart from some informal examples you never gave a good
> >> > definition nor a good motivation why that should be the definition. I
> >> > think the onus is on you here to show why you want to depart from
> >> > rather well-established terminology.
>
> >> It all boils down to the domain closure assumption, which states that the
> >> only individuals that exist are represented by values in the body of the
> >> database, and the identity relation, =, which guarantees that no matter
> >> how
> >> many times a value appears, there is only one individual represented by
> >> that
> >> value. If you have a database schema consisting of a single relation
> >> schema
> >> that satisfies the functional dependency A --> B, then due to the domain
> >> closure assumption, the existence of an individual that is represented by
> >> a
> >> value for A depends upon the existence of a specific individual that is
> >> represented by a value for B. So if the values a1 and b1, for A and B
> >> respectively, appear in the same tuple, then a denial of the existence
> >> of
> >> the individual represented by b1 denies the existence of the individual
> >> represented by a1, but a denial of the existence of the individual
> >> represented by a1 does not necessarily deny the existence of the
> >> individual
> >> represented by b1, since there could be another tuple that has the values
> >> a2
> >> and b1.
>
> >> Now suppose that the relation schema also satisfies the functional
> >> dependency B --> C. Then if the values a1, b1 and c1, for A, B and C
> >> respectively, appear in the same tuple, then a denial of the existence of
> >> the individual represented by c1 denies the existence the individual
> >> represented by b1 and transitively the existence of the individual
> >> represented by a1. When the relation schema is decomposed into a family
> >> of
> >> relation schemata such that A and B appear in one relation schema and B
> >> and
> >> C appear in another, then the denial of the existence of the individual
> >> represented by c1 no longer denies the existence of the individuals
> >> represented by b1 and a1. This is the problem. This is why I think that
> >> an
> >> inclusion dependency is required.
>
> > I am quite impressed. How can you make something so trivial sound so
> > incredibly complicated? :-) Of course, if during normalization you
> > split R(A,B,C) into R1(A,B) and R2(A,C) and don't add any inclusion
> > dependencies you can have C's without associated B's and vice versa.
> > If that is a problem, then add the INDs. That is basically all that
> > you have shown in the above. But your claim was that this is (almost?)
> > always a problem, and for that you have not provided any supporting
> > argumentation at all.
>
> Perhaps it is because this is something that appears trivial since the
> solution is so intuitively obvious that it is so incredibly complicated to
> formulate an argument. :-) The above was an attempt (apparently inadequate)
> to identify the root cause of the problem and to show that it can serve as a
> logical basis for determining when there should be an inclusion dependency
> (but not necessarily when there should not).

We already have that logical basis. You simply have to check if it is ok if there are any C's without B's and/or if there are any B's without C's. Invoking high-sounding principles like the domain closure assumption does not explain anything that was not clear already.

> > Note btw. that your argument is symmetric so if you indeed don't want
> > to change the nature of the relationships you need INDs in both
> > directions.
>
> I don't think it is, exactly. For a functional dependency X --> Y, there
> can be more than one value for X that determines a particular value for Y.
> So if x1 determines y1 then the statement, "x1 cannot appear in the relation
> unless y1 also appears." is true, but the statement "y1 cannot appear in the
> relation unless x1 also appears." is false. So since the first statement is
> true, it should remain true for the family of relations; hence the need for
> an IND in that direction. Since the second statement is already false
> there's no need for an IND in the other direction in order to ensure that it
> remains false.

You are looking at the wrong line of symmetry. If you split R(A,B,C) into R1(A,B) and R2(A,C) then there are two lowerbounds you need to worry about:
- for every C there is at least one B
- for every B there is at least one C
There is no reason to consider only one of them just because there is an FD associated with it. Especialy since FDs don't really say anything about lower bounds.

  • Jan Hidders
Received on Mon Aug 20 2007 - 11:59:07 CEST

Original text of this message