Re: Sixth normal form

From: Brian Selzer <brian_at_selzer-software.com>
Date: Thu, 09 Aug 2007 02:15:43 GMT
Message-ID: <jNuui.44763$Um6.25598_at_newssvr12.news.prodigy.net>


"Jan Hidders" <hidders_at_gmail.com> wrote in message news:1186474827.234713.207060_at_k79g2000hse.googlegroups.com...

> On 7 aug, 06:38, "Brian Selzer" <br..._at_selzer-software.com> wrote:

>> "Jan Hidders" <hidd..._at_gmail.com> wrote in message
>>
>> news:1186391813.648681.90280_at_w3g2000hsg.googlegroups.com...
>>
>> [big snip]
>>
>>
>>
>>
>>
>> >> > Agreed. I would add that this is not specific for going from 5NF to
>> >> > 6NF but anywhere you decompose to go from a lower to a higher normal
>> >> > form.
>>
>> >> Not always. If two sets of attributes are independent, as is the case
>> >> when
>> >> moving from 1NF to 2NF, then there is no need for a referential
>> >> constraint;
>> >> if two projections are independent, as is the case when moving from
>> >> BCNF
>> >> to
>> >> 4NF, then there is no need for a referential constraint.
>>
>> > I have no idea what you mean here. In all those cases you need a
>> > cyclic pair of nclusion dependencies if you want the new schema to be
>> > equivalent with the old. And if you only want the new schema to
>> > contain at least as much information as the old then you strictly
>> > speaking don't need any inclusion dependency at all.
>>
>> If two sets of attributes are independent, then for each combination of
>> values for one of the sets of attributes, there is a copy of the
>> projection
>> over all of the other attributes.
>
> Huh? I assume you meant that for each two tuples t1 and t2 there is a
> tuple t3 such that t3[A] = t1[A] and t3[B] = t2[B] where t[A] denotes
> the projection of t on A. In other words, there is an MVD X->>A|B for
> some X.
>

Not exactly, unless perhaps "some X" were {}.

>> What I mean is, the cardinality of the
>> 1NF relation is equal to the product of the cardinalities of the
>> projections
>> over each independent set of attributes.

>
> I doubt that is what you mean. If for R(a,b,c) there is an MVD a ->> b
> | c then it is certainly not true that | R |  = | R[b] | x | R[c] |.
> Of course this is true if R a restricted to the tuples with particular
> value in the 'a' column, so perhaps that is what you meant?
>

I was talking about 2NF, not 4NF.

>> A cyclical constraint would only
>> prevent (pathologically, I should add) the case where the 1NF relation is
>> empty and where either but not both of the 2NF relations is empty: [...]

>
> This is simply not true. Also if the 1NF relation is nonempty the 2NF
> relations might violate the two inclusion dependencies if the are
> allowed to contain more than just the projections of the 1NF relation.
>

What would the inclusion dependencies look like for 2NF projections of a 1NF relation {A, B} where A and B are disjoint and independent sets of attributes?

I could be wrong, but isn't the join of the 2NF projections therefore a cross product, which just happens to be the 1NF relation?

>> The same
>> can be said for independent projections of relations, as is the case when
>> moving from BCNF to 4NF.

>
> No, also there it isn't true.
>

I agree, it's not exactly the same thing. There could be some overlap.

>> I guess I need to clarify what I mean by /at least the same/ information
>> content. Consider the following example relation schema that is in 5NF:
>>
>> (1) {Whse, Item, TranId, RcvdDate, QtyRcvd, QtySold, Cost}
>>
>> such that
>>
>> {Whse, Item, TranId} --> {RcvdDate, QtyRcvd, QtySold, Cost}
>>
>> An instance of this schema enumerates a set of cost tiers for items in a
>> warehouse. Cost is the total cost for the quantity received. The unit
>> cost
>> is therefore Cost / QtyRcvd, but is not calculated ahead of time to
>> minimize
>> rounding errors. There can be several cost tiers for an item in a
>> warehouse, and the cost assigned to a sale is computed by pulling from
>> the
>> oldest tier first, and continuing with successive tiers until the
>> quantity
>> needed for the sale has been met. Now decomposing the above schema into
>> 6NF
>> gives:
>>
>> (2) {Whse, Item, TranId, RcvdDate}
>> (3) {Whse, Item, TranId, QtyRcvd}
>> (4) {Whse, Item, TranId, QtySold}
>> (5) {Whse, Item, TranId, Cost}
>>
>> This schema allows a Cost without a QtyRcvd, thus preventing the
>> calculation
>> of unit cost. It permits the existence of a QtySold without a QtyRcvd,
>> thus
>> preventing the calculation of the remaining quantity. It also permits
>> QtyRcvd without RcvdDate, thus preventing the selection of the oldest
>> tier.
>> Clearly the existence of QtySold and Cost depend upon the existence of
>> QtyRcvd, and the existence of QtyRcvd depends upon the existence of
>> RcvdDate. Consequently, the set of 6NF relation schemata can't contain
>> /at
>> least the same/ information content as the 5NF schema, because some valid
>> instances of the 6NF schemata do not make sense.
>>
>> Suppose that tuples exist for a particular {Whse, Item, TranId} in
>> instances
>> of the 6NF schemata for (3) and (4), but not for (2) and (5). A query
>> seeking the total quantity on hand for the Item (SUM(QtyRcvd - QtySold))
>> could indicate that quantity required for a sale is available, but since
>> the
>> tuples for (2) and (5) do not exist, the cost to be assigned to the sale
>> transaction cannot be computed. It certainly doesn't sound to me like a
>> good thing to carry inventory that you can't sell!

>
> You could, once the missing information has been added. Yes, there
> might be circumstances under which you would want to disallow such
> incomplete information, but it is also very conceivable that you would
> want to explicitly allow it.
>



> Btw., did you really have to type all this to explain the rather
> trivial observation that if you move to a more liberal schema then
> certain assumptions that are necessary for certain queries to make
> sense may no longer hold? And again, this is not typical for the step
> from 5NF to 6NF, but it can happen at all normalization steps where
> you split and don't include inclusion dependencies in both directions.
>
> Finally, your redefinition of the phrase "a schema with at least the
> same information" is a bit confusing, to say the least, because the
> problem you are describing is caused by the schema being too liberal
> and thereby allowing information that does not make sense. It would be
> more appropriate to say that the schema allows too much information. >

I don't think that it's about a schema being too liberal. I think it's about changing a deterministic model into one that isn't. If a FD is lost as a result of the decomposition of a relation, then that deterministic relationship between the two sets of attributes is also lost. To recover that deterministic relationship requires the addition of an inclusion dependency.

>> It should be obvious that if sets of dependent attributes are truly
>> independent of each other (which is the only case that an inclusion
>> dependency is not indicated), then they should have already been
>> separated
>> into their own relation schemata as part of the normalization process.

>
> Assuming my guess of your definition of "independence" is correct this
> is not true.
>

My use of the term "independent" is not the same as that defined by Rissanen. I consider a projection over a set of attributes A to be independent if and only if every dependent set of attributes determined by any subset of A is also a subset of A. In other words, a projection over a set of attributes A is independent if and only if for each functional dependency X --> Y in the closure of the set of functional dependencies for the relation schema containing A, if X is a subset of A then Y must also be a subset of A. If a subset of A is the determinant for a set of attributes that is not a subset of A, then the existence of a tuple in the projection over A is dependent upon the existence of a tuple in another projection.

Consider a simple relation schema, {A, B, C} such that A --> B and B --> C. Rissanen would consider both of the projections {A, B} and {B, C} independent. I don't. The closure of the set of functional dependencies includes A --> C, which can only be preserved by the inclusion dependency, {A,B}[B] IN {B,C}[B]. In this way there can't be a value for A unless there is also a specific value for C. As a result, I would consider {B, C} to be independent, but not {A, B}.

>> It
>> means that a multivalued dependency exists that is not implied by the
>> candidate keys.

>
> Nope. Also that is not true.
>

>> It means that the relation schema is not even in 4NF.
>
> Also false.
>

>> Therefore, when moving from 5NF to 6NF, a cyclical constraint is /always/
>> necessary to avoid information loss.
>
> Which is also not true, unless of course you redefine "avoid
> information loss" as "a schema that is not equivalent", which is how
> you seem to interpret it, and in which case, as I already said, this
> is true for any normalization split.
>
> -- Jan Hidders
> 
Received on Thu Aug 09 2007 - 04:15:43 CEST

Original text of this message