Re: How to normalize this?

From: <nvitacolonna_at_gmail.com>
Date: Tue, 7 May 2013 02:54:47 -0700 (PDT)
Message-ID: <6261de91-09dd-40c2-800f-c0d0a19ded48_at_googlegroups.com>


On Tuesday, May 7, 2013 at 00:51:01 UTC+2, Jan Hidders wrote:
> On 2013-05-06 21:44:04 +0000, compdb_at_hotmail.com said:
>
> > On Monday, May 6, 2013 8:18:38 AM UTC-7, Jan Hidders wrote:
>
> >> On 2013-05-06 12:26:10 +0000, Erwin said:
>
> >>> How can deliberate admission of "information differences" coexist
> >>> with "aims of being lossless"?
>
> > It is called a "lossless" decomposition because if the components are
> > projections of the original then they join to the original. Nb this
> > assumes certain predicates for the components in terms of the original.
> > (It has nothing per se to do with other predicates.) Nb this does not
> > deal with appropriate constraints on the components or the original.
> >
> > Compare to additionally being "dependency-preserving" where if given
> > dependencies hold in the components then they hold in the original. So
> > enforcing them in the components enforces them in the join. Although
> > moving from 3nf to bcnf is not dependency-preserving, one can introduce
> > suitable inter-component constraints. Nb other constraints are not
> > dealt with by normalization.
>
> >> Well, "lossless" means nothing is lost, but, yes, something is allowed
> >> to be added, i.e., the new schema might be able to represent
> >> information that the old one could not.
>
> > Normalization does not "allow" this. What happens is people normalize,
> > then they notice they really wanted a schema allowing more states (ie
> > predicates that the normalization-generated predicates imply), which
> > they didn't need to normalize to find, then they change to the new
> > predicates for the components, giving a new predicate for their join,
> > BUT in such a way that just happens to leave the new components
> > properly normalizing the new join. They confuse this process with
> > normalization and do not understand normalization's role in it.
>
> I'm afraid you are losing me here. You seem to be unilaterally
> redefining the notion of normalization and then accusing other people
> of misunderstanding what they are doing just because they are using a
> different definition. That's not going to lead to a productive
> discussion.

Normalization theory based on functional dependencies says nothing about the inter-relational constraints of the decomposition—and it can't, because normal forms are defined wrt a single relational schema. Consider R(A,B,C) with F = {A -> B, B -> C}, which can be decomposed into (R1(A,B), {A -> B}) and (R2(B,C), {B -> C}). Which of the following would you consider valid instances?

(1) Information-equivalent, that is, R1[B] = R2[B]:

r1: {(a,b)}
r2: {(b,c)}

(2) No inter-relational constraint:

r1: {(a,b)}
r2: {(c,d)}

(3) Referential integrity, that is, R1[B] ⊆ R2[B]:

r1: {(a1,b1)}
r2: {(b1,c1), (b2, c2)}

(4) Inclusion dependency (IND), that is, R2[B] ⊆ R1[B]:

r1: {(a1,b1),(a2,b1),(a3,b2)}
r2: {(b1,c1)}

(if you don't see the symbol ⊆, it is the “(not necessarily proper) subset of” symbol).
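
For concreteness, here is a quick Python sketch (the helper name and
the values are mine, purely illustrative) that checks, for each pair
of instances above, which of the three conditions holds:

def project_B(rel, pos):
    """Project a set of tuples onto the B attribute at index pos."""
    return {t[pos] for t in rel}

# B is at index 1 in R1(A,B) and at index 0 in R2(B,C).
cases = {
    "(1) information-equivalent": ({("a", "b")}, {("b", "c")}),
    "(2) no constraint":          ({("a", "b")}, {("c", "d")}),
    "(3) referential integrity":  ({("a1", "b1")},
                                   {("b1", "c1"), ("b2", "c2")}),
    "(4) IND R2[B] <= R1[B]":     ({("a1", "b1"), ("a2", "b1"), ("a3", "b2")},
                                   {("b1", "c1")}),
}

for name, (r1, r2) in cases.items():
    b1, b2 = project_B(r1, 1), project_B(r2, 0)
    print(name,
          "| R1[B] = R2[B]:", b1 == b2,
          "| R1[B] <= R2[B]:", b1 <= b2,
          "| R2[B] <= R1[B]:", b2 <= b1)

As expected, (1) satisfies all three conditions, (3) satisfies only
R1[B] <= R2[B], (4) satisfies only R2[B] <= R1[B], and (2) none.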

One may argue that normalization (implicitly?) deals only with case (1), and that (2)–(4) are the result of realizing, after normalizing, that the original design was, in fact, incorrect (i.e., wrong predicates). Whether you consider this “healing” feature a goal of normalization or an entirely different process may be argued at length (my point of view is the former).

The way I see it, the decomposition is not the end of the story. At that point, a much more complex process starts: you must discover how to keep the whole database schema together (in the most basic case, this means you have to discover which INDs hold). This is where normalization theory falls short, in my opinion. For example:

S(A,B,C), {A -> BC}
R(A,B), {}

are each in BCNF. Now, suppose that you determine that R[A,B] ⊆ S[A,B]. Does that introduce any redundancy? Does it change the keys of R? Is it still a good database schema (in some sense to be defined)?
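
To make the redundancy question concrete, here is a minimal Python
sketch (the instance values are mine) showing that, once the IND
holds, r's B column carries no information beyond r[A]:

# With R[A,B] <= S[A,B] and A -> BC holding in S, every A-value of R
# is paired with the unique B that S assigns to it, so r's B column is
# derivable from r[A] and s.

s = {("a1", "b1", "c1"), ("a2", "b2", "c2")}   # satisfies A -> BC
r = {("a1", "b1"), ("a2", "b2")}               # satisfies R[A,B] <= S[A,B]

b_of = {a: b for (a, b, c) in s}               # well defined because A -> BC
derived_r = {(a, b_of[a]) for (a, _b) in r}

print(derived_r == r)   # True: given the IND, B in R is redundant

This suggests that A -> B now holds in R too, so the IND has quietly
changed R's keys; that is exactly the kind of inference FD-based
normalization never gets to make.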

The issue the OP had falls in this category:

R(a,b,c,d), {abc -> d}
S(a,b,e), {ab -> e}
T(b,c,e), {bc -> e}

Does this schema have redundancy or anomalies? Can it be simplified further? Under which hypotheses?
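
To illustrate, here is a Python sketch of one plausible
inter-relational constraint (hypothetical on my part; the intended
predicate may well differ): whenever (a,b,c) appears in R, S must
assign some e to (a,b), T must assign some e to (b,c), and the two
must agree.

r = {("a1", "b1", "c1", "d1")}
s = {("a1", "b1", "e1")}   # satisfies ab -> e
t = {("b1", "c1", "e1")}   # satisfies bc -> e

e_in_s = {(a, b): e for (a, b, e) in s}   # well defined because ab -> e
e_in_t = {(b, c): e for (b, c, e) in t}   # well defined because bc -> e

ok = all((a, b) in e_in_s and (b, c) in e_in_t
         and e_in_s[(a, b)] == e_in_t[(b, c)]
         for (a, b, c, _d) in r)
print(ok)   # True here; changing t's e1 to e2 would make it False

Under such a constraint, e is recorded twice, in S and in T, even
though each of the three schemas is in BCNF.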

As far as I know, there is no accepted notion of normalization for the database schema as a whole. The two definitions I know of that take both FDs and INDs into account (*) are not satisfactory in my opinion (we may discuss that in another thread if you want).

Yes, even reasoning with just functional and inclusion dependencies together is very hard (read: many problems become undecidable). But I'd like to see, if I may say so, a more holistic approach to the process.

Nicola

(*) Tok Wang Ling and Cheng Hian Goh. Logical database design with inclusion dependencies. In Proceedings of the Eighth International Conference on Data Engineering, pages 642–649, 1992.

(*) Mark Levene and Millist W. Vincent. Justification for inclusion dependency normal form. IEEE Transactions on Knowledge and Data Engineering, 12(2):281–291, March 2000.
