Re: How to normalize this?

From: Erwin <e.smout_at_myonline.be>
Date: Wed, 8 May 2013 03:30:45 -0700 (PDT)
Message-ID: <f7af181d-0b88-45cb-bdf8-4c45e33fff23_at_googlegroups.com>


Op dinsdag 7 mei 2013 16:50:03 UTC+2 schreef nvitac..._at_gmail.com het volgende:
> On Tuesday, May 7, 2013 4:07:49 PM UTC+2, Jan Hidders wrote:
>
>
>
> > On 2013-05-07 13:22:55 +0000, ... said:
>
> > >> Normalization theory based on functional dependencies says nothing
>
> > >> about the inter-relational constraints of the decomposition
>
> >
>
> > > “It says nothing about the inter-relational constraints of a database
>
> > > schema” would be more precise. Here is another example:
>
> >
>
> > > R(A,B,C), {B -> A}
>
> > > S(D,E), {D -> E}
>
> >
>
> > > R is not in 2NF, the only key being BC. Normalization theory would then
>
> > > suggest that it must be decomposed. Now, suppose that, in addition to
>
> > > the above, you impose R[B,C]⊆S[D,E] (a referential integrity constraint
>
> > > from (B,C) to (D,E). Would you still decompose R? Or would you do
>
> > > something else?
>
> >
>
> > R is then in BCNF, since B has become a key. So no splitting needed.
>
>
>
> Sure. You need to look at the whole database schema to infer that. And, in the general case, there is no algorithm that is guaranteed to give you an answer! Data modeling is hard, isn't it? :)
>
>
>
> The schema above could then be replaced by:
>
>
>
> R(A,B), {B -> A}
>
> S(D,E), {D -> E}
>
>
>
> plus R[B]⊆S[D] (a foreign key from B to D). Arguably, this schema is better because redundant information (the C attribute) has been removed.
>
>
>
> Personally, I believe that this extension of normalization theory, including at least special cases of INDs, hasn't be fully understood yet (or, well, _I_ haven't fully understood it), among other reasons, I guess, because computational complexity results are discouraging. The papers I've mentioned give significant contributions, but I don't think that they are the final answer. And we are still considering only the most basic types of constraints… How lucky NoSQL users are, who don't need data modeling (e.g., http://www.couchbase.com/why-nosql/nosql-database)! :)

"And, in the general case, there is no algorithm that is guaranteed to give you an answer!"

In this example, it seemed to me like the "pullback rule" was sufficient to find the answer. Applying the pullback rule to a finite set of explicitly stated INDs, their sources and their targets, certainly seems like a feasible piece of algorithm.

So what is "the general case" ? Received on Wed May 08 2013 - 12:30:45 CEST

Original text of this message