Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Tue, 7 May 2013 20:34:54 +0200
Message-ID: <5189494e$0$26895$e4fe514c_at_dreader37.news.xs4all.nl>


On 2013-05-07 14:50:03 +0000, nvitacolonna_at_gmail.com said:

> On Tuesday, May 7, 2013 4:07:49 PM UTC+2, Jan Hidders wrote:
>

>> On 2013-05-07 13:22:55 +0000, nvitacolonna_at_gmail.com said:
>>>> Normalization theory based on functional dependencies says nothing> >> 
>>>> about the inter-relational constraints of the decomposition
>> 
>>> “It says nothing about the inter-relational constraints of a database> 
>>> > schema” would be more precise. Here is another example:
>> 
>>> R(A,B,C), {B -> A}> > S(D,E), {D -> E}
>> 
>>> R is not in 2NF, the only key being BC. Normalization theory would then 
>>> suggest that it must be decomposed. Now, suppose that, in addition to> 
>>> > the above, you impose R[B,C]⊆S[D,E] (a referential integrity 
>>> constraint> > from (B,C) to (D,E). Would you still decompose R? Or 
>>> would you do> > something else?
>> 
>> R is then in BCNF, since B has become a key. So no splitting needed.

>
> Sure. You need to look at the whole database schema to infer that. And,
> in the general case, there is no algorithm that is guaranteed to give
> you an answer! Data modeling is hard, isn't it? :)
>
> The schema above could then be replaced by:
>
> R(A,B), {B -> A}
> S(D,E), {D -> E}
>
> plus R[B]⊆S[D] (a foreign key from B to D). Arguably, this schema is
> better because redundant information (the C attribute) has been removed.

Indeed. Why did I miss that? I bow my head in shame. :-)

But, to work. So we would like to define normal forms that characterize avoidable or on the other hand the complete lack of redundancy. I can think of several candidates for characterization:

(A) Can be losslessly transformed into another schema with no redundancy, where in the new schema where all implied local dependencies (FDs and INDs) are maintained. (B) Can be losslessly transformed into another schema with no redundancy, where in the new schema all implied local FDs and global INDs are maintained.
(C) Can be losslessly transformed into another schema with no redundancy, where in the new schema all implied EGDs and TGDs (local and non-local) are maintained.

Of course the notion of transformation needs to be suitably defined. Probably transformations consist only of projections, renamings and joins. Note that the last one covers the case where the schema needs to be information-equivalent, presuming we start with only EGDs and TGDs.

I smell a good paper. :-)

> How lucky NoSQL users are, who don't need data modeling (e.g.,
> http://www.couchbase.com/why-nosql/nosql-database)! :)

I sometimes point them to Libkin's work on XML normal forms and tell them "that's now your normalzation theory right there, and things actually get a lot more compicated". That work applies to any data model that is hierarchical / semistructured like XML or JSON.

  • Jan Hidders
Received on Tue May 07 2013 - 20:34:54 CEST

Original text of this message