Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Tue, 7 May 2013 15:50:35 +0200
Message-ID: <518906ab$0$6074$e4fe514c_at_dreader36.news.xs4all.nl>


On 2013-05-07 09:54:47 +0000, nvitacolonna_at_gmail.com said:

> On Tuesday 7 May 2013 00:51:01 UTC+2, Jan Hidders wrote:

>> On 2013-05-06 21:44:04 +0000, compdb_at_hotmail.com said:
>> 
>>> On Monday, May 6, 2013 8:18:38 AM UTC-7, Jan Hidders wrote:
>> 
>>>> On 2013-05-06 12:26:10 +0000, Erwin said:
>> 
>>>>> How can deliberate admission of "information differences" coexist
>>>>> with "aims of being lossless"?
>> 
>>> It is called a "lossless" decomposition because if the components
>>> are projections of the original then they join to the original. Nb
>>> this assumes certain predicates for the components in terms of the
>>> original. (It has nothing per se to do with other predicates.) Nb
>>> this does not deal with appropriate constraints on the components or
>>> the original.
>>> 
>>> Compare to additionally being "dependency-preserving" where if given
>>> dependencies hold in the components then they hold in the original.
>>> So enforcing them in the components enforces them in the join.
>>> Although moving from 3nf to bcnf is not dependency-preserving, one
>>> can introduce suitable inter-component constraints. Nb other
>>> constraints are not dealt with by normalization.
>> 
>>>> Well, "lossless" means nothing is lost, but, yes, something is
>>>> allowed to be added, i.e., the new schema might be able to represent
>>>> information that the old one could not.
>> 
>>> Normalization does not "allow" this. What happens is people
>>> normalize, then they notice they really wanted a schema allowing more
>>> states (ie predicates that the normalization-generated predicates
>>> imply), which they didn't need to normalize to find, then they change
>>> to the new predicates for the components, giving a new predicate for
>>> their join, BUT in such a way that just happens to leave the new
>>> components properly normalizing the new join. They confuse this
>>> process with normalization and do not understand normalization's role
>>> in it.
>> 
>> I'm afraid you are losing me here. You seem to be unilaterally
>> redefining the notion of normalization and then accusing other people
>> of misunderstanding what they are doing just because they are using a
>> different definition. That's not going to lead to a productive
>> discussion.

>
> Normalization theory based on functional dependencies says nothing
> about the inter-relational constraints of the decomposition—and it
> can't, because normal forms are defined wrt a single relational schema.
> Consider R(A,B,C) with F = {A -> B, B -> C}, which can be decomposed
> into (R1(A,B), {A -> B}) and (R2(B,C), {B -> C}). Which of the
> following would you consider valid instances?
>
> (1) Information-equivalent, that is, R1[B] = R2[B]:
>
> r1: {(a,b)}
> r2: {(b,c)}
>
> (2) No inter-relational constraint:
>
> r1: {(a,b)}
> r2: {(c,d)}
>
> (3) Referential integrity, that is, R1[B] ⊆ R2[B]:
>
> r1: {(a1,b1)}
> r2: {(b1,c1), (b2, c2)}
>
> (4) Inclusion dependency (IND), that is, R2[B] ⊆ R1[B]:
>
> r1: {(a1,b1),(a2,b1),(a3,b2)}
> r2: {(b1,c1)}
>
> (if you don't see the symbol ⊆, it is the “(not necessarily proper)
> subset of” symbol).
>
> One may argue that normalization (implicitly?) deals only with case
> (1), and that (2)–(4) are the result of realizing, after normalizing,
> that the original design was, in fact, incorrect (i.e., wrong
> predicates).

I'd argue that classical normalization theory is agnostic here: it simply doesn't talk about any other dependencies that might be implied, and so in some sense considers all of them valid. Case in point: consider the claim that one can always reach 3NF while staying dependency preserving. Under assumption (1) this claim becomes, although strictly speaking true, virtually meaningless from a practical point of view. Yes, all the original FDs are implied by the local ones, but along the way you may have had to introduce inter-component dependencies, some of them not even expressible as INDs, to stay information equivalent (*). That might be (too) heavy a price to pay, and one that may dwarf the cost of having any inter-component FDs. However, under (2) this reachability claim actually does make sense. On the other hand, most of the original theoretical papers worked under the universal relation assumption, which seems to be more consistent with (1). So, as I said, I think classical normalization theory is mostly agnostic here, assuming that such a statement actually makes sense and is not anthropomorphising too much.

(*) Note the relationship with the observation that you cannot reach 3NF by just splitting off violating dependencies.
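
Just to make the information-equivalence point concrete, here is a throwaway Python sketch (my own illustration, reusing the relation and attribute names from your example): it joins the two components back together, once under your case (1) and once under case (2).

def natjoin(rel1, rel2):
    # Natural join: combine tuples that agree on their common attributes.
    out = []
    for t1 in rel1:
        for t2 in rel2:
            if all(t1[a] == t2[a] for a in set(t1) & set(t2)):
                out.append({**t1, **t2})
    return out

# Case (1): r1 and r2 are projections of one instance of R(A,B,C), so the
# decomposition is lossless: joining the components gives back exactly r.
r  = [{'A': 'a', 'B': 'b', 'C': 'c'}]
r1 = [{'A': t['A'], 'B': t['B']} for t in r]   # R1(A,B)
r2 = [{'B': t['B'], 'C': t['C']} for t in r]   # R2(B,C)
assert natjoin(r1, r2) == r

# Case (2): no inter-relational constraint; the B-values do not match, so
# the join is empty and each component holds information that no single
# instance of R(A,B,C) could have produced.
r1 = [{'A': 'a', 'B': 'b'}]
r2 = [{'B': 'c', 'C': 'd'}]
assert natjoin(r1, r2) == []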

> Whether you consider this “healing” feature a goal of normalization or
> an entirely different process may be argued at length (my point of view
> is the former).

Indeed. But I'd consider that a rather vacuous discussion as the answer is mostly a matter of definition. Life is short, I have exams to grade, articles to write, faculty meetings to attend, so I prefer to stay away from that type of discussion. :-) Let us please concentrate on what constitutes a good design theory and what types of algorithms and theorems are needed to help with that. We will give that baby a name once it is born, not before. :-)

> The way I see it, the decomposition is not the end of the story. At
> that point, a much more complex process starts: you must discover how
> to keep the whole database schema together (in the most basic case,
> this means you have to discover which INDs hold). This is where
> normalization theory falls short, in my opinion. For example:
>
> S(A,B,C), {A -> BC}
> R(A,B), {}
>
> are each in BCNF. Now, suppose that you determine that R[A,B] ⊆ S[A,B].
> Does that introduce any redundancy? Does it change the keys of R? Is it
> still a good database schema (in some sense to be defined)?

Indeed. Those are the right questions. My tentative answers:
- Yes.
- Depends on what you mean by "the keys of R". Presuming we are doing logical design and so are happy with a set of constraints that implies all the ones we know, I'd say we don't need to add this key constraint. From a practical point of view it might of course be useful.
- Maybe, as I don't see one at the moment that is better in every respect.
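
To illustrate the "yes" on redundancy (a throwaway sketch with made-up data, nothing from the thread): given the IND R[A,B] ⊆ S[A,B] and the FD A -> BC in S, the B column of R can be recomputed from R[A] and S, so the same information is stored twice.

# Made-up instances of S(A,B,C) with A -> BC and R(A,B), satisfying the
# inclusion dependency R[A,B] <= S[A,B].
S = [{'A': 1, 'B': 'x', 'C': 10},
     {'A': 2, 'B': 'y', 'C': 20}]
R = [{'A': 1, 'B': 'x'},
     {'A': 2, 'B': 'y'}]

# The IND holds: every (A,B) pair of R also appears in S.
assert all(any(s['A'] == r['A'] and s['B'] == r['B'] for s in S) for r in R)

# Because A -> BC holds in S, R's B column is derivable from R[A] and S,
# i.e. under this IND it is redundant.
b_of = {s['A']: s['B'] for s in S}
assert all(r['B'] == b_of[r['A']] for r in R)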

> The issue the OP had falls in this category:
>
> R(a,b,c,d), {abc -> d}
> S(a,b,e), {ab -> e}
> T(b,c,e), {bc -> e}
>
> Has this redundancy or anomalies? Can this be simplified further? Under
> which hypotheses?

It all depends on which INDs, TGDs, etc. you think should hold in addition.
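
Purely as an illustration of that dependence (the IND below is my own invention, not something the OP stated): if one additionally required, say, S[b,e] ⊆ T[b,e], then every (b,e) association in S would be stored a second time in T, a redundancy that none of the three schemas shows in isolation; drop the IND and that redundancy disappears with it.

# Hypothetical instances of the OP's S(a,b,e) and T(b,c,e); the IND
# S[b,e] <= T[b,e] is an assumption made only for illustration.
S = [{'a': 1, 'b': 2, 'e': 'p'}]
T = [{'b': 2, 'c': 3, 'e': 'p'},
     {'b': 4, 'c': 5, 'e': 'q'}]

# Under this IND every (b,e) pair of S also appears in T, so the same
# association is recorded in two places.
assert all(any(t['b'] == s['b'] and t['e'] == s['e'] for t in T) for s in S)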

> As far as I know, there is no accepted notion of normalization for the
> database schema as a whole. The two definitions that I know of which
> take into account FDs and INDs (*) are not satisfactory in my opinion
> (we may discuss that in another thread if you want).

Interesting. The Levene et al. paper I knew, but not the Ling et al. paper. I'd be very interested in such a discussion.

> Yes, even reasoning with just functional and inclusion dependencies
> together is very hard (read: many problems become undecidable). But I'd
> like to see, if I may say so, a more holistic approach to the process.

Me too. :-)

  • Jan Hidders
Received on Tue May 07 2013 - 15:50:35 CEST
