Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Thu, 2 May 2013 15:23:07 +0200
Message-ID: <518268bb$0$6086$e4fe514c_at_dreader36.news.xs4all.nl>


On 2013-05-02 10:40:28 +0000, Erwin said:

> Op donderdag 2 mei 2013 03:35:15 UTC+2 schreef com..._at_hotmail.com het volgende:

>> On Tuesday, April 30, 2013 3:17:44 PM UTC-7, Erwin wrote:
>>
>>> Op dinsdag 30 april 2013 23:39:58 UTC+2 schreef Jan Hidders het volgende:

>>
>>> 

>>
>>>> On 2013-04-30 14:21:47 +0000, Erwin said:

>>
>>
>>
>>>>>> R1(a, b, c, d)

>>
>>>>>> R2(a, b, e)

>>
>>>>>> R3(b, c, e)

>>
>>
>>
>>>>> In particular, that at all times R1 JOIN R2 === R1 JOIN R3.

>>
>>
>>
>>>> But on the other hand, this type of restriction is often an>> > > 
>>>> artifact of the initial table design

>>
>>
>>
>> No, it is not an artifact. The fact that it was a single-table schema
>> is irrelevant to normalization. The normalized design will have such a
>> constraint exactly when the original does. The original schema,
>> involving a single table, is capable of representing any predicate on
>> its attributes including the one with the constraint and the one
>> without.
>>
>>
>>
>> Eg: Given predicate R(a,b,c,d,e) the normalized relations have predicates
>>
>> R1: EXISTS e R(a,b,c,d,e)
>>
>> R2: EXISTS c,d R(a,b,c,d,e)
>>
>> R3: EXISTS a,d R(a,b,c,d,e)
>>
>> where R1(a,b,c,d) AND R2(a,b,e) AND R3(b,c,e) iff R(a,b,c,d,e). But
>> people erroneously think the Ri can hold any old values. They come up
>> with different predicates that instead of being of the form
>> "...a...b...c...d...some e..." are of the form "...a...b...c...d...".
>> When these mistaken predicates are conjoined they give a predicate that
>> R(a,b,c,d,e) implies but might not be logically equivalent. It will be
>> non-equivalent if the mistaken meanings differ from the correct
>> meanings. Ie if it is possible for the "...a...b...c...d..." version to
>> hold when there isn't some appropriate e value.
>>
>>
>>
>>>> And the getting rid of it is actually a feature and not a>> > > bug.

>>
>>
>>
>> I agree that people come to understand that they had the wrong design.
>> But the feature-not-bug situation is that it is noticed that the
>> mistaken predicates and their conjunction are actually the correct
>> design. Likely it is never noticed that the mistaken predicates are in
>> fact mistaken, and the discrepancy between their conjunction and the
>> original predicate is attributed, as by you and Erwin, to starting
>> normalization from a single table. Normalization didn't have anything
>> to do with the fact that the user chose an original predicate that was
>> wrong for the design and that they misinterpreted the
>> normalization-produced predicates as ones that were right for the
>> design.
>>
>>
>>
>>> It's often bothered me that normalization theory/procedure seems to 
>>> quietly ignore the notion of "nullability or not" of any of the 
>>> attributes in the "initial table design" ...

>>
>>> 

>>
>>> Iow, that normalization theory per se doesn't actually allow to 
>>> determine when and when not the phenomenon is "an artifact of the 
>>> iniital table design".

>>
>>
>>
>> There is no "phenomenon". Initially modeling by a universal relation
>> does not itself cause "artifacts". Such a constraint either holds or
>> doesn't in the original design and either correctly models the problem
>> or doesn't, and normalization is independent of it. We are left with
>> normalization just not happening to do something that we'd like done
>> that we have no reason to expect it to do.
>>
>>
>>
>> philip
> 
> You've characterized my botherings with normalization stuff very well.
> 
> However, I'd argue that there is indeed "a phenomenon", and it's 
> exactly the phenomenon you described as the "...a...b...c...d...some 
> e..." vs. "...a...b...c...d..." confusion.
> 
> Jan observes that in practice most of the time, making this "mistake" 
> leads people to "discovering" the correct predicates for the business 
> problem at hand, and seems to be on the side that therefore it must be 
> a good thing, I personally am rather bothered with the fact that a 
> theory would be useful because misunderstanding it or only partially 
> understanding it, in practice mostly leads to correct designs being 
> derived from less correct ones ...

Well put, and I think we are not that far apart in these matters. I'm certainly not in favor of applying the normalization algorithms blindly and without understanding what it is that they do and do not achieve. Understanding that the usual normalization algorithms do not necessarily result in an equivalent schema is part of that. But most textbooks I know are correct and clear on this and will state that the goal is for example a lossless-join and dependency-preserving decomposition.

A question I often ask to see if people really understand normalization is the following:

The reason that is usually stated for not normalizing all the way up to BCNF and stopping at 3NF is that you cannot always reach BCNF while remaining dependency preserving. However, suppose we follow the following procedure:

  1. Apply the 3NF normalization procedure we discussed earlier
  2. Include with each component only the FDs that generated that component

This gives you a lossless-join dependency-preserving decomposition. And each component clearly is in BCNF for its included FDs. But wait, is that not exactly the thing that was stated as impossible? So what gives here?

> (I guess this must also be the reason why Fabian Pascal insists that if 
> you do the conceptual modeling properly from the get-go, you won't even 
> need normalization at all.)

In practice that is often true, but on the other hand it is theoreticallly possible that even in a correct conceptual model there are some normalization issues left. And apart from that I would argue that normalization theory is important to understand what happens if you start denormalizing in your implementation. And in practice it happens that you have to start working on an existing system where you need the theory to determine if and why the exisiting implementation is problematic, and if it is what you can do about it.

  • Jan Hidders
Received on Thu May 02 2013 - 15:23:07 CEST

Original text of this message