Re: Define "flatten database" ?

From: Dawn M. Wolthuis <dwolt_at_tincat-group.comREMOVE>
Date: Sun, 30 Jan 2005 15:41:05 -0600
Message-ID: <ctjk9u$h1u$1_at_news.netins.net>


"Jan Hidders" <jan.hidders_at_REMOVETHIS.pandora.be> wrote in message news:WPWKd.8976$tX7.741404_at_phobos.telenet-ops.be...
> Dawn M. Wolthuis wrote:

>> "Alan" <alan_at_erols.com> wrote in message 
>> news:35vdofF4rvh8nU1_at_individual.net...
>>>"Dawn M. Wolthuis" <dwolt_at_tincat-group.comREMOVE> wrote in message
>>>news:ctdn9o$7sq$1_at_news.netins.net...
>>>>"DA Morgan" <damorgan_at_x.washington.edu> wrote in message
>>>>news:1106859773.807419_at_yasure...
>>>>>silversw2000_at_yahoo.com wrote:
>>>>>>
>>>>>>Can someone define "flatten database" for me, in 100 words or less (OK
>>>>>>500 words is fine).
>>>>>
>>>>>Denormalize.
>>>>
>>>>I would guess "normalize" is more likely. [...]
>>>
>>>Flattening is DEnormalizing. Period.
>>
>> OK, I'd vote for both terms being incorrect.  It is not denormalizing 
>> because it is starting with non-1NF data -- agreed?  And it is not 
>> normalizing because, well, it clearly isn't (the key, the whole keyu and 
>> nothing but the key ...).
>

> Can I try to bring some light here?

Yes, you are always welcome!

> Some database texts actually do treat the step from NFNF (or UNF or
> non-1NF or whatever you want to call it) to 1NF as the first step of the
> normalization process which makes it valid to call this step
> "normalization". Note that normalization means in general that you are
> transforming something such that it afterwards adheres to a certain normal
> form, i.e., a form in which it satisfies certain criteria, and that is of
> course exactly what is happening in this step. So the term "normalization"
> is certainly justified and correct.

>

> However, it is also defendable to classify it as denormalization since one
> of the goals of normalization was to remove redundancy, but flattening the
> database may actually introduce redundancy, which sets it somewhat apart
> from the other steps. Since in industry the term normalization is often
> loosely used synonymously with any form of restructuring of the schema
> that removes redundancy, it then makes sense to see it as denormalization.
> But, from an academic point of view that is not strictly correct.
> Obviously, *defining* flattening as denormalization is clearly wrong, even
> under the latter interpretation, since not all denormalization is
> flattening.

Yes -- thanks, you clearly were tracking with me and it looks like Alan now understands this point too. He wants to say that when you flatten you ALWAYS are denormalizing, which seems to me to have a semantic problem, but otherwise I understand that point. If you start with data that are not normalized, do we say we are denormalizing if we introduce more redundancy? Or should we reserve the term "denormalize" for times when we start with data that are at least in 1NF and head to a structure that is not? I'm OK either way, but it makes a difference in whether I can give a counter example to Alan's claim or simply agree with it.

> To close off a small test question for those who really understand the
> subject: Introductory database texts sometimes differ in how to get to
> 1NF: in some cases they tell you to immedeately split off the nested
> columns into separate relations (thus avoiding redundancy) and in others
> the relation is merely flattened and the splitting off is postponed to the
> later normalization steps. So the question is this. Can we always use both
> approaches

Let's pretend I meet your criteria for answering this question ;-) then I would say that yes, either approach may be used and, if done properly, will yield the desired result.

> and if we can does it matter for the end result which one we take?

It is conceivable to me that if the list in question is implicitly ordered, then one of these approaches might be more likely to encourage the data modeler to take care to add in an explicit ordering attribute. My guess would be that starting by splitting off the nested columns up front might be more likely to retain ordering information, so I would favor that approach if you really are trying to end up with data in 1NF.

I would opt for a third approach, however, and drag those lists along for the normalization ride, treating the list as an entity that, as a whole, needs to depend on the key, the whole key, and nothing but the key. In other words, skip the 1NF step for lists that seem like good candidates for child/nested/embedded tables -- those with strong life-cycle ties to the parent, for example.

smiles. --dawn

> -- Jan Hidders
Received on Sun Jan 30 2005 - 22:41:05 CET

Original text of this message