Re: Newbie question about db normalization theory: redundant keys OK?

From: Dr. Dweeb <spam_at_dweeb.net>
Date: Sun, 16 Dec 2007 21:54:20 +0100
Message-ID: <4765907d$0$21925$157c6196_at_dreader1.cybercity.dk>


Bob Badour wrote:
> David Cressey wrote:
>
>> "raylopez99" <raylopez99_at_yahoo.com> wrote in message
>> news:15b312ea-1f66-4f22-abbb-63581e0eca73_at_x69g2000hsx.googlegroups.com...
>>
>>> With a few hours of theory under my belt, I'd like to ask if
>>> there's ever a time that you don't want a completely normalized
>>> DB - that is, a normalized database being a DB that has no
>>> redundant information (my understanding of what a normalized
>>> database is).
>>>
>>> Or, is there ever a time that you want redundant keys (that is,
>>> the same keys in many different tables that are obviously not
>>> linked in a relationship between two tables)? Having redundant
>>> attributes and/or keys seems to me a very lazy way of designing a
>>> database that doesn't require lots of initial thought, but of
>>> course you have to pay for it by meticulously "synching" all
>>> redundant keys to one another every time there is a change in one
>>> of the redundant keys, so the keys don't drift and have different
>>> values. But is there ever a time you want to do this?
>>>
>>> Thanks in advance
>>>
>>> RL
>>
>> The answer is yes, there are times when a design is a good one,
>> even if less than fully normalized. For each normal form, there is
>> a known set of anomalies that come up when you insert, update, or
>> delete data in that form. If you are willing and able to program
>> around those anomalies, and if the design yields benefits that
>> justify that effort, it can be the right thing to do. Learning when
>> to normalize is more subtle than learning how to normalize.
>>
>> There is a particular form of database design, called the "star
>> schema", that yields good results when used in a data mart or data
>> warehouse situation. A star schema mimics a multidimensional
>> database in relational (or SQL) form. A star schema follows design
>> rules of its own, and those rules sometimes contradict the rules of
>> normalization. The upside of the star schema is that it is very
>> easy to use with report generators, or with OLAP tools like Cognos
>> or Business Objects. The downside is that the process of keeping
>> the data current involves some fairly intricate programming and
>> heavy use of computer resources. Star schemas, and other
>> unnormalized or denormalized designs, almost always cost more than
>> they are worth when used in a high-transaction operational setting
>> like OLTP.
>>
>> Unfortunately, most deviations from normalization occur due to
>> blunders, and not due to well-considered design decisions. Many
>> deviations from normalization occur because the designer is
>> unfamiliar with some of the normal forms. Back when I was building
>> databases, I only really knew 1NF, 2NF, and 3NF. Update anomalies
>> due to deviations from BCNF and beyond were rare, but my design
>> process would not have obviated them.
>>
>> Another major cause of deviations from normalization is failure to
>> understand the data. In particular, the functional dependencies
>> inherent in the data are not discovered during data analysis, and
>> the design unknowingly violates normalization rules. By the time
>> this is discovered, there is usually a large body of application
>> code that is dependent on the bad design.
>>
>> Sometimes, denormalized design is the result of sheer pigheadedness.
>
> Don't listen to a word David says. Star schema was sold by Cognos and
> Business Objects so their customers would have to do the work they
> should have done in the first place.
>
> I seldom see anyone 'denormalize' who is aware of the actual costs of
> doing so. On the other hand, I have seen plenty of ignoramuses
> 'denormalize' when physical clustering for the same performance
> characteristics was an available option.

Me too. The inability of a large number of "DBA"s to distinguish between logical and physical, and to know which is which, is truly astonishing - and quite sad, given the fundamental nature of this distinction and its relevance to relational databases.
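
To make the distinction concrete: if orders and their lines are always fetched together, physical co-location buys the join performance people chase by flattening tables, without touching the logical design. A minimal sketch in Oracle Database syntax (hypothetical table names; other products, Rdb included, have their own placement mechanisms):

  -- Sketch only: an index cluster stores rows from both tables that
  -- share an order_id in the same blocks. The logical model stays
  -- fully normalized; only the storage changes.
  CREATE CLUSTER order_cluster (order_id NUMBER)
    SIZE 1024;

  CREATE INDEX order_cluster_idx ON CLUSTER order_cluster;

  CREATE TABLE orders (
    order_id    NUMBER PRIMARY KEY,
    customer_id NUMBER NOT NULL,
    order_date  DATE
  ) CLUSTER order_cluster (order_id);

  CREATE TABLE order_lines (
    order_id   NUMBER NOT NULL REFERENCES orders,
    line_no    NUMBER NOT NULL,
    product_id NUMBER NOT NULL,
    qty        NUMBER,
    unit_price NUMBER,
    PRIMARY KEY (order_id, line_no)
  ) CLUSTER order_cluster (order_id);

Queries and applications still see two perfectly ordinary tables; nothing in the logical design has been compromised.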

This forum is full of examples of people who just have not understood it, and who propose bizarre ideas as "performance" enhancements, essentially because they haven't got a clue what an optimizer is, why it is there, and just what forms of optimization are in fact possible within any specific product. Throw in physical proximity and other physical placement options, and one quickly realises that most problems are solvable without dicking around with the logical model.
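
For example (Oracle syntax again, hypothetical names - the same idea appears elsewhere as materialized or indexed views): the aggregate that tempts people to store a redundant "total" column can live as a materialized view, and with query rewrite enabled the optimizer answers queries written against the base tables from the summary:

  -- Sketch, not a recipe: a stored summary the optimizer may
  -- substitute for queries against the normalized base tables.
  CREATE MATERIALIZED VIEW customer_totals_mv
    ENABLE QUERY REWRITE
  AS
  SELECT o.customer_id,
         SUM(l.qty * l.unit_price) AS total_spend,
         COUNT(*)                  AS line_count
  FROM   orders o
         JOIN order_lines l ON l.order_id = o.order_id
  GROUP  BY o.customer_id;

  -- This query never mentions the MV, yet the optimizer can rewrite
  -- it to read customer_totals_mv instead of joining the detail rows:
  SELECT o.customer_id, SUM(l.qty * l.unit_price)
  FROM   orders o
         JOIN order_lines l ON l.order_id = o.order_id
  GROUP  BY o.customer_id;

Refresh policy and staleness tolerance are tuning decisions; none of them require bending the logical model.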

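And for the original poster's "synching" worry: the cost of carrying the same fact in two places is easy to see once you write the maintenance code it forces on you. A hypothetical sketch:

  -- customer_name is redundantly copied into invoices "for performance".
  CREATE TABLE customers (
    customer_id   NUMBER PRIMARY KEY,
    customer_name VARCHAR2(100) NOT NULL
  );

  CREATE TABLE invoices (
    invoice_id    NUMBER PRIMARY KEY,
    customer_id   NUMBER NOT NULL REFERENCES customers,
    customer_name VARCHAR2(100)   -- the redundant copy
  );

  -- Every rename must now touch both tables, in the same transaction,
  -- in every application, forever - or the copies drift apart:
  UPDATE customers SET customer_name = 'Acme Ltd' WHERE customer_id = 42;
  UPDATE invoices  SET customer_name = 'Acme Ltd' WHERE customer_id = 42;
  COMMIT;

That second UPDATE (or the trigger that hides it) is exactly the anomaly-handling code David mentions, and it is rarely priced in up front.
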
Dr. Dweeb
(an occasional lurker and OracleRdb guru)
