Re: Newbie question about db normalization theory: redundant keys OK?

From: -CELKO- <jcelko212_at_earthlink.net>
Date: Fri, 14 Dec 2007 08:50:14 -0800 (PST)
Message-ID: <ab2d49f5-e88d-417a-9eaf-d55024ee05e8_at_d4g2000prg.googlegroups.com>


On Dec 14, 2:26 am, raylopez99 <raylope..._at_yahoo.com> wrote:
> On Dec 13, 2:10 pm, mAsterdam <mAster..._at_vrijdag.org> wrote:
>
>
>
> > raylopez99 schreef:
>
> > > With a few hours of theory under my belt, I'd like to ask if there's
> > > ever a time that you don't want a completely normalized dB, that is, a
> > > normalized database being a dB that has no redundant information (my
> > > understanding of what a normalized database is).
>
> > This question is very well put.
>
> > First, about your understanding.
> > The goal of normalizing is not to get rid of redundancies but to get
> > rid of update anomalies. If, when in the real world your database is
> > modeling one thing changes and you have to change several pieces of
> > data in your database to keep it up to date, this means you have an
> > update anomaly.
>
> > Several (mostly increasingly strict) normal forms exist to check if
> > your set of relations suffers from some anomaly with recepies of
> > changes to it to make it comply. Just google for normal form if you
> > didn't already.
>
> Thanks; I did Google it,and learned a bit about 1NF, 2NF, 3NF here:http://defiant.yk.psu.edu/~lxn/IST_210/normal_form_definitions.html
>> While I didn't really understand the differences between the three types of NF <<

It gets a little tricky and a Newsgroup is not a good place to get the details. The idea of 1NF is that all the columns (attributes) are expressed as scalar values -- no sequences, arrays, linked lists or fancy stuff. This is a starting point. Maybe we should have had a "Zero Normal Form" for file systems which allow complicated structures :)

Something in 2NF has two or more "simple facts" hidden inside it. A simple example would be (teacher, student, dept) which hides the (teacher, dept) facts -- "Mr. Celko is in the CS department" and (student, dept) facts -- "Ms. Lohan is a CS student". Chris Date has done a good job of discussing the idea of RDBMS rows as facts and he is worth reading.

Something in 3NF has a key (one or more columns, no problem!) that determine all the non-key attributes. For example, (longitude, latitude, landmark_name) have a key (longitude, latitude). Notice that the pair is atomic (from the Greek "a-toma" meaning indivisible or literally "without parts"); the whole key is required to find your landmark.

After that, things get tricky. Ever hear of the "Eight Queens Problem"? Given a chess board and eight queens, put them on the board so that no queen can capture another. Classic programming problem. This can be modeled as (board_row, board_column, queen_nbr), but any two of the three determines the third column.

>>I like to use GUIDs as the primary key almost exclusively, I don't really suffer from this problem <<

Stop that at once! You are not doing RDBMS at all and have no data integrity. That is another whole rant.

>> employing a "star" configuration" or "cluster", as suggested or implied by David Cressey. <<

Star schemas work for Data Warehouses because the data coming into them came from normalized sources and it NEVER changes once in the DW. It is read-only and can suffer only an INSERT anomaly -- usually from trying to load the same data twice. Received on Fri Dec 14 2007 - 17:50:14 CET

Original text of this message