Re: normalisation problems

From: Andrew R. Brink <abrink_at_linuxfreak.com>
Date: Tue, 21 Nov 2000 21:13:38 -0600
Message-ID: <XRGS5.680$VF3.927_at_news.more.net>


Would you happen to know any good URL's / books on DB normalization?

n article <8vdto5$ksp$1_at_news.tue.nl>, hidders_at_REMOVE.THIS.win.tue.nl (Jan Hidders) wrote:

> wrote:

>> 
>> I am currently working on a normalisation assignment. I feel that I
>> understand the theory of normalisation reasonably well, but have
>> difficulty applying it to the example below -
>> 
>> Relation R = < A, B, C (D, E, F, G)>
>> 
>> where A is a primary key and the attributes in brackets are a repeating
>> group.
>> 
>> Functional dependencies -
>> 
>> FD1: A -> B C FD2: A D -> E F G FD3: A E -> D F G FD4: E -> G FD5: B ->
>> C
>> 
>> I understand that R is not in the first normal form (1NF) because it
>> has a repeating group. My intuition tells me to take out B and C into a
>> separate relation whose key field is A since the other fields (D, E, F,
>> G) are not functionally dependant on B or C. Presumably A would be a
>> partial key to the remaining table (D, E, F, G). What would be its
>> complete primary key?

>
> Lets do this one step at a time. First you flatten the relation:
>
> R1(A,B,C,D,E,F,G)
>
> Now, we are going to check if we are in 2NF. For this you have to know
> what the candidate keys are. To determine them you can use the following
> rule:
>
> (X,Y and Z are sets of attributes, + = set union, - = set difference)
>
> if X is a key and
> FD Y->Z holds and Y is a subset of X
> then (X - Z) + Y is a key
>
> For example, you know that ABCDEFG is (by definition) a key of R1. You
> also know that A->BC and A is a subset of ABCDEFG. That means that you
> can leave B and C out, so ADEFG is also a key. You also know that E->G
> so you can also leave the G out: ADEF is a key. You also know that
> AE->DFG, so AE is a key. This key cannot be minimalized any further by
> any functional dependency and is therefore a minimal key, i.e., a
> candidate key.
>
> If you try all possibilities then you will find the following candidate
> keys:
>
> {AD, AE}
>
> Knowing this, you can now determine which FDs violate 2NF. This is
> straightforward but if you are going to split the tables then you also
> have to make sure that the set of dependencies is irreducible:
> 1. only single attributes on the right hand side,
> 2. the left-hand side has to be minimal and
> 3. no FD is derivable from the others.
>
> This step is often ignored but it is important if you want to be
> dependency preserving.
>
> First, lets take care of the right-hand sides by splitting the
> dependencies if neccessary:
>
> FD1a: A->B FD1b: A->C FD2a: AD->E FD2b: AD->F FD2c: AD->G FD3a: AE->D
> FD3b: AE->F FD3c: AE->G FD4: E->G FD5: B->C
>
> Then, we have to check if the left-hand sides are minimal. So, for AD->E
> we have to check if it not holds that A->E or D->E. If you check this
> you will see that AE->G has a non-minimal left-hand side because it also
> holds that E->G (FD4). In general this is solved by replacing the FD
> with the one with the smaller left-hand side. In this case this means
> duplicating FD4, so we can also simply leave FD3c out. The result:
>
> FD1a: A->B FD1b: A->C FD2a: AD->E FD2b: AD->F FD2c: AD->G FD3a: AE->D
> FD3b: AE->F FD4: E->G FD5: B->C
>
> Finally, we have to check if there are some spurious dependencies. In
> this case, for example, FD1b A->C can be derived from FD1a A->B and FD5
> B->C. You can simply solve this in general by leaving out the FDs that
> can be derived from the rest. We can also derive AD->F from AD->E and
> AE->F, and we can derive AD->G from AD->E and E->G. The result:
>
> FD1a: A->B FD2a: AD->E FD3a: AE->D FD3b: AE->F FD4: E->G FD5: B->C
>
> And now we can check which FDs violate which NFs:
>
> 2NF 3NF BCNF
>
> FD1a: A->B bad bad bad FD2a: AD->E good good good FD3a:
> AE->D good good good FD3b: AE->F good good good FD4: E->G
> bad bad bad FD5: B->C good bad bad
>
> To reach the 2NF the bad FDs have te be split off into separate tables.
> So, we get:
>
> R1.1(A,B) R1.2(E,G) R1.3(A,C,D,E,F)
>
> But, as you can see, this is not dependency preserving because FD5 is
> now split over R1.1 and R1.2. So, we will skip this phase and go to
> 3NF immediately. The split is then:
>
> R1.1(A,B) R1.2(E,G) R1.3(B,C) R1.4(A,D,E,F)
>
> As this is dependency preserving you can now assign the FDs to the
> different relations as shown below.
>
> A->B E->G B->C AD->E
> AE->D AE->F
>
> From this you can now easily derive the candidate keys of the new
> relations:
>
> {A} {E} {B} {AD,AE}
>
> As you can see there are no more FDs that violate 3NF, or even BCNF. In
> fact, all the dependencies follow from the key constraints. So you are
> not only in BCNF, but also in 4NF, 5NF and DKNF (domain key normal
> form).
>
> Kind regards,
>
> Jan Hidders
Received on Wed Nov 22 2000 - 04:13:38 CET

Original text of this message