Re: normalisation problems

From: Jan Hidders <hidders_at_REMOVE.THIS.win.tue.nl>
Date: 21 Nov 2000 13:36:05 GMT
Message-ID: <8vdto5$ksp$1_at_news.tue.nl>


 wrote:
>
> I am currently working on a normalisation assignment. I feel that I
> understand the theory of normalisation reasonably well, but have
> difficulty applying it to the example below -
>
> Relation R = < A, B, C (D, E, F, G)>
>
> where A is a primary key and the attributes in brackets are a
> repeating group.
>
> Functional dependencies -
>
> FD1: A -> B C
> FD2: A D -> E F G
> FD3: A E -> D F G
> FD4: E -> G
> FD5: B -> C
>
> I understand that R is not in the first normal form (1NF) because it
> has a repeating group. My intuition tells me to take out B and C into
> a separate relation whose key field is A since the other fields (D,
> E, F, G) are not functionally dependant on B or C. Presumably A would
> be a partial key to the remaining table (D, E, F, G). What would be
> its complete primary key?

Lets do this one step at a time. First you flatten the relation:

  R1(A,B,C,D,E,F,G)

Now, we are going to check if we are in 2NF. For this you have to know what the candidate keys are. To determine them you can use the following rule:

(X,Y and Z are sets of attributes, + = set union, - = set difference)

  if X is a key and

     FD Y->Z holds and
     Y is a subset of X

  then (X - Z) + Y is a key

For example, you know that ABCDEFG is (by definition) a key of R1. You also know that A->BC and A is a subset of ABCDEFG. That means that you can leave B and C out, so ADEFG is also a key. You also know that E->G so you can also leave the G out: ADEF is a key. You also know that AE->DFG, so AE is a key. This key cannot be minimalized any further by any functional dependency and is therefore a minimal key, i.e., a candidate key.

If you try all possibilities then you will find the following candidate keys:

  {AD, AE}

Knowing this, you can now determine which FDs violate 2NF. This is straightforward but if you are going to split the tables then you also have to make sure that the set of dependencies is irreducible:

1. only single attributes on the right hand side,
2. the left-hand side has to be minimal and
3. no FD is derivable from the others.

This step is often ignored but it is important if you want to be dependency preserving.

First, lets take care of the right-hand sides by splitting the dependencies if neccessary:

FD1a: A->B
FD1b: A->C
FD2a: AD->E
FD2b: AD->F
FD2c: AD->G
FD3a: AE->D
FD3b: AE->F
FD3c: AE->G

FD4: E->G
FD5: B->C

Then, we have to check if the left-hand sides are minimal. So, for AD->E we have to check if it not holds that A->E or D->E. If you check this you will see that AE->G has a non-minimal left-hand side because it also holds that E->G (FD4). In general this is solved by replacing the FD with the one with the smaller left-hand side. In this case this means duplicating FD4, so we can also simply leave FD3c out. The result:

FD1a: A->B
FD1b: A->C
FD2a: AD->E
FD2b: AD->F
FD2c: AD->G
FD3a: AE->D
FD3b: AE->F

FD4: E->G
FD5: B->C

Finally, we have to check if there are some spurious dependencies. In this case, for example, FD1b A->C can be derived from FD1a A->B and FD5 B->C. You can simply solve this in general by leaving out the FDs that can be derived from the rest. We can also derive AD->F from AD->E and AE->F, and we can derive AD->G from AD->E and E->G. The result:

FD1a: A->B
FD2a: AD->E
FD3a: AE->D
FD3b: AE->F

FD4: E->G
FD5: B->C

And now we can check which FDs violate which NFs:

               2NF 3NF BCNF

FD1a: A->B     bad   bad   bad
FD2a: AD->E    good  good  good
FD3a: AE->D    good  good  good
FD3b: AE->F    good  good  good
FD4:  E->G     bad   bad   bad
FD5:  B->C     good  bad   bad

To reach the 2NF the bad FDs have te be split off into separate tables. So, we get:

  R1.1(A,B) R1.2(E,G) R1.3(A,C,D,E,F)

But, as you can see, this is not dependency preserving because FD5 is now split over R1.1 and R1.2. So, we will skip this phase and go to 3NF immediately. The split is then:

  R1.1(A,B) R1.2(E,G) R1.3(B,C) R1.4(A,D,E,F)

As this is dependency preserving you can now assign the FDs to the different relations as shown below.

  A->B       E->G       B->C       AD->E
                                   AE->D
                                   AE->F

From this you can now easily derive the candidate keys of the new relations:

  {A} {E} {B} {AD,AE}

As you can see there are no more FDs that violate 3NF, or even BCNF. In fact, all the dependencies follow from the key constraints. So you are not only in BCNF, but also in 4NF, 5NF and DKNF (domain key normal form).

Kind regards,

    Jan Hidders Received on Tue Nov 21 2000 - 14:36:05 CET

Original text of this message