Re: 4NF is Where It Is At! [WAS Re: 1:1 relationships]
Date: 14 Feb 2001 10:13:49 GMT
Message-ID: <96dlot$eg5$1_at_news.tue.nl>
Peter Franklin wrote:
> On 13 Feb 2001 16:31:08 GMT, hidders_at_REMOVE.THIS.win.tue.nl (Jan
> Hidders) wrote:
>
> >You're welcome. :-) And if you have any questions about 4NF I will be
> >quite happy to answer them. The problem is usually understanding what
> >exactly is a multi-valued dependency. I think I already gave a formal
> >definition earlier in this thread. Would you like a small example?
>
> Thanks for the offer. Yes, I would like an example. I am currently
> rereading Date, and hope that with an example all will become clear
> at last :-)
What often works is looking at what happens if you denormalize. Suppose, for instance, you have the following normalized tables:
Is_member_of(student, class) key: {student} Follows(class, course) key: {class, course}
They might contain the following (abstract) data:
(I hope you are looking at this in a monospaced font :-))
Is_member_of Follows ------------ ------------- student class class course ------------ ------------- st1 cl1 cl1 crs1 st2 cl1 cl1 crs2 st3 cl2 cl2 crs1 st4 cl2 cl2 crs3
Now, suppose we merge these two tables into one and see what goes wrong:
Follows2 -------------------- student class course -------------------- st1 cl1 crs1 st2 cl1 crs1 st1 cl1 crs2 st2 cl1 crs2 st3 cl2 crs1 st4 cl2 crs1 st3 cl2 crs3 st4 cl2 crs3
I hope you see that the problem is that the fact that a certain student is in a certain class is stored more than once. In some sense you could say that the reason for this is that every row now has to express two independent facts; (1) which student is in which class, and (2) which class follows which course. For us this is clear because those are the tables we started with, but how could we have detected this if we started with the Follows2 table?
The short answer is: by looking at the repeating combinations. In the example contents you see that student-class combinations are repeated and class-course combinations are repeated. But simple repetition is not enough, the repetition has to be consistent. This means that if a certain student (say st1) is in a class (say cl1) according to one record then this fact should be repeated for all courses (so, also for crs2) that the class cl1 is following. Vice versa, it should hold that if a class (say cl1) follows a certain course (crs1) according to a certain record then this should be repeated for every student in that class (so also for st2).
A more formal way of expressing this would be the following:
The MVD class ->> student holds if for every tuple (st1, cl1, co1) it holds that if there is a tuple (*, cl1, co2) then there must also be a tuple (st1, cl1, co2)
Note that if 'class' would have been a key in Follows2 (i.e. every class has only one student) then this does not lead to duplication of combinations. So that is exactly the 4NF normalization rule:
If an MVD class ->> student holds and 'class' is not a candidate key then this MVD should be split into a separate table.
Another way of looking at an MVD is as follows. The MVD class ->> student says that with every class a certain set of students is associated, and another independent set of courses. We can visualize this with the help of a nested table:
Follows2_nested ----------------------------------- { student } class { course } ----------------------------------- { st1, st2 } cl1 { crs1, crs2 } { st3, st4 } cl2 { crs1, crs2 }
As you can see, two sets are associated with every class, and we can reconstruct the original table by simply taking all the possible student-course combinations for every record in this table.
If it is the case that you can always build such a nested table without losing information then the multi-valued dependency class ->> student holds. But since the case is symmetric it will inevitably then also hold that class ->> course. Therefore, this MVD is the dual of class ->> student.
Also here you can see that if 'class' is a key in Follows2 then there will be no duplication and, therefore, no reason for a split.
Yet another way of checking if a certain MVD holds is by simply checking if the associated split is information preserving. So, for instance, for class ->> student the associated split is {class, student} and {class, course} and this is indeed information preserving. Note that if 'class' is a (candidate) key then this split is obviously information preserving. But that is exactly the case where 4NF tells you *not* to split. So in some sense you could say that 4NF says that the obvious splits (that follow directly from the keys) should not be made, but only those that do *not* directly follow from the keys.
Summarizing you have three (equivalent) ways of checking if a certain MVD holds:
1. Consistent repitition. 2. Information preservation under nesting. 3. Information preservation under splitting.
I can explain why these three are all equivalent but that would take some time and mathematics. So, I hope this already helps you understand MVDs a little better and also when and why they should be split out of the table.
-- Jan HiddersReceived on Wed Feb 14 2001 - 11:13:49 CET