Re: normalization question

From: Jan.Hidders <hidders_at_hcoss.uia.ac.be>
Date: 14 Aug 2002 22:35:30 +0200
Message-ID: <3d5abf12$1_at_news.uia.ac.be>


In article <ulkigftlfm4i62_at_corp.supernews.com>, Manatee Morton <manatee_at_nwlink.com> wrote:
>
>What would be a good defination of normalization? Could someone explain what
>the first through third normal forms are?

So you want a "Database normalization for Dummies", preferrably in just a few paragraphs? :-) Let me give it a try. If you allow me I will present it as a little dialogue.


Q: Can you explain to me what normalization is?

  1. I will tell you in a minute, but first let me tell you something about the relational model. A fundamental assumption of the relational model is that all the tuples in a relation represent a statement that (we hope) is a fact. You probably already understood this but this is just so you understand the terminology that follows.
  2. Oh, ok. So will you now tell me what normalization is?
  3. In a minute, but first let me explain that we can have different facts mixed into the same relation. For example, consider the following two relations:
     Has_address(name, address)  and 
     Has_date_of_birth(name, date_of_birth)

   Both represent different facts but we can mix them also into one relation:

     Person(name, address, date_of_birth)

Q: Sure, so we can mix different facts into the same relation. Is that a

   problem?

  1. Not in this case, but sometimes it can be. Suppose we mix the following two relations:
     Has_address(name, address)  and
     Has_hobbies(name, hobby)  

   That results in:

     Person2(name, address, hobby)

   This is not a pleasant mix if you look for example at a possible content    of the relation:

     name   address      hobby
     --------------------------------
     John   Broadway 3   Origami
     John   Broadway 3   Ice Hockey
     Mary   Oak Ridge 2  Wrestling
     Mary   Oak Ridge 2  Scuba diving
     Harry  Broadway 3   Cooking
     Harry  Broadway 3   Origami

   If we compare it with the original table there are some obvious problems    here: the address information is repeated unnecessarily and we cannot    store the address of somebody who doesn't have a hobby unless we use some    trick like NULL values.

Q: I see, but what has this to do with normalization?

  1. Normalization is about finding out if a relation contains a bad mix or not and what we can do about it.
  2. Sounds useful, so how do we do that?
  3. By looking at the functional dependencies and the candidate keys, and if you really want to know that your mix is without these problems then you also have to look at the multi-valued dependencies and the join dependencies. But I'll save that for another lesson.
  4. Great. So how can we see if a mix is bad?
  5. For a good mix it must hold that: if there is a functional dependency A1,...,An -> B such that
    1. the set {A1,...,An} is minimal, i.e., B is not functionally dependent upon a proper subset of this set, and
    2. the dependency is not trivial, i.e., B is not in the set {A1,...,An} then {A1,...,An} must be a candidate key.

   So if there is a functional dependency that satisfies 1. and 2. but    {A1,...,An} is not a candidate key, then it is a bad mix.

Q: Wait, wait! You haven't explained yet what functional dependencies and

   candidate keys are!

  1. That's in your book. You can read that for yourself. If you have you will see that our relation Person2 has only one candidate key ({name, hobby}) and only one functional dependency that satisfies 1. and 2. (name -> address). Since {name} is not a candidate key the previous rule tells us that this is a bad mix.
  2. But this doesn't look like the definition of 3NF in my book.
  3. There are several ways to define the normal forms, but the main reason is probably that it defines the Boyce-Codd Normal Form and not 3NF. If we don't consider the multi-valued and join-dependencies then it's the strongest but also the easiest NF and it tells us exactly if there are functional dependencies that cause redundancies or not. I will explain more about 3NF later on.
  4. Ok. So now we know that it is a bad mix, what do we do about it?
  5. First note that a dependency actually corresponds with a certain fact in the mix. In the Person2 relation it is the fact that a person lives at a certain address that corresponds with the offending dependency. So what we have to do is take this fact out of the mix and put it into its own relation. This new relation will have the columns {A1,...,An,B} (in this case {name, address}) and from the old table we remove the B column (so we are left with {name, hobbie}).

   After this we check again if the mix is bad and if it is we again give    the offending dependency its own relation.

Q: Ok. But if BCNF is such a great normal form, why then do we still have

   3NF?

  1. Because if you normalize all the way to BCNF you sometimes unmix a little too much.
  2. What does that mean exactly?
  3. That in the result there will be dependencies that hold between columns in different relations and do not logically follow from dependencies between columns within the relations.
  4. Why would that be a problem?
  5. Because these dependencies are database constraints that should be maintained by the database. Dependencies that hold over columns in different relations are usually computationally expensive to maintain.
  6. So how does 3NF prevent this?
  7. It allows certain dependencies to stay within the relation even though in BCNF they would give them their own relation.
  8. Which dependencies?
  9. The ones where B is in at least one of the candidate keys?
  10. Why those?
  11. Because if you put them in a separate relation you are removing a column from the relation that was part of a candidate key. For example, if we have a relation R(a, b, c, d) with candidate key {a, b} and a dependency c -> b then we should give c -> b its own relation and remove b from the old relation. But then the dependencies corresponding to the candidate key ( a, b -> c and a, b -> d) would hold over two relations. So you see that if we remove a column that belongs to a candidate key we get dependencies over more than one relation, and that is why 3NF doesn't consider them offending dependencies.
  12. Ok. I think I understand. But you said that BCNF exactly indicated if there were still redundancy problems or not, so that means that 3NF actually still leaves some of those problems in the relations where BCNF would remove them, right?
  13. Exactly. There is a trade-off here, either you may end up with dependencies over different relations or you are left with redundancies and anomalies.
  14. And there is no clever trick to get around that?
  15. No.
  16. Bummer.
  17. Yeah.
  18. Are you going to say something about 1NF and 2NF?
  19. Well, if you insist, but I'm running a bit short on time here, so I will be brief.
  20. If you must.
  21. I must. The 1NF only says that the columns in you relations should contain atomic values. This requirement is the same as one of the fundamental assumptions of the relational model, so it actually just says that your relations should satisfy the relational model.
  22. And the 2NF?
  23. That's actually not a very useful normal form but it is often used to give an introduction to the 3NF.
  24. What's the differenct between 2NF and 3NF then?
  25. Where 3NF says that {A1,...,An} must be a candidate key, (i.e. is not a proper subset and not a proper superset of a candidate key) the requirement in 2NF is just that {A1,...,An} must not a proper subset of a candidate key.
  26. I'm not sure if that helps me a lot to understand 3NF.
  27. Well, it helps a little to understand the usual 'pons asinorum' for 3NF and BCNF: 'Every (non-key) attribute depends upon the key, the whole key and nothing but the key. So help me Codd.'
  28. Sounds cute, but I'm not sure if I follow. What's this about *the* key? I thought there could be more than one key!? Can you explain this a bit more? And what is a pons asinorum anyway?
  29. Sorry. Really have to run now. See you next time. Bye bye.
  30. But, we haven't even discussed 4NF and 5NF yet! ... bye.

Kind regards,

  • Jan Hidders
Received on Wed Aug 14 2002 - 22:35:30 CEST

Original text of this message