Re: Treating addresses in 3NF?

From: Jan Hidders <hidders_at_hcoss.uia.ac.be>
Date: 12 Nov 2002 19:16:59 +0100
Message-ID: <3dd1459a$1_at_news.uia.ac.be>


In article <Pine.LNX.4.44.0211121539240.8050-100000_at_demon.ecs.soton.ac.uk>, Rich Leigh <rtl101_at_ecs.soton.ac.uk> wrote:
>I know that in a lot (i.e. almost all?) databases, a customer list which
>included an address is usually stored as CustomerID, Customer Name,
>Address1, Address2, City, Post/Zip code, blah blah. But as a purely, uh,
>"theoretical exercise", i.e. I don't have a bet on it with anybody at all
>;), when this is normalised as far as 3NF, I come up with the tables;
>Customers (CustID, CustName, AddressID)
>Addresses (AddressID, Zip/Postcode)
>AddressLines (SequenceNo, AddressID, Data)

I'm affraid you have lost the bet. :-) Just to begin with, normalization does not involve introducing new identifiers. So let's take a look at your relation:

  Customer(ID, Name, Address1, Address2, City, Zip_code )

How do we know that the two addresses have the same city and zip code? We probably don't so let's fix that:

  Customer(ID, Name, Addr1, City1, Zip1, Addr2, City2, Zip2)

Next question: what are the candidate keys? The usual suspect is {ID} but if two persons with the same name live at the same address then there is probably something wrong, so {Name, Addr1, City1} and {Name, Addr2, City2} are very likely also candidate keys. But since the zip code implies the city we also have the candidate keys {Name, Addr1, Zip1} and {Name, Addr2, Zip2}.

So what functional dependencies are there? Since ID is the identifier we have the trivial:

  (1) ID -> Name, Addr1, City1, Zip1, Addr2, City2, Zip2

But we also know that the zip code implies the city:

  (2) Zip1 -> City1
  (3) Zip2 -> City2

Finally we also know that the zip code is determined by the address and city:

  (4) Addr1, City1 -> Zip1
  (5) Addr2, City2 -> Zip2

That's about it. Actually we should now check if we have a minimal cover, but we have and I won't bore you with the technical details. So what does 3NF say:

  For every non-trivial FD X->A with A a non-key attribute it holds that X   is a candidate key.

So do we violate the 3NF? For (1) it is easy to see that it doesn't since X = {ID} and a candidate key. For (2) (3) (4) and (5) this is also easy to see because City1, City2, Zip1 and Zip2 are all part of a candidate key and therefore key attributes and the left-hand sides (the X in X->A) do not have to be candidate keys.

So we are already in 3NF and nothing needs to be done.

But what if we allow people with the same name at the same address? Let's even assume that it is possible that two persons with the same name live at the same two addresses. (Otherwise you are still in 3NF.) In this case (2), (3), (4) and (5) all violate 3NF so we will have to split off some of these. The best choice for this is (4) and (5) because if you split off (2) and (3) the dependencies (4) and (5) get distributed over two tables and we don't want that. So you get:

  Customer(ID, Name, Addr1, City1, Addr2, City2)
  Zip1(Addr1, City1, Zip1)
  Zip2(Addr1, City1, Zip1)

Since Zip1 and Zip2 are simply subsets of the zip-code book they can be joined into one relation:

  Customer(ID, Name, Addr1, City1, Addr2, City2)   Zip(Addr, City, Zip)

So are we done now? We still have to check if the new relation is in 3NF. What are the candidate keys? It's just {Addr, City}. Are there any FDs? Yes, we have one that goes with the candidate key:

  (4') Addr, City -> Zip

This is clearly not a problem in 3NF because {Addr, City} is the candidate key. Then there is the one that says that the zip code determines the city:

  (3') Zip -> City

and this is also not a problem because City is in the candidate key and therefore a key attribute. So we have reached now 3NF.

  • Jan Hidders
Received on Tue Nov 12 2002 - 19:16:59 CET

Original text of this message