Re: Treating addresses in 3NF?

From: Paul <pbrazier_at_cosmos-uk.co.uk>
Date: 13 Nov 2002 05:24:32 -0800
Message-ID: <51d64140.0211130524.537bf798_at_posting.google.com>


hidders_at_hcoss.uia.ac.be (Jan Hidders) wrote in message news:<3dd1459a$1_at_news.uia.ac.be>...
> In article <Pine.LNX.4.44.0211121539240.8050-100000_at_demon.ecs.soton.ac.uk>,
> Rich Leigh <rtl101_at_ecs.soton.ac.uk> wrote:
> >I know that in a lot (i.e. almost all?) databases, a customer list which
> >included an address is usually stored as CustomerID, Customer Name,
> >Address1, Address2, City, Post/Zip code, blah blah. But as a purely, uh,
> >"theoretical exercise", i.e. I don't have a bet on it with anybody at all
> >;), when this is normalised as far as 3NF, I come up with the tables;
> >Customers (CustID, CustName, AddressID)
> >Addresses (AddressID, Zip/Postcode)
> >AddressLines (SequenceNo, AddressID, Data)
>
> I'm affraid you have lost the bet. :-) Just to begin with, normalization
> does not involve introducing new identifiers. So let's take a look at your
> relation:
>
> Customer(ID, Name, Address1, Address2, City, Zip_code )
>
> How do we know that the two addresses have the same city and zip code? We
> probably don't so let's fix that:
>
> Customer(ID, Name, Addr1, City1, Zip1, Addr2, City2, Zip2)

I think this might be a misinterpretation of the original post. I think what was meant by Address1, Address2 is the separate address *lines* of the address e.g.:

Address1: 123 Acacia Avenue
Address2: Anytown
City: London
PostCode: SW12 45T

the problem is that you can get addresses like:

Flat 12,
"The Gables",
3 Leafy Ct,
High St,
AnyVillage,
AnyTown,
AnyCity,
Countyshire,
UK.
AB12 34C

though I think the Post Office does have the concept of a canonical form for an address to help get round this problem (e.g. you leave out the village since the postcode can identify it). So I don't think you need to allow for the possibility of an arbitrary number of address lines. Although I guess it depends if you are cleaning the data before it goes in the database or allowing address entry as the person wants and cleaning it later.

Now AB12 will always be in the county of Countyshire. and AB12 34C will be a group of about 10? houses (usually?) in the same road, maybe with sequential house numbers but maybe not as well. I think the combination of postcode and house number (or name if it doesn't have a number) uniquely identifies the address. Though sometimes I think companies in a single building with large amounts of mail have their own unique postcode or maybe even multiple postcodes (for different departments).

So the pair (postcode, house name or number) should in theory uniquely identify the addess. Maybe actually you need the flat number in there as well to allow for multiple addresses in the same building.

Another problem is that the counties used by the Post Office are different to the counties used for administrative purposes. e.g. someone in Greater London could be in the (non-existent) county of Middlesex for postal reasons...

Paul.

> Next question: what are the candidate keys? The usual suspect is {ID} but if
> two persons with the same name live at the same address then there is
> probably something wrong, so {Name, Addr1, City1} and {Name, Addr2, City2}
> are very likely also candidate keys. But since the zip code implies the city
> we also have the candidate keys {Name, Addr1, Zip1} and {Name, Addr2, Zip2}.
>
> So what functional dependencies are there? Since ID is the identifier we
> have the trivial:
>
> (1) ID -> Name, Addr1, City1, Zip1, Addr2, City2, Zip2
>
> But we also know that the zip code implies the city:
>
> (2) Zip1 -> City1
> (3) Zip2 -> City2
>
> Finally we also know that the zip code is determined by the address and
> city:
>
> (4) Addr1, City1 -> Zip1
> (5) Addr2, City2 -> Zip2
>
> That's about it. Actually we should now check if we have a minimal cover,
> but we have and I won't bore you with the technical details. So what does
> 3NF say:
>
> For every non-trivial FD X->A with A a non-key attribute it holds that X
> is a candidate key.
>
> So do we violate the 3NF? For (1) it is easy to see that it doesn't since X
> = {ID} and a candidate key. For (2) (3) (4) and (5) this is also easy to see
> because City1, City2, Zip1 and Zip2 are all part of a candidate key and
> therefore key attributes and the left-hand sides (the X in X->A) do not have
> to be candidate keys.
>
> So we are already in 3NF and nothing needs to be done.
>
> But what if we allow people with the same name at the same address? Let's
> even assume that it is possible that two persons with the same name live at
> the same two addresses. (Otherwise you are still in 3NF.) In this case (2),
> (3), (4) and (5) all violate 3NF so we will have to split off some of these.
> The best choice for this is (4) and (5) because if you split off (2) and (3)
> the dependencies (4) and (5) get distributed over two tables and we don't
> want that. So you get:
>
> Customer(ID, Name, Addr1, City1, Addr2, City2)
> Zip1(Addr1, City1, Zip1)
> Zip2(Addr1, City1, Zip1)
>
> Since Zip1 and Zip2 are simply subsets of the zip-code book they can be
> joined into one relation:
>
> Customer(ID, Name, Addr1, City1, Addr2, City2)
> Zip(Addr, City, Zip)
>
> So are we done now? We still have to check if the new relation is in 3NF.
> What are the candidate keys? It's just {Addr, City}. Are there any FDs? Yes,
> we have one that goes with the candidate key:
>
> (4') Addr, City -> Zip
>
> This is clearly not a problem in 3NF because {Addr, City} is the candidate
> key. Then there is the one that says that the zip code determines the city:
>
> (3') Zip -> City
>
> and this is also not a problem because City is in the candidate key and
> therefore a key attribute. So we have reached now 3NF.
>
> -- Jan Hidders
Received on Wed Nov 13 2002 - 14:24:32 CET

Original text of this message