Re: Primary Key Theory Question
Date: Mon, 29 Mar 2004 00:07:53 +0200
Message-ID: <9aie60p88f5p07u0osaud08km55aq03cri_at_4ax.com>
On Sun, 28 Mar 2004 18:56:03 GMT, Ben <BenNOSPAM_at_NOSPAMknieff2.com> wrote:
>Hello group-
>I am re-working the structure of a database, and I have decided to do
>the design work as 'properly' as possible. So, I've been reading through
>posts in this group for most of a day. While I wait for Celko's books to
>arrive, I thought I'd put a question to the gurus here.
Although I'm certainly not a "guru", I've been down this path before,
maybe what I say can be of a little help...
>I've come to a conceptual impasse regarding primary keys. In doing my
>pen and paper design work and I've come up with two issues: Some of my
>entities can only be uniquely identified by all of their properties and
>some of my entities have good candidate keys, but in real life, that
>data might not be known.
>
>For example - I see an address as an entity. The pseudo-DDL would look
>like this:
> table Address
> Street Address 1 alpha,
> Street Address 2 alpha,
> City alpha,
> State alpha,
> Postal code alpha,
> Country alpha
>
>It seems to me the only unique key would be combination of all of these
>properties.
Correct. Until we get DNA databases, HR data is very often just like this.
>In the past, I would add an integer ID field as the key and
>leave it at that. I have read numerous postings suggesting that this is
>poor relational design, many hinging on the fact that the generation of
>of these IDs tends to be specific to each DBMS vendor. As portability is
>a concern, I don't want to make a _big_ mistake. Additionally, I have
>run into issues on the application development side when using generated
>keys. So, what is the best way to define a primary key for an entity
>such as this one?
You need a surrogate key. I believe Joe Celko will probably recommend
some kind of hash function over all the columns ... although the hash
function will probably be more DBMS specific than a sequence, it does
have certain advantages (e.g. you might be able to sort more
efficiently with a hashed key).
>Example 2 - I see a person as an entity. SSN is a good candidate key for
>a person, but in my situation, that information will frequently be
>unknown. I see no other candidate keys. Yet if it is poor practice to
>'manufacture' a key, what approach should be taken?
SSN is also not a good choice ... it can change, it is often unknown
(foreigners to the USA and children don't have SSN), and it is not
guaranteed to be unique in all cases (don't ask me for the source on
this ... Joe will have a much better answer<g>)
>I fear that I have missed something fundamental in my understanding of
>relational design, because I know I'm not the only one to come up
>against this. Any thoughts or resources are very much appreciated.
>
>Thank you,
>Ben