Re: Primary Key Theory Question

From: Bob Hairgrove <wouldnt_you_like_at_to_know.com>
Date: Mon, 29 Mar 2004 00:07:53 +0200
Message-ID: <9aie60p88f5p07u0osaud08km55aq03cri_at_4ax.com>


On Sun, 28 Mar 2004 18:56:03 GMT, Ben <BenNOSPAM_at_NOSPAMknieff2.com> wrote:

>Hello group-
>I am re-working the structure of a database, and I have decided to do
>the design work as 'properly' as possible. So, I've been reading through
>posts in this group for most of a day. While I wait for Celko's books to
>arrive, I thought I'd put a question to the gurus here.

Although I'm certainly not a "guru", I've been down this path before, maybe what I say can be of a little help...

>I've come to a conceptual impasse regarding primary keys. In doing my
>pen and paper design work and I've come up with two issues: Some of my
>entities can only be uniquely identified by all of their properties and
>some of my entities have good candidate keys, but in real life, that
>data might not be known.
>
>For example - I see an address as an entity. The pseudo-DDL would look
>like this:
> table Address
> Street Address 1 alpha,
> Street Address 2 alpha,
> City alpha,
> State alpha,
> Postal code alpha,
> Country alpha
>
>It seems to me the only unique key would be combination of all of these
>properties.

Correct. Until we get DNA databases, HR data is very often just like this.

>In the past, I would add an integer ID field as the key and
>leave it at that. I have read numerous postings suggesting that this is
>poor relational design, many hinging on the fact that the generation of
>of these IDs tends to be specific to each DBMS vendor. As portability is
>a concern, I don't want to make a _big_ mistake. Additionally, I have
>run into issues on the application development side when using generated
>keys. So, what is the best way to define a primary key for an entity
>such as this one?

You need a surrogate key. I believe Joe Celko will probably recommend some kind of hash function over all the columns ... although the hash function will probably be more DBMS specific than a sequence, it does have certain advantages (e.g. you might be able to sort more efficiently with a hashed key).

If you want to have a "portable sequence", you can add an integer column and select max()+1 of that column when you do an update. Be sure that you have a mechanism for locking the table when more than 1 user is inserting data, though ... this can be a real drag on performance.

>Example 2 - I see a person as an entity. SSN is a good candidate key for
>a person, but in my situation, that information will frequently be
>unknown. I see no other candidate keys. Yet if it is poor practice to
>'manufacture' a key, what approach should be taken?

SSN is also not a good choice ... it can change, it is often unknown (foreigners to the USA and children don't have SSN), and it is not guaranteed to be unique in all cases (don't ask me for the source on this ... Joe will have a much better answer<g>)

>I fear that I have missed something fundamental in my understanding of
>relational design, because I know I'm not the only one to come up
>against this. Any thoughts or resources are very much appreciated.
>
>Thank you,
>Ben

Just my 2 cents' worth ... you'll need some kind of auto-sequence or else a "before insert" trigger to generate either a hash key or get the next sequence value ... Better yet, do all updates/inserts/deletes through some kind of API so that you can better control security (i.e. access to the database). Oracle has roles for this sort of thing, other databases have similar mechanisms (except for MS-Access, which isn't really a database...)

--
Bob Hairgrove
NoSpamPlease_at_Home.com
Received on Mon Mar 29 2004 - 00:07:53 CEST

Original text of this message