Re: Primary Key Theory Question

From: Ben <BenNOSPAM_at_NOSPAMknieff2.com>
Date: Tue, 30 Mar 2004 02:00:34 GMT
Message-ID: <6x4ac.6065$z%1.5806_at_twister.rdc-kc.rr.com>


Wow, I really appreciate everyone's thoughts here. I have some new resources to check out and I have done a lot of re-thinking my assumptions.

It seems to be agreed that many (most) times a natural key is either not available, not a good choice (the SSN situation, which I fully agree with), or too 'wide' to be practical. The question then turns to generating a good surrogate key. In the past I have always used implementation specific methods to accomplish this (Identity columns, sequences). The select max() + 1 approach makes a lot of sense, yet from the point of view of programming an application, this means getting a table lock for two round trips to the server (as I need the new key in the application logic). This just doesn't seem right. With an identity or a sequence, I an return the key value to my application in one call, without the need to lock the table (although that may happen under the covers, I haven't checked). Generating a random number in the application seems a bit, well, crude. I like to keep my DB/CRUD code and app code separated.

Another option seems to be to create a key based on hash values. This works fine if the data are not sparse, which can happen in my problem domain. In fact, it can happen that I need two records with *identical* known data (f.e. a person needs to be added, but the name will be John Doe until the real name of the person can be updated later, possibly days or months). This seems odd, but in my problem domain the possibility is likely.

I know that Mr. Celko does not agree with using Identity columns, I think I understand the reasoning/theory behind it, leading me to believe that sequences are also not ideal. At the practical level, generating a unique surrogate key (without these tools) seems to be escaping me.

Thanks again for everyone's input.

Ben wrote:
> Hello group-
> I am re-working the structure of a database, and I have decided to do
> the design work as 'properly' as possible. So, I've been reading through
> posts in this group for most of a day. While I wait for Celko's books to
> arrive, I thought I'd put a question to the gurus here.
>
> I've come to a conceptual impasse regarding primary keys. In doing my
> pen and paper design work and I've come up with two issues: Some of my
> entities can only be uniquely identified by all of their properties and
> some of my entities have good candidate keys, but in real life, that
> data might not be known.
>
> For example - I see an address as an entity. The pseudo-DDL would look
> like this:
> table Address
> Street Address 1 alpha,
> Street Address 2 alpha,
> City alpha,
> State alpha,
> Postal code alpha,
> Country alpha
>
> It seems to me the only unique key would be combination of all of these
> properties. In the past, I would add an integer ID field as the key and
> leave it at that. I have read numerous postings suggesting that this is
> poor relational design, many hinging on the fact that the generation of
> of these IDs tends to be specific to each DBMS vendor. As portability is
> a concern, I don't want to make a _big_ mistake. Additionally, I have
> run into issues on the application development side when using generated
> keys. So, what is the best way to define a primary key for an entity
> such as this one?
>
> Example 2 - I see a person as an entity. SSN is a good candidate key for
> a person, but in my situation, that information will frequently be
> unknown. I see no other candidate keys. Yet if it is poor practice to
> 'manufacture' a key, what approach should be taken?
>
> I fear that I have missed something fundamental in my understanding of
> relational design, because I know I'm not the only one to come up
> against this. Any thoughts or resources are very much appreciated.
>
> Thank you,
> Ben
>
Received on Tue Mar 30 2004 - 04:00:34 CEST

Original text of this message