Re: Primary Key Theory Question

From: Ben <BenNOSPAM_at_NOSPAMknieff2.com>
Date: Mon, 29 Mar 2004 00:18:44 GMT
Message-ID: <EXJ9c.4563$z%1.2293_at_twister.rdc-kc.rr.com>


Thank you for your thoughtful reply. I read one of Joe Celko's comments regarding using a hash function to create a key, and I thought, wow, great idea. Then I thought, a hash based on what? The initial values, a hash that gets updated (no), a hash of the x most significant columns?

You also brought up a great point about SSN, in the work that I am doing, the SSN not only could be duplicated, but it is in fact likely.

The select max()+1 route came to mind, but I don't like the concurrency/locking issues it brings up. A table lock for an insert just seems wrong to me.

So, if sequential, server generated surrogate keys are not inherently evil, why do I see so much negative press given to the Sybase/MS style Identity column? Outside of it being non-portable/non-standard, I can get the same functionality in MS and Oracle (my primary RDBMSs). Sure, I could replicate the functionality of a Oracle sequence for both using tables with one row and one field for each ID I need, and I could code up some handy SPs to make the coding easy, but, if the server will do it for me, why bother (I take laziness as the sign of a good coder in general).

<snip>
 > other databases have similar mechanisms (except for MS-Access, which  > isn't really a database...)

Too true about Access, and one of most painful aspects of my professional existence is that I need to always keep it in mind because some clients are still using it as a back end. I'm going to use your words next time I suggest dropping support for Access

Thanks again for your thoughts.
BK

Bob Hairgrove wrote:

> On Sun, 28 Mar 2004 18:56:03 GMT, Ben <BenNOSPAM_at_NOSPAMknieff2.com>
> wrote:
>
>

>>Hello group-
>>I am re-working the structure of a database, and I have decided to do 
>>the design work as 'properly' as possible. So, I've been reading through 
>>posts in this group for most of a day. While I wait for Celko's books to 
>>arrive, I thought I'd put a question to the gurus here.

>
>
> Although I'm certainly not a "guru", I've been down this path before,
> maybe what I say can be of a little help...
>
>
>>I've come to a conceptual impasse regarding primary keys. In doing my 
>>pen and paper design work and I've come up with two issues: Some of my 
>>entities can only be uniquely identified by all of their properties and 
>>some of my entities have good candidate keys, but in real life, that 
>>data might not be known.
>>
>>For example - I see an address as an entity. The pseudo-DDL would look 
>>like this:
>>	table Address
>>	Street Address 1	alpha,
>>	Street Address 2	alpha,
>>	City			alpha,
>>	State			alpha,
>>	Postal code		alpha,
>>	Country			alpha
>>
>>It seems to me the only unique key would be combination of all of these 
>>properties. 

>
>
> Correct. Until we get DNA databases, HR data is very often just like
> this.
>
>
>>In the past, I would add an integer ID field as the key and 
>>leave it at that. I have read numerous postings suggesting that this is 
>>poor relational design, many hinging on the fact that the generation of 
>>of these IDs tends to be specific to each DBMS vendor. As portability is 
>>a concern, I don't want to make a _big_ mistake. Additionally, I have 
>>run into issues on the application development side when using generated 
>>keys. So, what is the best way to define a primary key for an entity 
>>such as this one?

>
>
> You need a surrogate key. I believe Joe Celko will probably recommend
> some kind of hash function over all the columns ... although the hash
> function will probably be more DBMS specific than a sequence, it does
> have certain advantages (e.g. you might be able to sort more
> efficiently with a hashed key).
>
> If you want to have a "portable sequence", you can add an integer
> column and select max()+1 of that column when you do an update. Be
> sure that you have a mechanism for locking the table when more than 1
> user is inserting data, though ... this can be a real drag on
> performance.
>
>
>>Example 2 - I see a person as an entity. SSN is a good candidate key for 
>>a person, but in my situation, that information will frequently be 
>>unknown. I see no other candidate keys. Yet if it is poor practice to 
>>'manufacture' a key, what approach should be taken?

>
>
> SSN is also not a good choice ... it can change, it is often unknown
> (foreigners to the USA and children don't have SSN), and it is not
> guaranteed to be unique in all cases (don't ask me for the source on
> this ... Joe will have a much better answer<g>)
>
>
>>I fear that I have missed something fundamental in my understanding of 
>>relational design, because I know I'm not the only one to come up 
>>against this. Any thoughts or resources are very much appreciated.
>>
>>Thank you,
>>Ben

>
>
> Just my 2 cents' worth ... you'll need some kind of auto-sequence or
> else a "before insert" trigger to generate either a hash key or get
> the next sequence value ... Better yet, do all updates/inserts/deletes
> through some kind of API so that you can better control security (i.e.
> access to the database). Oracle has roles for this sort of thing,
> other databases have similar mechanisms (except for MS-Access, which
> isn't really a database...)
>
>
> --
> Bob Hairgrove
> NoSpamPlease_at_Home.com
Received on Mon Mar 29 2004 - 02:18:44 CEST

Original text of this message