Re: Primary Key Theory Question
Date: Mon, 29 Mar 2004 00:18:44 GMT
Message-ID: <EXJ9c.4563$z%1.2293_at_twister.rdc-kc.rr.com>
Thank you for your thoughtful reply. I read one of Joe Celko's comments regarding using a hash function to create a key, and I thought, wow, great idea. Then I thought, a hash based on what? The initial values, a hash that gets updated (no), a hash of the x most significant columns?
You also brought up a great point about SSN, in the work that I am doing, the SSN not only could be duplicated, but it is in fact likely.
The select max()+1 route came to mind, but I don't like the concurrency/locking issues it brings up. A table lock for an insert just seems wrong to me.
So, if sequential, server generated surrogate keys are not inherently
evil, why do I see so much negative press given to the Sybase/MS style
Identity column? Outside of it being non-portable/non-standard, I can
get the same functionality in MS and Oracle (my primary RDBMSs). Sure, I
could replicate the functionality of a Oracle sequence for both using
tables with one row and one field for each ID I need, and I could code
up some handy SPs to make the coding easy, but, if the server will do it
for me, why bother (I take laziness as the sign of a good coder in general).
<snip>
Too true about Access, and one of most painful aspects of my
professional existence is that I need to always keep it in mind because
> other databases have similar mechanisms (except for MS-Access, which
> isn't really a database...)
Thanks again for your thoughts.
BK
Bob Hairgrove wrote:
> On Sun, 28 Mar 2004 18:56:03 GMT, Ben <BenNOSPAM_at_NOSPAMknieff2.com>
> wrote:
>
>
>>Hello group- >>I am re-working the structure of a database, and I have decided to do >>the design work as 'properly' as possible. So, I've been reading through >>posts in this group for most of a day. While I wait for Celko's books to >>arrive, I thought I'd put a question to the gurus here.
>
>
> Although I'm certainly not a "guru", I've been down this path before,
> maybe what I say can be of a little help...
>
>
>>I've come to a conceptual impasse regarding primary keys. In doing my >>pen and paper design work and I've come up with two issues: Some of my >>entities can only be uniquely identified by all of their properties and >>some of my entities have good candidate keys, but in real life, that >>data might not be known. >> >>For example - I see an address as an entity. The pseudo-DDL would look >>like this: >> table Address >> Street Address 1 alpha, >> Street Address 2 alpha, >> City alpha, >> State alpha, >> Postal code alpha, >> Country alpha >> >>It seems to me the only unique key would be combination of all of these >>properties.
>
>
> Correct. Until we get DNA databases, HR data is very often just like
> this.
>
>
>>In the past, I would add an integer ID field as the key and >>leave it at that. I have read numerous postings suggesting that this is >>poor relational design, many hinging on the fact that the generation of >>of these IDs tends to be specific to each DBMS vendor. As portability is >>a concern, I don't want to make a _big_ mistake. Additionally, I have >>run into issues on the application development side when using generated >>keys. So, what is the best way to define a primary key for an entity >>such as this one?
>
>
> You need a surrogate key. I believe Joe Celko will probably recommend
> some kind of hash function over all the columns ... although the hash
> function will probably be more DBMS specific than a sequence, it does
> have certain advantages (e.g. you might be able to sort more
> efficiently with a hashed key).
>
> If you want to have a "portable sequence", you can add an integer
> column and select max()+1 of that column when you do an update. Be
> sure that you have a mechanism for locking the table when more than 1
> user is inserting data, though ... this can be a real drag on
> performance.
>
>
>>Example 2 - I see a person as an entity. SSN is a good candidate key for >>a person, but in my situation, that information will frequently be >>unknown. I see no other candidate keys. Yet if it is poor practice to >>'manufacture' a key, what approach should be taken?
>
>
> SSN is also not a good choice ... it can change, it is often unknown
> (foreigners to the USA and children don't have SSN), and it is not
> guaranteed to be unique in all cases (don't ask me for the source on
> this ... Joe will have a much better answer<g>)
>
>
>>I fear that I have missed something fundamental in my understanding of >>relational design, because I know I'm not the only one to come up >>against this. Any thoughts or resources are very much appreciated. >> >>Thank you, >>Ben
>
>
> Just my 2 cents' worth ... you'll need some kind of auto-sequence or
> else a "before insert" trigger to generate either a hash key or get
> the next sequence value ... Better yet, do all updates/inserts/deletes
> through some kind of API so that you can better control security (i.e.
> access to the database). Oracle has roles for this sort of thing,
> other databases have similar mechanisms (except for MS-Access, which
> isn't really a database...)
>
>
> --
> Bob Hairgrove
> NoSpamPlease_at_Home.com
Received on Mon Mar 29 2004 - 02:18:44 CEST