Re: Design approaches about primary key

From: Noons <wizofoz2k_at_yahoo.com.au.nospam>
Date: Sun, 21 Dec 2003 00:06:01 +1100
Message-ID: <3fe448e0$0$18746$afc38c87@news.optusnet.com.au>

"--CELKO--" <joe.celko_at_northface.edu> wrote in message
>
> This is redundant and dangerous. I'll get to the details shortly.

The details unfortunately don't work.... And "this" is called surrogate primary keys.

> >> The main advantage is you can change the "logic key", without the
> need to do any changes in detail tables. <<
>
> That is what ON UPDATE CASCADE does for you automatically -- if you
> designed a database instead of a file in SQL. And what the heck are

Oh yeah? Try an "UPDATE CASCADE" in a database with 6 levels of PK/FK relationship hierarchy in tables related to each other via native PKs/FKs and with a few hundred million rows in each table and then come back here with that theory.

It's very easy to talk without having the foggiest what's involved in providing efficient access to data in today's very large databases. You simply CANNOT have that sort of overhead when an operator has typed an incorrect SSN and it's now spread as an FK over tables with hundreds of millions of rows. The overhead in correcting that, with or without UPDATE CASCADE, is just NOT acceptable. No matter how nice in theory that might have been.

> Which you can never verify. Tryt this table:
>
> CREATE Drivers
> (driver_id IDENTITY (1,1) NOT NULL PRIMARY KEY,
> ssn CHAR(9) NOT NULL REFERENCES Personnel(ssn),
> vin CHAR(17) NOT NULL REFERENCES Motorpool(vin));
>
> Do your joins on driver_id. Find an error in the ssn (Social Security
> Number, the tax identification number for US citizens) and correct it
> in just Drivers. None of your joins know that the driver_id refers to
> the wrong person now. You have to make sure that the redundant key is
> also changed and that no application -- present or future -- fails to
> do so. Double the work at best. Mostly likely have no data
> integrity.

This example simply doesn't work. You start by citing a case in which the SSN and the VIN are wrong and that the user should correct it "in just DRIVERS". Which doesn't make sense: why "just"? Are you storing SSN and VIN as redundant information in your design? Why? You KNOW that is a no-no in pure relational design. Nowhere in your example SQL did you imply that SSN and VIN are stored somewhere else, nor is there any need for such storage anywhere else once they are in DRIVERS.

Unless you are creating said redundant information on purpose, in which case you'll have the problem ANYWAY, native primary keys notwithstanding! And if a SSN and a VIN are corrected and you join on driver_id, you are NOT referring to the wrong person: you just CORRECTED it, you WERE referring to the wrong person BEFORE! In other words, you start by citing an example that is correct in syntax, jump to an assumption that you never included in the initial premise, you assume that a correction to data creates wrong data and then use that as justification for pointing out a "bad" practice?

Care to think a bit longer about an example that MAKES sense?

> I find these systems fall apaprt in about a year.

Funny. Just about EVERY SINGLE 'relational database that I know (and I DO know quite a few, you're NOT the only one who's been around for ages) that has surrogate primary keys instead of native primary keys, is STILL running and without any performance problems. Can't say the same for those that use native PK's in their design, though.

Of course, I'm talking about databases that have not been "denormalised" to solve silly join performance problems that should have not been in the RDBMS software in the first place!

-- 
Cheers
Nuno Souto
wizofoz2k_at_yahoo.com.au.nospam

Received on Sat Dec 20 2003 - 07:06:01 CST