Re: Surrogate Key vs Production Key

From: Noons <wizofoz2k_at_yahoo.com.au>
Date: 10 Oct 2004 23:55:17 -0700
Message-ID: <73e20c6c.0410102255.761ce25e@posting.google.com>

jcelko212_at_earthlink.net (--CELKO--) wrote in message news:<18c7b3c2.0410091028.2a9a36c0_at_posting.google.com>...

> 1) I get paid by the word in magazines :)

I know... Some people have all the luck. :)

> 2) I am positively concise compared to most writers

THAT is debatable! ;)

> That is part of validation -- the check digits are an old favorite of
> mine that new programmers simply do not know about. What are they
> teaching in schools? I understand not knowing the fancy ones like
> Diherdal Five, but Mod 11 is easy. The one thing that ought to have a
> CHECK() constraint on it is the keys.

More overhead, Joe. More overhead.
All for the sake of not using the obvious.

> Done a data warehouse yet? My favorite was the part number "I Hate my
> Job" which appeared thousands of times in legacy data and nobody had
> spotted it until the warehouse project.

Well, there is a good example of perfect fit for sks. Do a search on google for surrogate keys and the first to spring is precisely how helpful they are to dwhousing. You may chose to ignore it, but it's there...

> I would not call validation and verification arbitrary ...

Actually, they very much are.... :)

> And what is the cost of the bad data you created?

You keep coming back to the "bad data" straw man. WHAT bad data? There is no such thing as bad data in user land.

>
> This is the year 2004; storage is insanely cheap, why we make it from
> sand! And if there is a huge compound key, then you must make it
> unique to have a valid model -- unless doing it fast was more
> important than doing it right.

You see, you are harking back to subjective stuff like "right". What is "right"? You have to define that first and prove that the adjective applies to it before you can start going off about what is or is not so.

> What I find in reality with larger
> keys is that they are very natural to the users -- best examples:
> physical locations like (longitiude, latitude) or bin address systems
> (area, aisle, stack, bin).

Show me where in heck does the relational model define keys have to be "natural to the users"? Searches are searches, no matter what attribute is picked. If it happens to be indexed, it may under some circumstances improve performance. And that's about it.

> You also forget that indexes are only one kind of access method. For
> compound keys and VLDB problems, I'd use a hash or compressed bit
> vector. Teradata for example takes an absolute max of two probes to
> locate any row regardless of the key; 90%+ are done in one probe.

So? A specific implementation detail, of one choice, of one maker. That is not the definition of a law that has to be followed by everyone.

> car. If I were to type the exposed physical locator used in a
> particlar release of a particular product on one particular machine,
> what do you do with it?

Nothing. And that is precisely the point: I am not supposed to do anything with it other than RI. If I truly wanted to do something with a VIN, then I'd get it from the same row.

It's an attribute, Joe. Your choice to call it a natural key. But if I chose to use the VIN as a unique key, it doesn't change the fact that it's still the VIN. There is absolutely NO CHANGE whatsoever to its validity.

> Get on the internet and tell me what automobile is referenced by
> row_id (IDENTITY or whatever) 42 on my machine. Cannot do it.

So what? If I wanted to search on the VIN, then I'd use the VIN, period. Surrogate key or natural key is immaterial.

> You
> have a file system with a physical index that is tied to the hardware,
> not a data model. if the machine wants to hide this "pointer chain",
> that's fine. But I never write code with it.

Neither does anyone else. You totally omitted the point: surrogate keys are for RI, not for users to locate rows. There is no such thing in relational in fact as a row locator: you are supposed to be able to use ANY suitable column. That is why we moved away from the fixed pointer stff of IMS, where the "entry points" were sooooo important.

> >> You missed all the discussions here about how unimportant database
> portability really is, didn't you? :) <<
>
> You haven't seen what I charge for that "unimportant" service, have
> you :)?

That may be so, but it does not change the simple fact I pointed. I repeat, do a search for portability here. It's not even my words: Tom Kyte's.

> Because we have is bunch of certificate holders without any problem
> solving ability or education. You have to give them "paint by number"
> programming. Why don't newbie know about coupling, cohesion and basic
> software engineering nowdays? Rant, rant, rant.

I agree with you 100%. But both of us ranting is not gonna change one bit what the market is doing. And quite frankly, when there is not a SINGLE commercial application out there using natural keys, I at least start asking questions. Anyone answering them should have a better argument than subjective judgements and comments about "purity" or "validity". Worth nothing and totally misdirected.

>
> I have worked with or advised 17 diffrent SQL products over the years
> --I've seen it all :)

Same here, chief, same here. And hierarchical as well...

> Yeah, so why do the SQL Server newsgroups have that question asked
> about every 3-4 months? becauyse the mental model is a 1950's file
> system!

Because SQL Server needs that compaction for some obscure reason. No one in Oracle would recommend a sequence to be made contiguous or to compact on sequence: serves no purpose whatsoever. In fact, nowadays we don't even "reorg" like we used to. That means to me the "compaction" is not an argument against surrogate keys but an argument against SQL Server. See where I'm getting at?

> Then you must never see it or code with it, as per Dr. Codd. My gripe
> is that newbies do code with it and expect it to maintain uniqueness
> for them.

Your gripe is 100% supported by me. His Coddness was absolutely right.

> If you use it as a key, then you should be able to verify it agains
> the reality of the model. Would you use a VIN number for your car
> title if it changed everytime you went to the DMV?

Of course not. But I don't expect a query returning VIN numbers to do so in any predictable order whatsoever unless I specifically ask for one with ORDER BY. That is what your code implied and why I found it so objectionable.

> If you update via ssn and I update via driver_id, what happens when
> they get out of synch?

They don't, Joe. If you use surrogate keys or not, you do NOT use them to locate rows in a parent relation. EVER. You use the UK. Which achieves precisely the result everyone wants.

> Sure hope I got the right IDENTITY value when I say, Remember how
> worried you were about typos? Ever try to type a dozen GUIDs?

Perish the thought!

> at the schema level. My point: programmer do not maintain fake pointer
> chains and when they try they are human will screw up.

Urp! Stop there: you do NOT "maintain" fake pointer chains. You LET the RI maintain them for you. Period. That is all Codd said we could do, that is all that gets done!

> When we had IDMS and other network databases, the system prevented
> most of the screw ups and utility programs to rebuild pointer chains.
> IDENTITY has nothing to do that. If I make a bulk copy error, the
> IDENTITY values are re-numbered and my PK-FK fake pointer chains are
> garbage.

Exactly. So, once used you leave them alone. What's wrong with that? Exactly nothing.

> Rdb, Mimer, Solid (I think)

Are they? Full relational domains? Then most likely they will be in Oracle in another couple of revisions: just about anything good from rdb has ended up there...

> DB2, Mimer, Ocelot, SQL Server, Sybase, etc.

Up to Oracle to catch up, eh? :)
I still have to see one SS or Sybase commercial system using them, though: ALL I've seen so far use surrogates. In fact, the Sybase people even PARTITION on surrogates! That is just about as objectionable as I can imagine... I wonder why they don't use the right technique. Something they wanna hide?

> DRI cascades. I hate it because it is procedural code, but that's
> what we have. It is far safer than exposed physical locators.

Safety is relative. The last thing I want is every coder out there writing RI code that should have been declarative. The subject is extremely complex with all the locking implications. That is PRECISELY why many dbs still don't have it: it better be done properly. I can perfectly see another coder going there a year later and messing up things badly.
No: I'd rather eliminate the problem than shift it in time.

> Oracle is total piece of crap and I really feel sorry for the poor
> bastards that get stuck with it. You look at most production Oracle
> code and it is COBOL with sequential algorithms in a thing disguise.
> You just know that you are going to see cursors when you do the audit.
> Rant, rant, rant.

LOL! Good one!

>
> I don't understand that one. A VIEW does not exists until it is
> invoked then it disappears at the end of the query or statement. Any
> RI rule would have to be enforced in the defining SELECT in the VIEW
> definition. Is this "Oracle speak"?

No, not really. If you want to implement true subtyping (I know you don't like it, but guess what: it is useful and is used quite a lot) then you need to be able to define a view that covers the entities in the subtype. The view is now the type. And the relationships hanging off the type (as opposed to off the subtypes) should be delaratively done against the view. Not every single one of the subtypes. Otherwise you end up with uniqueness problems.

Oracle now lets us declare RI against view columns. But it does not enforce them. At least not since I last looked. Maybe it's changed with 10g.
Of course, you can implement subtypes inside a single table, with a locating column. I don't like it: no way I can enforce proper RI and constraints without triggers. (Urgh) indeed, as you said!

>
> >> Nice to hear from you, BTW. <<
>
> Likewise! Most of the kids these days on newsgroup are whining
> over-sensitive little wimps and you cannot get any good flow with
> them.

No worries. My back is so wide and scarred from the many years of backstabbing, it don't matter anymore! :) Thanks for the feedback. Received on Mon Oct 11 2004 - 01:55:17 CDT