Re: Surrogate Key vs Production Key

From: Ed prochak <ed.prochak_at_magicinterface.com>
Date: 21 Oct 2004 10:17:15 -0700
Message-ID: <4b5394b2.0410210917.7d6340f1@posting.google.com>

wizofoz2k_at_yahoo.com.au (Noons) wrote in message news:<73e20c6c.0410202043.3268333_at_posting.google.com>...
> ed.prochak_at_magicinterface.com (Ed prochak) wrote in message news:<4b5394b2.0410131031.49b21ed_at_posting.google.com>...
>
> > >
> > > I don't understand. Are you suggesting that surrogate keys
> > > are buggy or that only surrogate key systems have bugs?
> >
> > If you have a bug and it corrupts values in the child table, you may
> > have no way to restore those child records back to the proper parent.
>
> That is complete bollocks. First of all: corruption of
> values is corrected with data recovery. It has NOTHING to do
> with modelling, PKs, FKS, keys, whatever!

SO you have never had a programming bug that touched your surrogate values?
Lucky you.
(Better keep your favorite good luck charm always close by!)
>
> > Why because that is not relationship between the Surrogate key value
> > and the data it is supposed to connect. It is often just a value
>
> This is where your examples and arguments are completely wrong.
> If you do not establish a referential integrity constraint
> between ANY two PK and FK keys (natural or not is COMPLETELY
> immaterial), then you WILL have the potential for wrong data.

Look RI between the PK and FK has nothing to do with it. My point is there is NO RELATIONSHIP between the surrogate and the data it represents. The assignment of the surrogate is completely arbitrary and therefore impossible to validate.

Here's a simple example thet you should be able to understand.

table A
idA ...other columns
PK
1 john smith
2 linda loon

table B
idB idA ... other columns
PK FK to table A
1 1 123 baker street
2 2 987 signa Drive

Along comes a data change (command line SQL, programming error, whatever) and the effect is an update of table B to yield this:

table B
idB idA ... other columns
1 2 123 baker street
2 1 987 signa Drive

Now, what FK to PK RI constraint did I violate?

If this happened to two rows out of 100,000 addresses, how would you find it?
Alternate optional question: how do you design your model to enable recovery from this? (yes there are ways, but they reduce some of the advantages you speak of for SKs)

>
> It has NOTHING to do with how those keys are generated/populated.
> The problem is NOT the use of surrogate or natural keys, the
> problem is the ABSENCE of a RI constraint!

It has NOTHING to do with RI as I just showed you. bad SK values can get into the system even with strong RI in place.

>
> Can we for once stop invoking absolutely farcical and
> irrelevant scenarios that have NOTHING to do with use
> of surrogate versus natural keys? Since WHEN does the
> use of surrogate keys preclude the use of RI? Helloooooo?

This is your arguement about RI, it is not mine.
>
>
> > selected from a sequence. So changing the FK value from 1000 to 10001
> > just changed the parent from john smith to abby hutchison. But a bug
> > in a natural key which changed john smith to john smithe might be
> > correctable.
>
> No it most definitely is NOT correctable with natural keys.
> Or you have never heard of how many people there are with
> the name john smith!

I picked that example precisely because I've experienced such problems on BOTH sides (the names corrupted and the surrogate key moved). Many times to recovery required manual review, but a human can look at john smithe in the child table and guess that maybe it is really belongs to john smith. and in the other example it may be clear that the child record with id 1001 does not belong to heidi buchenwald, but I have no clue how to restore it back to john smith's id.
>
>
> > You made the ZERO claim for surrogate keys, I made no such claim for
> > natural keys. My only contention is that it is a lot harder to recover
> comp.databases.oracle.server> from a problem when the surrogate key
> values get screwed up.
>
> The "surrogate key values" do NOT get "screwed up"! THAT is the
> whole argument for using them. They are not manipulated by ANYONE
> other than the code and/or database engine. UNLIKE natural keys,
> that indeed MUST have check digits or other devices to avoid
> user corruption! AS WELL AS whatever other devices must be
> in place to ensure RI is not lost!

Again, you are lucky having NEVER had a programming bug, never having to convert your data into another system.

But now you changed your mind. Before the argument for SKs was saving data space. you don't want to waffle in your views this late in the campaign, er discussion. 8^)

>
>
> > The Surrogate keys are not part of the real data, they are a
> > ficticious relaionship. Logically the SK's are just POINTERS. That
> > make it logically a network model, IMHO.
>
> No it most definitely does NOT! A network model is not
> just a collection of rows in tables with numeric links!
> That shows you don't have the foggiest idea how a network
> (or hierarchical) database works.

After reading more, I comback to here just to point out that I think you are mixing the discussion between physical models and logical models. more about this follows below.

>
> Suffice to say that non-relational databases do NOT allow
> for ad-hoc relationships between any two entities (tables).

Since when?
>
> Something that is at the very root of why relational
> models are used: with relational, you can always create a join
> between two tables on any two columns, dynamically and without
> having to specify a-priori the relationship in the database
> schema. Regardless of what that join may retrieve.
>
> THAT is why relational is so much more flexible than network
> or hierarchical models and why it uses the flat table
> mechanism. While the others use a COMPLETELY different
> mechanism for data storage and retrieval.

Relational IS more flexible but there are is another issue here.

There's the logical model and the physical model and then the implementation DB engine.
I've seen more than one database engine that internally is a network model, but supports a SQL query engine. The only difference is the wider range of performance on queries.
(queries that match the model are fast, those that don't can be VERY SLOoow)

the logical model for network is the fundamental relations are fixed, and this is modeled by POINTERS so no data is repeated fom the parent to the child table, there's only a link between them. But that doesn't stop you from asking about relationships between to entities in the model that are not directly connected. You may have to follow some long chains to make the connection that's all.

But with SK's you have the same problem. Simple example: given your SK database, how do you find all the BOOKS owned by john smith. You have to follow the link, selecting from BOTH the PERSONS table and the BOOKS table joining on the SK, while a natural key model would just look in the BOOKS table, because the owner would be there.

>
> In fact, relational is so flexible that it may indeed allow
> emulation of what a network or hierarchical database does.
> And dynamically! That is what your argument was intended to
> show.

Please don't speak for me. Once you pick surrogate keys, there's nothing dynamic about your logical data model.

>
> But the internal mechanics between the two are COMPLETELY
> different and have NOTHING to do with "links", hence the
> argument's total lack of validity.

the DBMS engine might use whatever it likes, that's the physical model. I'm trying to teach you that you are working in a network LOGICAL model. Try not to confuse yourself. Just because you use a product touted as a Relational DBMS that your specific data model is relational.

>
> The use or not of numeric "links" to rows is a physical
> design detail that has NOTHING to do with relational
> modelling or relational logic design of a database,
> whatsoever!

Find surrogate keys in Codd's rules. I really don't think you will.
>
>
> > > Still waiting to hear of one with SKs.
> >
> > Then you haven't been listening.
>
> Still waiting. Perhaps you could start by re-reading
> what I said, instead of jumping to imaginary
> situations that NOBODY proposed or has ever implemented?

Maybe you should look again at what I said. I've seen these problems happen. I've had to go in and clean up the mess. Call me when it hapens to you. I could use some new clients. Meanwhile, I'm tired of presenting examples that you apparently don't read.

I think I'll just watch Joe and others try to explain this to you. (not that they are having much more success.) Received on Thu Oct 21 2004 - 12:17:15 CDT