Re: what are keys and surrogates?

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Sun, 27 Jan 2008 22:01:27 -0400
Message-ID: <479d377d$0$4053$9a566e8b_at_news.aliant.net>


JOG wrote:

> On Jan 27, 7:39 pm, rp..._at_pcwin518.campus.tue.nl (rpost) wrote:
> 

>>(Sorry to be a bit late with my reply.)
> 
> no prob. there a million and one things more important than arguing
> about dangers of tuple id's! Anyhow I think we've done pretty well  to
> get this far disagreeing so much and still being civil ;) Go us.
> 

>>JOG wrote:
>>
>>[...]
>>
>>
>>>We have some bits of paper with numbers written on (in pencil). We are
>>>storing info about these bits of paper in a database using the schema:
>>>{paperID, Value}. The key, PaperID, is a unique database generated
>>>hidden surrogate.
>>
>>>We have an enumeration:
>>>{ (paperID:1, Value:X), (paperID:2, Value:Y), (paperID:3, Value:Z) }
>>
>>Fine. Now why does this ID exist? What useful purpose can it possibly
>>serve? The purpose I see, and that I've been defending here, is that
>>relations can now use this ID attribute when they want to refer to table
>>rows, instead of using its "real" attributes (just Value in this case),
>>and that in doing so, the database designer introduces entities, or
>>objects if you wish, into the database: it separates changes to paper
>>(i.e. to an attribute value in the paper relation) and additions/deletions
>>of papers (i.e. to the paper relation) from changes to the *use* of paper
>>(how paper is used in other relations).
> 
> 
> Yeah, I fully understand your position, its where I once stood. I
> think you see the mistake (imho of course) a lot when people move from
> OOP to databases, because it's such a shift facing that RM doesn't
> have any addresses in it.
> 
> 

>>So we expect this paperID to occur at least once as an attribute
>>in another relation.
>>
>>
>>>Someone comes to you the DB admin, with 3 bits of paper and says, ok
>>>the boss has changed the values on some of the bits of paper. What I
>>>have here is one bit of paper with an A on, one with a B and and one
>>>with a Z. Please update the database accordingly.
>>
>>That is impossible. We need to know how these bits of paper relate
>>to the existing bits. Can we go by our gut feeling and assume
>>that the Z bit stays the same while the X bit is relabeled A
>>and the Y bit relabeled B? Or are the X and Y bits discarded entirely,
>>and replaced with two new A and B bits? Or is perhaps the X bit
>>relabeled to B, the Y bit to A, the Z bit discarded, and a new bit
>>introduced with the label Z? The boss will have to tell us.
> 
> 
> The boss cannot tell you, that was the whole point.
> 
> 

>>>Please tell me what you do to deal with this. No redefinitions, or
>>>tangents, just tell me how you ensure that the right PaperID's get the
>>>right updates? How do you known which paperID has changed, and to
>>>which value? How do you even know you have been presented with the
>>>same bits of paper as the initial three, and not new ones altogether?
>>>The DBA would be at a complete loss.
>>
>>Exactly, but this is not because of the presence of paperIDs;
>>it's because the boss did not provide that information.
>>
>>As soon as the boss provides this information, we can think of how
>>we implement the required updates. In the presence of paperID,
>>it's easy, for instance,
> 
> 
> As I said, the boss can't remember what he did. Just take it as given
> that you cannot get the information, and have to deal with what you
> see in this example. An update is impossible, and our db is broken.
> 
> 

>> UPDATE Paper SET Value = "A" WHERE Value = "X";
>> UPDATE Paper SET Value = "B" WHERE Value = "Y";
>>
>>Without paperID, what would we have instead?
> 
> 
> Nothing at all, if we were happy with just 'value'. We'd also have to
> accept that we weren't tracking bits of paper then. In fact the notion
> of an individual piece of paper, that existed independently of what
> was written on it, wouldn't exist in our model at all. Funny things
> entity types.
> 
> It took me a long time to grok how this even made any sense, to see
> that an entity type is actually _defined_ by what we say consistently
> identifies it over different states. I then realised that the entity
> type I have in the real world (an individual bit of paper) might not
> be the same type I've constructed in the model (a value written on
> paper). I didn't need OID's to fix my problems...I just needed to make
> sure my entities tallied with my constructs! And that meant I just
> needed to use the right identifiers.
> 
> Release the power of the grok too reinier! :-)
> 
> 
> 

>>The Paper table would
>>not be present at all, and all my foreign key attributes to the
>>Paper.paperID attribute would just be plain Paper values. So what
>>would the update look like?
>>
>>Well, you'd have to know, somehow, which
>>of all the possible attributes in your tables stand for bits of paper,
>>and write the same update queries for each of those attributes,
>>that I wrote just for my Paper table. Not so easy. Why not?
>>Because this Paper-less schema provides less information about paper
>>than the one you proposed: it no longer tells us explicitly which
>>attributes in the database stand for bits of paper.
>>
>>But wait, you may object, we may be using "abstract domains"
>>in our database; instead of directly using this enum as value type,
>>we may have an explicitly declared a Paper value domain that is defined
>>to be this enum. Some DBMSs promote this approach, e.g. Firebird.
>>But that only tells us that we should rather be looking at entities
>>that are a bit more complicated than enums; e.g. we can study what happens
>>with entities that have ID-based foreign keys *themselves*; e.g. a Paper
>>table may refer to types of paper, and one of its attributes may be
>>a reference to its manufacturer.
>>
>>But wait, you may try again, perhaps the boss has something more
>>complicated in mind: e.g., the X paper may be *split* into A and B,
>>with some of its uses being replaced with the use of A paper, and
>>the other uses with B paper. So your nifty paperID doesn't always
>>buy you simplicity of updates, nyaah nyaah. Well, I can only agree.
>>
>>
>>>If you do follow this example, then perhaps you could explain why you
>>>are having trouble generalizing it to any schema where the key
>>>attribute is hidden (i.e. non-visible to the business), and all other
>>>attributes are mutable.
>>
>>The "trouble", as I tried to explain before, arises when a relation exists
>>of which none of the keys can be decomposed (by following foreign key
>>references) in such a way that only non-ID attributes remain.
> 
> 
> The "trouble" is that you are probably stuck in thinking in OO, like I
> once was when I thought C++ was the shizzle to end all shizzles.
> Unfortunately to guarantee you can recognize an 'entity' when it
> arrives again at your database, you need that stable identifier. Each
> "bit of paper" has already has a natural identifier of course, but we
> can't use it (it's x,y path), and this is the root of our problem. And
> yet all the issues can be solved one single swoop by just...
> 
> *Writing the paper_id in permanent marker on the back of each bit of
> paper*
> 
> Now that the surrogate key isn't hidden, all problems vanish. It
> represents the papers x,y path and every bit of paper is simply
> identifiable, updates are trivial, and importantly the attribute is no
> longer some magical OID or yucky tuple identifier fudge.

But wait... now, it's familiar! ... we have (re)invented the natural key! Woo! Hoo! Received on Mon Jan 28 2008 - 03:01:27 CET

Original text of this message