Re: what are keys and surrogates?

From: <reinpost_at_win.tue.nl>
Date: Fri, 01 Feb 2008 21:13:36 +0100
Message-ID: <d12ab$47a37d70$839b4533$15472_at_news1.tudelft.nl>


JOG writes:

[...]

>> >Someone comes to you the DB admin, with 3 bits of paper and says, ok
>> >the boss has changed the values on some of the bits of paper. What I
>> >have here is one bit of paper with an A on, one with a B and and one
>> >with a Z. Please update the database accordingly.
>>
>> That is impossible. We need to know how [...]
>
>The boss cannot tell you, that was the whole point.

I missed the point, but I think I get it now: what you meant to imply is that the Value is not a key, that the tuples in Paper represent different individual pieces of paper, that the schema won't allow us to tell them apart, and that for this reason, if the boss comes with updated values for three specific bits of paper, we won't be able to tell which tuples to update, we can't even specify the update.

So this particular schema cannot be used to represent individual pieces of paper that need to be told apart. (Technically, this also depends on how paperID is used in other entities, but let's put that aside for now.)

By the way, the comparison I made was misguided; we can omit the paperID while keeping the Paper table, and if we do, the difference is much smaller.

>It took me a long time to grok how this even made any sense, to see
>that an entity type is actually _defined_ by what we say consistently
>identifies it over different states.

That is not where we disagree. We disagree on whether another entity may participate in that identification (i.e. on whether we allow weak entities or relationships).

I believe you're confusing the language of the model (e.g. an E/R or a relational model) with the language in which we talk about the model (i.e. the query language). "Surrogates" in queries are bad, (and aren't actually surrogates: they aren't hidden); but that alone doesn't rule out the presence of surrogates in the model itself. But you've been laughing away all the examples I gave to prove this, without convincing me they are mistaken, so I'll just shut up about it.

>I then realised that the entity
>type I have in the real world (an individual bit of paper) might not
>be the same type I've constructed in the model (a value written on
>paper). I didn't need OID's to fix my problems...I just needed to make
>sure my entities tallied with my constructs! And that meant I just
>needed to use the right identifiers.

I'm just trying to convince you that the requirement to state good natural keys doesn't somehow force you to get rid of OIDs.

>> The "trouble", as I tried to explain before, arises when a relation exists
>> of which none of the keys can be decomposed (by following foreign key
>> references) in such a way that only non-ID attributes remain.
>
>The "trouble" is that you are probably stuck in thinking in OO, like I
>once was when I thought C++ was the shizzle to end all shizzles.

The trouble I see is that your answers tend to be based on guesses about my background, while you ignore the details of what I actually write. This is probably because I'm not that good a writer, for which I apologize, but it's still annoying.

I've always agreed with you that the above model for Paper is broken when used as a database schema, when Value is not a key. I just don't agree that the paperID should go. It doesn't follow, in any case.

>Each "bit of paper" has already has a natural identifier of course, but we
>can't use it (it's x,y path), and this is the root of our problem. And
>yet all the issues can be solved one single swoop by just...
>
>*Writing the paper_id in permanent marker on the back of each bit of
>paper*

:-) Well, at least you've acknowledged that there may be situations in which we may not (yet) have good natural keys, while I acknowledge that here we have a schema we need to fix. BTW exporting broken surrogates is the broken way out. Better find out the actual attribute(s) that make it necessary to distinguish these bits of paper, and record them; or find out that we don't really need to distinguish them after all, and make Value a key (e.g. add a Number attribute).

>Now that the surrogate key isn't hidden, all problems vanish.

But new problems may arise. We must now maintain this new attribute (e.g. mark the bits of paper and never change any of the marks) and whoever needs to identify bits of paper must store and recall these arbitrary marks. This is not always possible or practical.

>It represents the papers x,y path and every bit of paper is simply
>identifiable, updates are trivial, and importantly the attribute is no
>longer some magical OID or yucky tuple identifier fudge.

Nothing yucky about magical OIDs, as long as they're used correctly.

-- 
Reinier
Received on Fri Feb 01 2008 - 21:13:36 CET

Original text of this message