Re: what are keys and surrogates?

From: JOG <>
Date: Sun, 27 Jan 2008 17:38:40 -0800 (PST)
Message-ID: <>

On Jan 27, 7:39 pm, (rpost) wrote:
> (Sorry to be a bit late with my reply.)

no prob. there a million and one things more important than arguing about dangers of tuple id's! Anyhow I think we've done pretty well to get this far disagreeing so much and still being civil ;) Go us.

> JOG wrote:
> [...]
> >We have some bits of paper with numbers written on (in pencil). We are
> >storing info about these bits of paper in a database using the schema:
> >{paperID, Value}. The key, PaperID, is a unique database generated
> >hidden surrogate.
> >We have an enumeration:
> >{ (paperID:1, Value:X), (paperID:2, Value:Y), (paperID:3, Value:Z) }
> Fine. Now why does this ID exist? What useful purpose can it possibly
> serve? The purpose I see, and that I've been defending here, is that
> relations can now use this ID attribute when they want to refer to table
> rows, instead of using its "real" attributes (just Value in this case),
> and that in doing so, the database designer introduces entities, or
> objects if you wish, into the database: it separates changes to paper
> (i.e. to an attribute value in the paper relation) and additions/deletions
> of papers (i.e. to the paper relation) from changes to the *use* of paper
> (how paper is used in other relations).

Yeah, I fully understand your position, its where I once stood. I think you see the mistake (imho of course) a lot when people move from OOP to databases, because it's such a shift facing that RM doesn't have any addresses in it.

> So we expect this paperID to occur at least once as an attribute
> in another relation.
> >Someone comes to you the DB admin, with 3 bits of paper and says, ok
> >the boss has changed the values on some of the bits of paper. What I
> >have here is one bit of paper with an A on, one with a B and and one
> >with a Z. Please update the database accordingly.
> That is impossible. We need to know how these bits of paper relate
> to the existing bits. Can we go by our gut feeling and assume
> that the Z bit stays the same while the X bit is relabeled A
> and the Y bit relabeled B? Or are the X and Y bits discarded entirely,
> and replaced with two new A and B bits? Or is perhaps the X bit
> relabeled to B, the Y bit to A, the Z bit discarded, and a new bit
> introduced with the label Z? The boss will have to tell us.

The boss cannot tell you, that was the whole point.

> >Please tell me what you do to deal with this. No redefinitions, or
> >tangents, just tell me how you ensure that the right PaperID's get the
> >right updates? How do you known which paperID has changed, and to
> >which value? How do you even know you have been presented with the
> >same bits of paper as the initial three, and not new ones altogether?
> >The DBA would be at a complete loss.
> Exactly, but this is not because of the presence of paperIDs;
> it's because the boss did not provide that information.
> As soon as the boss provides this information, we can think of how
> we implement the required updates. In the presence of paperID,
> it's easy, for instance,

As I said, the boss can't remember what he did. Just take it as given that you cannot get the information, and have to deal with what you see in this example. An update is impossible, and our db is broken.

> UPDATE Paper SET Value = "A" WHERE Value = "X";
> UPDATE Paper SET Value = "B" WHERE Value = "Y";
> Without paperID, what would we have instead?

Nothing at all, if we were happy with just 'value'. We'd also have to accept that we weren't tracking bits of paper then. In fact the notion of an individual piece of paper, that existed independently of what was written on it, wouldn't exist in our model at all. Funny things entity types.

It took me a long time to grok how this even made any sense, to see that an entity type is actually _defined_ by what we say consistently identifies it over different states. I then realised that the entity type I have in the real world (an individual bit of paper) might not be the same type I've constructed in the model (a value written on paper). I didn't need OID's to fix my problems...I just needed to make sure my entities tallied with my constructs! And that meant I just needed to use the right identifiers.

Release the power of the grok too reinier! :-)

> The Paper table would
> not be present at all, and all my foreign key attributes to the
> Paper.paperID attribute would just be plain Paper values. So what
> would the update look like?
> Well, you'd have to know, somehow, which
> of all the possible attributes in your tables stand for bits of paper,
> and write the same update queries for each of those attributes,
> that I wrote just for my Paper table. Not so easy. Why not?
> Because this Paper-less schema provides less information about paper
> than the one you proposed: it no longer tells us explicitly which
> attributes in the database stand for bits of paper.
> But wait, you may object, we may be using "abstract domains"
> in our database; instead of directly using this enum as value type,
> we may have an explicitly declared a Paper value domain that is defined
> to be this enum. Some DBMSs promote this approach, e.g. Firebird.
> But that only tells us that we should rather be looking at entities
> that are a bit more complicated than enums; e.g. we can study what happens
> with entities that have ID-based foreign keys *themselves*; e.g. a Paper
> table may refer to types of paper, and one of its attributes may be
> a reference to its manufacturer.
> But wait, you may try again, perhaps the boss has something more
> complicated in mind: e.g., the X paper may be *split* into A and B,
> with some of its uses being replaced with the use of A paper, and
> the other uses with B paper. So your nifty paperID doesn't always
> buy you simplicity of updates, nyaah nyaah. Well, I can only agree.
> >If you do follow this example, then perhaps you could explain why you
> >are having trouble generalizing it to any schema where the key
> >attribute is hidden (i.e. non-visible to the business), and all other
> >attributes are mutable.
> The "trouble", as I tried to explain before, arises when a relation exists
> of which none of the keys can be decomposed (by following foreign key
> references) in such a way that only non-ID attributes remain.

The "trouble" is that you are probably stuck in thinking in OO, like I once was when I thought C++ was the shizzle to end all shizzles. Unfortunately to guarantee you can recognize an 'entity' when it arrives again at your database, you need that stable identifier. Each "bit of paper" has already has a natural identifier of course, but we can't use it (it's x,y path), and this is the root of our problem. And yet all the issues can be solved one single swoop by just...

*Writing the paper_id in permanent marker on the back of each bit of paper*

Now that the surrogate key isn't hidden, all problems vanish. It represents the papers x,y path and every bit of paper is simply identifiable, updates are trivial, and importantly the attribute is no longer some magical OID or yucky tuple identifier fudge.

> E.g. in
> the above example, the "trouble" arises when Value is not a key. If two
> Papers with the same Value are allowed, we run the danger of starting to
> accumulate multiple indistinguishable copies of Papers in the database.
> The trouble is limited; it's not that we can no longer write the correct
> queries, we can; but it will be harder. So I wouldn't be happy to see
> this in a finished database schema for an actual database.
> BTW I did not state or assume that all other attributes are mutable.
> Neither did I assume that the ID-based attribute is the only key.
> --
> Reinier
Received on Mon Jan 28 2008 - 02:38:40 CET

Original text of this message