Re: what are keys and surrogates?

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Sun, 27 Jan 2008 20:39:14 +0100
Message-ID: <bb11f$479cdde2$839b4533$5271_at_news1.tudelft.nl>


(Sorry to be a bit late with my reply.)

JOG wrote:

[...]

>We have some bits of paper with numbers written on (in pencil). We are
>storing info about these bits of paper in a database using the schema:
>{paperID, Value}. The key, PaperID, is a unique database generated
>hidden surrogate.
>
>We have an enumeration:
>{ (paperID:1, Value:X), (paperID:2, Value:Y), (paperID:3, Value:Z) }

Fine. Now why does this ID exist? What useful purpose can it possibly serve? The purpose I see, and that I've been defending here, is that relations can now use this ID attribute when they want to refer to table rows, instead of using its "real" attributes (just Value in this case), and that in doing so, the database designer introduces entities, or objects if you wish, into the database: it separates changes to paper (i.e. to an attribute value in the paper relation) and additions/deletions of papers (i.e. to the paper relation) from changes to the *use* of paper (how paper is used in other relations).

So we expect this paperID to occur at least once as an attribute in another relation.

>Someone comes to you the DB admin, with 3 bits of paper and says, ok
>the boss has changed the values on some of the bits of paper. What I
>have here is one bit of paper with an A on, one with a B and and one
>with a Z. Please update the database accordingly.

That is impossible. We need to know how these bits of paper relate to the existing bits. Can we go by our gut feeling and assume that the Z bit stays the same while the X bit is relabeled A and the Y bit relabeled B? Or are the X and Y bits discarded entirely, and replaced with two new A and B bits? Or is perhaps the X bit relabeled to B, the Y bit to A, the Z bit discarded, and a new bit introduced with the label Z? The boss will have to tell us.

>Please tell me what you do to deal with this. No redefinitions, or
>tangents, just tell me how you ensure that the right PaperID's get the
>right updates? How do you known which paperID has changed, and to
>which value? How do you even know you have been presented with the
>same bits of paper as the initial three, and not new ones altogether?
>The DBA would be at a complete loss.

Exactly, but this is not because of the presence of paperIDs; it's because the boss did not provide that information.

As soon as the boss provides this information, we can think of how we implement the required updates. In the presence of paperID, it's easy, for instance,

  UPDATE Paper SET Value = "A" WHERE Value = "X";   UPDATE Paper SET Value = "B" WHERE Value = "Y";

Without paperID, what would we have instead? The Paper table would not be present at all, and all my foreign key attributes to the Paper.paperID attribute would just be plain Paper values. So what would the update look like? Well, you'd have to know, somehow, which of all the possible attributes in your tables stand for bits of paper, and write the same update queries for each of those attributes, that I wrote just for my Paper table. Not so easy. Why not? Because this Paper-less schema provides less information about paper than the one you proposed: it no longer tells us explicitly which attributes in the database stand for bits of paper.

But wait, you may object, we may be using "abstract domains" in our database; instead of directly using this enum as value type, we may have an explicitly declared a Paper value domain that is defined to be this enum. Some DBMSs promote this approach, e.g. Firebird. But that only tells us that we should rather be looking at entities that are a bit more complicated than enums; e.g. we can study what happens with entities that have ID-based foreign keys *themselves*; e.g. a Paper table may refer to types of paper, and one of its attributes may be a reference to its manufacturer.

But wait, you may try again, perhaps the boss has something more complicated in mind: e.g., the X paper may be *split* into A and B, with some of its uses being replaced with the use of A paper, and the other uses with B paper. So your nifty paperID doesn't always buy you simplicity of updates, nyaah nyaah. Well, I can only agree.

>If you do follow this example, then perhaps you could explain why you
>are having trouble generalizing it to any schema where the key
>attribute is hidden (i.e. non-visible to the business), and all other
>attributes are mutable.

The "trouble", as I tried to explain before, arises when a relation exists of which none of the keys can be decomposed (by following foreign key references) in such a way that only non-ID attributes remain. E.g. in the above example, the "trouble" arises when Value is not a key. If two Papers with the same Value are allowed, we run the danger of starting to accumulate multiple indistinguishable copies of Papers in the database. The trouble is limited; it's not that we can no longer write the correct queries, we can; but it will be harder. So I wouldn't be happy to see this in a finished database schema for an actual database.

BTW I did not state or assume that all other attributes are mutable. Neither did I assume that the ID-based attribute is the only key.

-- 
Reinier
Received on Sun Jan 27 2008 - 20:39:14 CET

Original text of this message