Re: What is a surrogate identifier

From: JOG <jog_at_cs.nott.ac.uk>
Date: 15 Mar 2007 08:32:44 -0700
Message-ID: <1173972764.387909.18090_at_e1g2000hsg.googlegroups.com>


On Mar 15, 2:52 pm, "Walt" <wami..._at_verizon.net> wrote:
> In the topic about building an OODB from an RDBMS point of view, I may have
> mistaken the meaning of Dmitry's term "surrogate identifier". I took
> objection to equating this to "pointer emulation". I also took "surrogate
> identifier" to be synonymous with "surrogate key".
>
> After a little reading, it strikes me that I may have misunderstood the
> term "surrogate identifier" (as well as misspelling it in my response).
> Dmitry could have meant "surrogate for a variable name", in which case his
> assertion that it's a pointer emulation is a completely different
> assertion.
>
> So what does the term "surrogate identifier" mean anyway? And specifically,
> what does it mean with respect to tuples (rows in a relational table)?
>
> I have problems connecting this to the relational model, to begin with.
> Here's my (simplified) understanding of the RM:
>
> Data is stored in relational tables, with each datum existing at the
> intersection of a row and a column of a table. Logically related tables are
> collected in schemas.
>
> Tables are referenced by name. Columns within tables are also referenced by
> name.

Well, I think this might be skewed by viewing a database relation as a table. A relation has an alias (the name of the table, which saves us from enumerating the relation each time we want to refer to it), but at heart it is just a set of propositions. In the RM a proposition is encoded as a finite partial mapping of attributes to values, so the theory really has no concept of a 'column'. And given that columns don't exist in the theory, asking whether a column has a name or not is rather a moot point.
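
To make that concrete, here is a rough sketch in Python (my own illustration, not anything the RM prescribes). The helper name "proposition" and the attribute names are assumptions. It just shows a relation as a set of attribute-to-value mappings, with no column order anywhere:

```python
# Hypothetical helper and data, for illustration only.
def proposition(**attrs):
    """Encode a proposition as an immutable attribute -> value mapping."""
    return frozenset(attrs.items())

# A relation is a set of propositions; "employees" is only an alias for that set.
employees = {
    proposition(name="Alice", dept="Sales"),
    proposition(dept="Sales", name="Alice"),  # same proposition, so the set keeps one copy
    proposition(name="Bob", dept="Research"),
}

print(len(employees))  # 2
```

Writing the attributes in a different order changes nothing, which is the sense in which there is no 'column' to name.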

>
> Rows are NOT referenced by name, in the RM. Instead, rows are referenced
> by a (perhaps partial) description of their contents. Indeed, it isn't the
> row that's being referenced, but the content of the row.
>
> There's a whole lot more detail (over 100 pages in the best known
> introduction), but that's enough for this discussion.
>
> Since rows do not have names (or identity) in the RM, it is completely
> superfluous to come up with "surrogate identifiers" for rows. That means
> that rows don't need pointers in the RM.

I'd disagree with that. A proposition does have an identity - the attributes and values of which it is composed. This absolutely is identity - especially in the sense that Leibniz intended (and the sense that serves as the underlying mechanism of equality in math).
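
A small sketch of what I mean by identity here, again in Python and again only an illustration (the function name "same_proposition" and the sample values are assumptions). Two encodings are the same proposition exactly when they agree on every attribute and value, regardless of where they happen to sit in memory:

```python
# Two separately constructed encodings of the same proposition, plus a different one.
p = {"name": "Alice", "dept": "Sales"}
q = {"dept": "Sales", "name": "Alice"}
r = {"name": "Alice", "dept": "Research"}

def same_proposition(a, b):
    # Identity by content: the same attributes, and the same value for each.
    return a.keys() == b.keys() and all(a[k] == b[k] for k in a)

print(same_proposition(p, q))  # True
print(same_proposition(p, r))  # False
print(p is q)                  # False: distinct objects in memory, which this notion of identity ignores
```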

What is worth noting is that a key is used for identification - it is not the proposition's identity itself. It only serves to ascertain that identity. Regards, J.
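
P.S. A sketch of the key/identity distinction, assuming a made-up "emp_id" attribute as the key (the function "identify" is hypothetical too). The key values are only what we use to find the proposition. What comes back is the whole proposition:

```python
def identify(relation, **key_values):
    """Return the one proposition matching the given key values, or None."""
    matches = [p for p in relation
               if all(p.get(k) == v for k, v in key_values.items())]
    if len(matches) > 1:
        # More than one match means the attributes supplied were not a key.
        raise ValueError("the given attributes are not a key for this relation")
    return matches[0] if matches else None

# Hypothetical relation; emp_id is assumed to be a key.
employees = [
    {"emp_id": 1, "name": "Alice", "dept": "Sales"},
    {"emp_id": 2, "name": "Bob", "dept": "Sales"},
]

found = identify(employees, emp_id=2)
print(found)  # {'emp_id': 2, 'name': 'Bob', 'dept': 'Sales'}
```

Asking for dept="Sales" instead raises the error, since that attribute alone does not pick out a single proposition.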

>
> Now, every implementation of an RDBMS where I've looked at the internals
> uses a system of pointers (direct or indirect) to point to rows (and to
> just about everything else). I don't know that that's the only way to build
> an RDBMS, but I'm willing to listen to arguments about that.
>
> Now you could build an OODBMS, based around pointers, and then layer an
> RDBMS on top of that. But the row pointers would be transparent to the
> relational user. And this seems to me to be at the root of the disconnect
> between Dmitry and most of the c.d.t. regulars.
Received on Thu Mar 15 2007 - 16:32:44 CET
