What is a surrogate identifier

From: Walt <wamitty_at_verizon.net>
Date: Thu, 15 Mar 2007 14:52:22 GMT
Message-ID: <G4dKh.19756$d8.59_at_trndny07>



In the topic about building an OODB from an RDBMS point of view, I may have mistaken the meaning of Dmitry's term "surrogate identifier". I objected to equating this with "pointer emulation". I also took "surrogate identifier" to be synonymous with "surrogate key".

After a little reading, it strikes me that I may have misunderstood the term "surrogate identifier" (as well as misspelling it in my response). Dmitry could have meant "surrogate for a variable name", in which case his assertion that it's a pointer emulation is a completely different assertion.

So what does the term "surrogate identifier" mean anyway? And specifically, what does it mean with respect to tuples (rows in a relational table)?

I have problems connecting this to the relational model, to begin with. Here's my (simplified) understanding of the RM:

Data is stored in relational tables, with each datum existing at the intersection of a row and a column of a table. Logically related tables are collected in schemas.

Tables are referenced by name. Columns within tables are also referenced by name.

Rows are NOT referenced by name, in the RM. Instead, rows are referenced by a (perhaps partial) description of their contents. Indeed, it isn't the row that's being referenced, but the content of the row.
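
To make the point about referencing rows by content concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name employee, its columns, and the sample data are all made up for illustration. The table and its columns are addressed by name, while the rows are picked out only by a partial description of what they contain.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, dept TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Ann", "Sales", "Boston"),
        ("Bob", "Sales", "Denver"),
        ("Cy", "Ops", "Boston"),
    ],
)

# The table (employee) and the columns (name, dept, city) are named.
# The rows are not: the WHERE clause gives a partial description of
# their contents, and whatever rows fit that description come back.
matches = conn.execute(
    "SELECT name FROM employee WHERE dept = ? AND city = ? ORDER BY name",
    ("Sales", "Boston"),
).fetchall()

print(matches)
```

Nothing in the query says "give me row number such-and-such"; if two rows fit the description, both are returned.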

There's a whole lot more detail (over 100 pages in the best-known introduction), but that's enough for this discussion.

Since rows do not have names (or identity) in the RM, it is completely superfluous to come up with "surrogate identifiers" for rows. That means that rows don't need pointers in the RM.

Now, every implementation of an RDBMS where I've looked at the internals uses a system of pointers (direct or indirect) to point to rows (and to just about everything else). I don't know that that's the only way to build an RDBMS, but I'm willing to listen to arguments about that.
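
SQLite can serve as one concrete illustration (it is chosen here only because it ships with Python, not because it is one of the systems referred to above). An ordinary SQLite table keeps an integer rowid for each row, which the engine uses to locate the row in storage. The sketch below, with a hypothetical table part, shows that this locator is not among the columns the user declared and does not appear in SELECT *, although SQLite does let you ask for it explicitly.

```python
import sqlite3

# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (pname TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?)",
    [("bolt", "red"), ("nut", "blue")],
)

# What the relational user sees: only the declared columns.
cur = conn.execute("SELECT * FROM part")
visible_columns = [d[0] for d in cur.description]
cur.fetchall()

# What the implementation keeps underneath: an integer locator per row.
# SQLite exposes it only when it is requested by name.
locators = conn.execute(
    "SELECT rowid, pname FROM part ORDER BY rowid"
).fetchall()

print(visible_columns)
print(locators)
```

This is only a rough analogy: a rowid is a storage-level key rather than a memory pointer, and other engines use different mechanisms.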

Now you could build an OODBMS, based around pointers, and then layer an RDBMS on top of that. But the row pointers would be transparent to the relational user. And this seems to me to be at the root of the disconnect between Dmitry and most of the c.d.t. regulars.

Received on Thu Mar 15 2007 - 15:52:22 CET
