Re: OID's vs Relational Keys?

From: Frank Hamersley <terabitemightbe_at_bigpond.com>
Date: Thu, 22 Dec 2005 06:49:30 GMT
Message-ID: <_%rqf.81664$V7.69498_at_news-server.bigpond.net.au>


David Cressey wrote:

> "jason.glumidge_at_gmail.com" <Jason.Glumidge_at_gmail.com> wrote in
> message news:1135176559.700113.120710_at_f14g2000cwb.googlegroups.com...
> 
> 

>> I have been reading an article by C.J.Date - "Object Identifiers
>> vs. Relational Keys" in Relational Database Writings 1994-1997, and
>> while I'm not really an OODBM's fan, there are a few things in it
>> that I've found confusing:
>>
>> 1) The first was that Date adamantly and repeatedly states that
>> OID's are pointers. Now I am really foxed by this - as far as I've
>> ever been concerned a pointer is a variable that stores a memory
>> address. So how an earth can an ID be such a thing? If anything an
>> OID is more like the memory address that a pointer stores. But even
>> then surely that is a misleading comparison, as a memory address is
>> a fixed physical location, wheres an OID is location independent.
> 
> I'm probably not the best person to interpret CJ Date, especially on
> this issue.
> 
> On the physical level, I agree with you.  A pointer has the effect of
>  "pinning" the data structure pointed to in the following sense:  if
> the data structure is shuffled in the address space of the pointer, 
> then the pointer becomes invalidated.

This is my take on the genesis of Codd's original comment and Date's subsequent comment(s).

At that time Codd formulated his statement the data management tools at hand commonly exposed pointer information and often for performance gains application builders decided to "use" this information on a first hand basis. When things broke they did in a big way and required a "local hero" to munge it all back into shape. This lifestyle was not deterministic enough for the business users and so it is openly accepted as a "bad thing" tm. These were "real" pointers - not arbitrary numbers and the hardcore data managers quite reasonably hated the practice because it limited their ability to restructure the physical data to meet other goals. Thus the interest of separating the logical from the physical was to become a Grail!

Date et al sees a similarity in the unrestrained use of simplistic Id's because they look similar and their value apparently bears no obvious relation to the tuple contents. They conclude that allowing the practice to persist is tantamount to creating the conditions required for a return to the bad old days and will lead to discrediting of the RM. Furthermore because they are disjoint conceptually from the aligned data it is easily argued that should a failure occur the mug punter gets no assistance from the number itself to confirm its veracity.

Personally, I see some small mischief in Date's agenda*, as there are clear distinctions between the pointer and Id such that an inference that the former by association will suffer the well documented failures of the latter is demonstrably erroneous. This point I believe is couched in more formal terms by Gittens.

Furthermore with some judicious adjustments to the practical implementation of Id's I have found it is possible to ensure beyond all doubt that any particular Id is being lodged in the appropriate FK attributes and those attributes only. In this case I am convinced this is actually better than the use of natural keys (such as colour or surname) because I can control the domain of Id values for every PK and reject attempted updates of FK's with invalid Id values.

I guess I separate the RM into more than the 2 traditional? layers. My lowest level which is the DBMS (and below) is the Physical layer. The highest layer is the Logical (or Application) and finally there is (at least one) intervening layer that I would perhaps call the Model. The difference in my mind between the Model and the Logical is "what could be" and "what actually is" from the users perspective. I concede** it might be better to name the layers Physical, Logical and Business.

  • to establish his pecuniary value as an industry commentator. ** sleep deprived state.
> Of course, there are pointers embedded in index structures,  but
> these pointers are created by and managed by the DBMS.  The DBMS can
> update them when the data structure gets shuffled.
> 
> At the logical level, CJ Date's comments make more sense.  An OID
> conveys no more information than a pointer would have.  But this
> question can be raised for any form of surrogate key at all.  For
> that  matter, it can be raised about any "artificial key",  such as
> Social Security Number,  whether that key is assigned inside the
> DBMS, or inside the application,  or externally by the Social
> Security Administration.

Yep, I agree per my preceding discourse.

>> 2) Date sees no worth in a distinction between identity and
>> equality, stating that it is a nonsense that "two objects that look
>> the same to the user (meaning the user has no way to tell them
>> apart) might in fact be distinct." going on to say that there is no
>> use for "duplicates". My first impression is that this seems to
>> happen all the time in the real world, where while we know full
>> well two things are distinct but do not currently possess the
>> attribute that defines this difference. Or there are two items
>> which are identical in all their defining attriubtes but are still
>> unique (their location is different for instance, which we can't
>> continually record).

I also disagree with Date particularly when temporality is introduced. Just because the user can not discern a difference from their perspective, that does not represent a certainty to this black duck. However I do sympathise with the perspective in that the Id should "generally" be obscured from the Business user.

> I wonder whether DNA analysis can determine which of two identical
> twins can be placed at the scene of the crime.  I doubt it.  This may
> seem a frivolous point to raise in this context, but it's not.  There
> is no definitive list of "natural" attributes that can be used for
> identification.  This is always subject to later review in thelight
> of later discovery.

True. Given DNA testing as I understand it is a fairly coarse tool - it will not detect the myriad minor transpositions and defects that occur post the meiosis/recombination/blastula stages.

>> Now we all employ surrogate keys to enforce this distinction, but
>> this is a value we've just artificially constructed, not part of
>> the real items uniqueness. While the mechanism is different (and I
>> agree with Date on the logistical differences) this seems to just
>> be giving the tuple an identifier, again to allow a distinction
>> between identity and equality (two items in an RDBMS are equal if
>> everything apart from their surrogate id is the same), that he
>> originally objects to.

IMO the simple surrogate key in common use by way of an INT attribute with an auto IDENTITY does play right into Dates argument! As I said above it can be done much better than that.

Cheers,
Frank zzzzzzzzzzzz. Received on Thu Dec 22 2005 - 07:49:30 CET

Original text of this message