Re: Database design, Keys and some other things

From: Anith Sen <anith_at_bizdatasolutions.com>
Date: Wed, 28 Sep 2005 03:51:27 GMT
Message-ID: <3ro_e.3941$zQ3.911_at_newsread1.news.pas.earthlink.net>


The idea behind surrogates is introduced not by Codd, but by Hall, Owlett & Todd in their '74 paper "Relations and Entities"; Codd just incorporated in to RM/T later in his 79 paper. Here is the relevant section of his ACM paper:
http://www.scism.sbu.ac.uk/~rmkemp/codd1979.pdf

Before taking Joe's interpretation of Codd's suggestions, consider Date's exposition of its inconsistencies at:
http://www.intelligententerprise.com/db_area/archives/1999/990106/online1.jhtml

It is interesting to note that, little importance has been given to notion of surrogates in formal database studies. Also, standard bodies like ISO, have never deemed surrogates to be worthy of a formal definition, due to its negligible distinction from other logical identifiers. But it is often a hot topic in newsgroups though.

For practical purposes, consider the following from Date's An Introduction to Database Systems, 7th Ed. ( ISBN 0201385902 ), p. 444 has the explanation:

[Quote]
Surrogate keys are keys in the usual relational sense but have the following specific properties:

  • They always involve exactly one attribute.
  • Their values serve solely as surrogates ( hence the name ) for the entities they stand for. In other words, such values serve merely to represent the fact that the corresponding entities exists -- they carry no additional information or meaning whatsoever.
  • When a new entity is inserted into the database, it is given a surrogate key value that has never been used before and will never be used again, even if the entity in question is subsequently deleted.

Ideally, surrogate key values would be system-generated, but whether they are system- or user-generated has nothing to do with the basic idea of surrogate keys as such.

It is worth emphasizing that surrogates are not ( as some writers seem to think ) the same thing as "tuple IDs." For one thing-to state the obvious-tuple IDs identify tuples and surrogates identify entities, and there is certainly nothing like a one-to-one correspondence between tuples and entities ( think of tuple IDs for derived tuples in particular ). Furthermore, tuple IDs have performance connotations, while surrogates do not; access to a tuple via its tuple ID is usually assumed to be fast (we are assuming here that tuples-at least, tuples in base relations-map fairly directly to physical storage, as is in fact the case in most of today's products). Also, tuple IDs are usually concealed from the user, while surrogates must not be ( because of The Information Principle ); in other words, it is not possible to store a tuple ID as an attribute value, while it certainly is possible to store a surrogate as an attribute value.

In a nutshell: Surrogates are a logical concept; tuple IDs are a physical concept.
[End Quote]

-- 
Anith 
Received on Wed Sep 28 2005 - 05:51:27 CEST

Original text of this message