Re: A pk is *both* a physical and a logical object.

From: paul c <>
Date: Fri, 13 Jul 2007 15:02:06 GMT
Message-ID: <OtMli.110977$1i1.96776_at_pd7urf3no>

David Cressey wrote:
> Specifically, does Tutorial D have the concept of "primary key" in it? If
> so, how does it relate to relational theory? If not, how does it address
> the same problems that lead SQL practitioners to lean on the concept of
> primary key?

Here's a partial quote from TTM, 2nd edition, RM prescription 15 (sorry in advance for any typing errors of mine):

"Note. Historically, it has been usual to insist that, at least in the case of real relvars, there be a distinguished candidate key called the primary (italics) key. While this discipline might be useful in practice, we do not insist on it (and Tutorial D as defined in Chapter 5 does not support it), because we regard the idea of making one candidate key somehow "more equal than the others" as a psychological issue merely [70]. Of course, we do not prohibit it either."

([70] refers to a C. J. Date article, "The Primacy of Primary Keys: An Investigation".) TTM also repeats the usual properties of candidate keys, uniqueness and irreducibility. Since the above is a TTM "prescription", I presume it goes beyond Tutorial D, whereas the A-algebra has no reference to keys of any kind.
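To make the two properties concrete, here is a rough Python sketch (mine, not anything from TTM or Tutorial D) that tests whether a set of attributes is unique and irreducible over one sample relation value. The function names and the sample rows are made up for illustration. Note the big assumption: a real key is a constraint on every value a relvar may take, so a check against a single sample can only refute a proposed key, never prove it.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_candidate_key(rows, attrs):
    """Unique, and irreducible: no proper subset of attrs is itself unique.

    Assumption: this only inspects the sample rows given, not the
    relvar's declared constraints.
    """
    attrs = tuple(attrs)
    if not is_unique(rows, attrs):
        return False
    return not any(
        is_unique(rows, subset)
        for size in range(len(attrs))          # proper subsets only
        for subset in combinations(attrs, size)
    )


# Hypothetical sample data: two attributes that each identify a row.
employees = [
    {"emp_no": 1, "ssn": "A", "dept": "X"},
    {"emp_no": 2, "ssn": "B", "dept": "X"},
    {"emp_no": 3, "ssn": "C", "dept": "Y"},
]

emp_no_is_key = is_candidate_key(employees, ["emp_no"])
ssn_is_key = is_candidate_key(employees, ["ssn"])
reducible = is_candidate_key(employees, ["emp_no", "dept"])
not_unique = is_candidate_key(employees, ["dept"])
```

Both emp_no and ssn pass, and nothing in the check makes one of them "more equal" than the other, which is the point of the quoted note.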

Codd's 1970 paper is full of the term, but I find it's often echoing the IMS term of the same name, IMS being a hierarchical dbms that I was told pretty much everybody at IBM labs was paid to know and study. Here's some of what Codd had to say:

"1.2.2. Indexing Dependence. In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?"

(This point about a dependence that he wished to eliminate followed one about Ordering Dependence.)
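For what it's worth, here is a small sketch of what that invariance looks like in a present-day SQL system, using Python's standard sqlite3 module. The table, column and index names are invented for the example, and it shows only that the query text and its result stay the same while an index is created and dropped; it says nothing about performance.

```python
import sqlite3

# In-memory database with a hypothetical "part" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE part (part_no INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO part VALUES (?, ?, ?)",
    [(1, "nut", "London"), (2, "bolt", "Paris"), (3, "screw", "London")],
)

# The "application program": it never mentions any index.
QUERY = "SELECT part_no FROM part WHERE city = ? ORDER BY part_no"


def london_parts():
    return [row[0] for row in conn.execute(QUERY, ("London",))]


without_index = london_parts()

# An index comes...
conn.execute("CREATE INDEX part_city_idx ON part (city)")
with_index = london_parts()

# ...and goes.
conn.execute("DROP INDEX part_city_idx")
after_drop = london_parts()

print(without_index == with_index == after_drop)
```

The application code is identical in all three calls; only the physical design underneath it changed.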

I remember IMS had another "key" called a "secondary key". Years ago, most programmers I knew thought of it as a very clumsy index, usually added when an unforeseen query, what I think Codd called a symmetrical one, reared its ugly face. This all leads me to think that the "primary" adjective first appeared in either IBM's or some other vendor's hierarchical dbms. The term "key", on the other hand, was well entrenched from the late 1950s, when every programmer had to know peripheral hardware and be able to write programs that are today called device drivers. Lynn W, who visits here sometimes, might clarify my memory, but as I recall, some tape drives had physical count blocks, and when disks such as the RAMAC came along, "key" physical blocks were added and some of the "ISAM" support was built into the hardware channels and devices, for example, cylinder searches. Whatever the details, it's for sure that programmers and everybody around those machines treated an index as synonymous with a key, although the reverse wasn't always the case because some keys were direct pointers and some were hashes. I'm sure Honeywell, Burroughs, Sperry etc. had similar gizmos. (I think it was around this time that DEC and Wang were about to put the "BUNCH" of IBM competitors out of business; I don't know much about their early machine peripherals.)

I believe the System R project at IBM was not sponsored because IBM management believed in the RM, but rather because there was noise from customers and management simply wanted to put the question to bed. No question that SQL today is a hash (e.g., is one lifetime enough to understand its massive specification?), but that early effort was under a lot of scrutiny, pressure and opposition. From what I've read or been told, Codd took, or was put into, the position of being the proselytizer/prophet while the System R team went their own way, and Codd took all kinds of flak not only from IMS fans but also from the powerful IBM marketing forces who made big money by selling big iron to run it on. Date and others sometimes mention how exasperated Codd was by this. Oracle/Ellison saw the future and jumped the gun, helping to perpetuate some System R mistakes. Today, SQL's vagaries are a fact of life for many programmers, while general society has no idea of them, having only recently become aware of the inefficiencies of the internal combustion engine.

Just my worm's-eye view. That's enough memory lane for me for today.

p

Received on Fri Jul 13 2007 - 17:02:06 CEST
