Re: What are the design criteria for primary keys?

From: paul c <>
Date: Sat, 04 Sep 2010 00:05:26 GMT
Message-ID: <a3ggo.498$89.158_at_edtnps83>

On 03/09/2010 3:21 PM, Cimode wrote:
> On 3 sep, 22:48, paul c<> wrote:
>> On 02/09/2010 9:20 PM, Bob Badour wrote:
> For a change from the usual sterile rantings about current state of
> contemporary database design normative habits, and to play somehow
> devil's advocate, I'd have to say that the concept of *key* is almost
> orthogonal to RM. After all, the concept of *keys* mostly embodies
> the IBM context in which RM was theorized by Codd rather than the
> logical concept of unique tuple identifier in an algebraical sense.
> Additionally, defining quasi metaphysical subjective criteria such as
> *familiarity* somehow sounds wrong as opposed to the rigor of the
> fundamental mathematics that are supposed to act as a foundation for
> RM structural information representations. For some reason, I noticed
> such criteria is oftent an open door to diversion from the concept of
> *distinguishability* (which at least can be expressed algebraically)
> which seems more important than others.
> Finally, I'd have to say that looking for database theory definition
> into a board, we know is filled by undeducated audiences is at best
> pointless, at worst, sterile.
> In conclusion, I think uniqueness (I'd add distinguishability),
> irreducibility, stability are more useful for definition rather than
> *simplicity*, *familiarity* since the last two can hardly be
> formalized.

(I don't think it's sterile to suggest that more of today's programmers, at least the ones who post to websites and the ones who produce the systems I'm forced to follow as an ordinary consumer or citizen, might profit more from a course in "reading for comprehension" than another computer language course.)

As for keys, what comes first, keys or relations? I think some people do when faced with conceiving a db, is to think of what they want to identify and then decide on a possibly tentative identifier which might or might not end up as a key. Can't fault that. Perhaps more often, other people first think of what they want to record and then how to distinguish the 'records' (won't quibble about semantic differences among 'unique', 'identity', 'distinct' and 'distinguishable'). Can't fault that either.

I think Bob B is accurate when he says the effort to do this is more art than science because it involves a certain amount of critical thinking, a certain amount of lateral thinking, a certain amount of experimentation, a certain amount of iteration and a small amount of theory.

But they can't be satisfied until they've designed most or all of the relations/tables they think they need and take advantage of normalization theory to end up with as few key attributes as possible. That's not the same as saying as few keys as possible, eg., a system without surrogates can have fewer key attributes than one that mindlessly uses surrogates for every relation but the opposite isn't guaranteed. So that is another qualitative dimension that starts to show up when additional tables are involved. I don't know that it has a conventional label, but it appeals to me.

I think many people go at normalization in a knee-jerk kind of way, eg., because they think they're "supposed to" or because they want to avoid the oft-mentioned old bugaboo, "update anomalies". But the advantage of avoiding redundancy also has a lot to do with potential system efficiency. I say "potential" because few if any dbms' are built to take advantage of, for example, functional dependencies.

I've seen alternative db designs that looked equally useful and efficient to me even though one of them used keys that were very unfamiliar to me. If I had to choose, I think I would have chosen the one that was familiar. So I think we must always expect subjective attitudes / personal esthetics to slow down any group effort.

I abhor the fixed dictums many authors and consultants put out, maybe the worst of them being to apply, universally, surrogate values that are completely artificial, or to 'always use integer keys' as one of Bob B's cuckoo correspondents put it. On occasion I've been ordered to participate in some exercise or other run by psychologists or social 'scientists' of one ilk or another (often called the 'soft' sciences). Although I was usually skeptical that anything good would come of those exercises, there were a few times when I learned techniques that seemed to have repetitive advantage. When it comes to the 'art' Bob B mentions, I do think there are styles and approaches evolved by the soft sciences that might help people conceive schemas more determindedly (not deterministically!). But all the courses I took weren't aimed at single or group design specifically, for example one of them only dealt with recognizing different personality types and ways to avoid wasting time when dealing with them. Still that would be better than nothing for most group efforts. Personally, I prefer a division of schemas such that one individual can explain all of his piece to me off the top of his head.

It's likely that Codd must have been very much influenced by the 'physical' keys that were lingua franca in the mainframe world of the 1950's and 60's. (I started with that stuff and remember that you couldn't look another assembler programmer in the eye unless you could write a channel program even though that was hardly ever necessary compared to the packaged access methods of the time. For a time there was even a trend to move more access method logic outboard of the cpu.)   So the term 'key' has been quite 'familiar' to me for forty years. But I'd say my narrow view of it probably impeded my awareness of logical database for more than twenty of those years. The dbms products, most of them originated by people of my generation are, I think, equally responsible for my own lack of progress. Most younger people don't have the advantage of an historical perspective and sadly most don't want it either. I don't like some the historical baggage Codd's use of 'key' connotes but I also don't like the ignorance of its possibilities (like the FD leverage I referred to above) amongst dbms developers.

I wouldn't object if you called the above 'directionless', as opposed to 'sterile', ha, ha. Received on Fri Sep 03 2010 - 19:05:26 CDT

Original text of this message