Re: Natural keys vs Aritficial Keys

From: Walter Mitty <wamitty_at_verizon.net>
Date: Sat, 16 May 2009 17:22:09 GMT
Message-ID: <5DCPl.1714$5F2.627_at_nwrddc01.gnilink.net>


"Tony Toews [MVP]" <ttoews_at_telusplanet.net> wrote in message news:jf4s059mn7epor41g3cgfrdcoutiidbp5s_at_4ax.com...
> "Walter Mitty" <wamitty_at_verizon.net> wrote:
>
>>When I learned databases, about a quarter of a century ago, the preferred
>>practice was to use natural keys to refer to individual table rows, unless
>>there's some real good reason to go with an artificial key. I've run into
>>a
>>few cases where I chose to use an artificial key, but most of the time
>>I've
>>used a natural key. (I should clarify: if a university's registration
>>office assigns each student a StudentID, before entering that student
>>into
>>the database, I'm treating that key as "natural" in the context of the
>>database, even though one might argue that it's "artificial" in some other
>>context.)
>
> That's a decent natural key. But life is seldom so clear.
>
> Consider names. My brother has the same birth date, first, middle and
> last name as
> a career criminal in Canada. And we have a somewhat rare last name. It's
> a pain for
> him to cross the border into the US. He allows himself an extra hour.
> Although
> they must now have photo's in the US Customs computers as lately the delay
> as been
> quite a bit shorter.
>
> Consider business name. Duplicate business names can exist in adjacent
> provinces,
> states or countries.
>
> Now what happens when you get to the child tables? For example
> student/course
> table. What do you use as a primary key? Student ID, Course Number?
> What if a
> student takes the course a second time? Now the primary key needs three
> fields and
> some logic to ensure the third field is unique. What about child tables
> of that
> child table? Now it's a table with four fields, or more in the primary
> key.
>
>>Every time the issue of natural keys comes up in Stackoverflow.com, the
>>prevailing view seems to be that the best primary keys are artificial and
>>opaque. And responses that take this view get the votes. They emphasize
>>efficiency (mainly efficiency in the index that you usually get
>>automatically when you declare a PRIMARY KEY). But data integrity seems
>>to
>>be forgotten, here.
>
> How does this affect data integrity?
>
> When there is a possibility of duplicate data we should be presenting the
> users with
> the current data that is a close match so the user can decide if the
> person/company/whatever is already in the system or not. For example the
> first two
> letters of the first and last name give surprisingly few matches in a
> database I have
> of 10,000 names. Even Jo Sm for Joan Smythe.
>
>>The same is not generally tru when the conversation
>>turns to referential integrity. Slowly but surely the programming
>>community
>>seems to heve been dragged, kicking and screaming, into turning over
>>referential integrity enforcement to the DBMS.
>
> I don't quite understand that the problem is with turning referential
> integrity over
> to the DBMS.
>
> I'm quite happy with an artificial primary key (autonumber in Access) and
> unique
> indexes in data fields as appropriate.
>
> That said Access forms and reports don't handle multifield artificial keys
> very well.
> The wizards, for example, don't handle them at all. So you're first to
> labouriously
> enter the field names by hande in the various properties such as a
> subforms Link
> Child Fields and Link Master Fields properties.
>
> I should point out that
> 1) I build simple and moderately complex applications in MS Access so that
> clearly
> colours my viewpoints. A product many readers of my postings in this
> newsgroup hold
> in derision.
> 2) I look at this from both the DBAs viewpoint and the developers
> viewpoint although
> again many folks in this newsgroup would not consider me a DBA of any
> sort.
> 3) I've managed to irritate a number of the folks in this newsgroup with
> my postings
> in the past. As a result I've been plonked by a few. Oh well.
> 3a) As a result if one or two of those read replies to me they will be
> complaining
> that they were forced to read my opinion. <shrug>
>
> Tony
> --
> Tony Toews, Microsoft Access MVP
> Tony's Main MS Access pages - http://www.granite.ab.ca/accsmstr.htm
> Tony's Microsoft Access Blog - http://msmvps.com/blogs/access/
> Granite Fleet Manager http://www.granitefleet.com/

Thanks for your response. You've given me a lot to ponder.

On the question of names, I consider names to be so different from natural keys that I honestly didn't even give a thought to names when I first raised the question of natural versus artificial keys. We all know the reasons why names fail as keys, and you've outlined some of them. About all we can say about names is that they were used of identification and reference purposes in the era before computerization. When I say natural key, I mean something that's immutable, not used twice in two contradictory fashions, and not optional regarding the thing being identified. As I said in my OP, keys viewed as natural by the database may well be viewed as artificial in some larger system, one that includes applications and maybe even humans.

The student/course relation is a difficult example for considering parent/child relationships, because it's really a junction box to capture a relationship between many students and many courses. Is it a child of the student, or is it a child of the course? This is actually the kind of thing that resulted in so much trouble in the hierarchical and network databases that existed in 1970 and shortly therafter. As far as relationship tables go, I see nothing LOGICALLY wrong with assigning them a composite primary key. As to whether a given student and a given course can have more that one instance of a relationship between them, that's actually a matter of how the subject matter organizes the data, isn't it? And wouldn't the solution, in any event, be to come up with a ternary relationship in which the relationship among a student, a course, and some third thing has at most one row per instance?

This can only be discerned by studying the subject matter, and analyzing the data. Adding a ID PK field to the binary relationship just allows you postpone analysis until after the design and implementation are done. I try to avoid doing that, for obvious reasons.

With regard to presenting the user with the opportunity to disambiguate duplicate data, I think that's very relevant to a certain class of problems. But there is another class of problems where is of the essence to prevent such ambiguity from arising in the first place. I think it's that latter class of problems that I was mostly concerned with, back in the day.

The question of whether Access wizards deal with composite keys well, poorly, or not at all is a question that might influence the design of a database intended to be used with an Access application. But the fact that this benefit of a simple key attaches to a simgnle product (or a class of products) ought to be made explicit. Instead, today's database experts are telling neophytes that creating an autonumbered ID field is always a good practice, regardless of the tools one is using, whether the database is single user or multiuser, whether its embedded in a single application or integrates the management of data across many applications, and so on. There are consequences to permitting the same entity to be recorde twice in a table, in two rows that differ only in the value of the ID field. I believe Paul C outlined those difficulties in a separte response.

I've actually gotten beyond snobby attitudes towards Access. In my old age, I decided to return and learn the product, and there are actually some things about it that I like quite a lot. But respect should obscure the fact that Ed Codd's proposal for a way to organize data in order to share the data on an enterprise wide basis is fundamentally different from the goals of the early versions of MS Access, which is to facilitate management of data on a single user's personal computer. The difference between those two goals is so profound that translating what works well in one setting into the other ssetting can produce absurdities.

I hope I've established a tone here that permits us to explore big differences without simply engaging in insult and invective. I look forward to your reply. Received on Sat May 16 2009 - 19:22:09 CEST

Original text of this message