Natural keys vs Aritficial Keys

From: Walter Mitty <wamitty_at_verizon.net>
Date: Fri, 15 May 2009 16:57:39 GMT
Message-ID: <7ahPl.1491$5F2.1234_at_nwrddc01.gnilink.net>



I'm confused about some recent trends in database design.

When I learned databases, about a quarter of a century ago, the preferred practice was to use natural keys to refer to individual table rows, unless there's some real good reason to go with an artificial key. I've run into a few cases where I chose to use an artificial key, but most of the time I've used a natural key. (I should clarify: if a university's registration office assigns each student a StudentID, before entering that student into the database, I'm treating that key as "natural" in the context of the database, even though one might argue that it's "artificial" in some other context.) This generally surfaced in the choice of one or more columns to be declared as a PRIMARY KEY.

Every time the issue of natural keys comes up in Stackoverflow.com, the prevailing view seems to be that the best primary keys are artificial and opaque. And responses that take this view get the votes. They emphasize efficiency (mainly efficiency in the index that you usually get automatically when you declare a PRIMARY KEY). But data integrity seems to be forgotten, here. The same is not generally tru when the conversation turns to referential integrity. Slowly but surely the programming community seems to heve been dragged, kicking and screaming, into turning over referential integrity enforcement to the DBMS.

What I don't get is whether I've understood something that today's trendsetters never learned or whether they are taking into consideration matters that I never considered important, back in the day. I've read their arguments carefully, and it seems to me that failure to declare UNIQUE and NOT NULL constraints on the most widely used candidate keys is simply an invitation to degradation of data integrity. PRIMARY KEY is a convenient way to get those two constraints with one declaration.

Is data integrity less important than it used to be? Are most people building databases that get embedded in some appplication to be sold to the non technical buyer? Is something else going on that I'm unaware of? Can some body explain to me what these people are thinking? Received on Fri May 15 2009 - 18:57:39 CEST

Original text of this message