Re: candidate keys in abstract parent relations

From: Tony Rogerson <tonyrogerson_at_sqlserverfaq.com>
Date: Thu, 19 Jan 2006 15:41:13 -0000
Message-ID: <dqobtq$rha$1$8302bc10_at_news.demon.co.uk>


But don't forget there is another phase after the logical model and thats the implementation of it.

This is where surrogate keys really come into play.

It helps create a highly scalable database and provides an efficient interface into the applications using the database.

Consider a drop down listbox in say HTML, it has a VALUE which indicates which item in the drop down is selected, if we hadn't used a surrogate key (auto generated number) then we could well have a long value or even worse a composite key in there - which begs the question, if your primary key is a composite key which often it is, how would you code that?

I get a real bee in my bonet over the surrogate key issue, I'm not saying you do this David because I don't think you do; but a lot of designers (especially celko) try to directly implement the logical model without regard for anything other than writing pure SQL against it. The database from a business stand point is there to store data.

Tony.

-- 
Tony Rogerson
SQL Server MVP
http://sqlserverfaq.com - free video tutorials


"David Portas" <REMOVE_BEFORE_REPLYING_dportas_at_acm.org> wrote in message 
news:1137672084.357920.144470_at_g44g2000cwa.googlegroups.com...

> Forrest L Norvell wrote:
>
>> In this case, I guess I'm being finicky because if I use a generated
>> key column, it's *not* going to be a truly surrogate key: a surrogate
>> key implies that there is a candidate key for which the new key is
>> acting as a surrogate, and as defined, my album table lacks any
>> candidate keys. It's entirely possible that duplicate rows will be
>> added to the table without some kind of nonce or generated identifier
>> being added to its definition, so the entire justification for adding a
>> generated column is to guarantee uniqueness over the relation.
>> According to my (definitely shaky) grasp of the relational model, this
>> is a bad idea, and I'm afraid doing things this way exposes a weakness
>> in the underlying data model, even if I can't identify it.
>
> Yes it's a bad idea. Here's a generic example. Consider two tuples:
>
> a tuple = {'A','B','C'}
> a tuple = {'A','B','C'}
>
> They are the same. It serves no purpose to represent the same
> information twice, which is why we can safely say that a relvar must
> have at least one candidate key - the generality of the model isn't
> compromised by prohibiting redundancy. Now let's add an arbitrary
> artificial key on the understanding that this key has no external
> meaning whatever:
>
> a tuple = {1,'A','B','C'}
> another tuple = {2,'A','B','C'}
>
> Isn't this still just as redundant as in our first case? The artificial
> key isn't supplying any extra information in either tuple. In fact the
> artificial key usually isn't exposed to business users AT ALL. A
> relation containg these two tuples won't violate any principle of RM
> but it would cause us a major problem - the redundancy means that we
> have the potential for anomalies to arise and so we may get incorrect
> results from our database.
>
> Possibly {'A','B','C'} represents some entity that we are interested in
> counting - we want to know how many there are in our universe - but we
> don't need to add an artificial key to do that. We can just add another
> attribute ("quantity") and then a single tuple would be suffficient to
> represent any quantity of whatever we are modelling.
>
> So conceptually nothing is gained by adding the artificial key and
> nothing is lost by removing it. An artificial surrogate key may have
> certain practical advantages but an artificial key is redundant in the
> logical model by definition.
>
> The candidate key is defined by a subset of the attributes, so the real
> question is: what attributes do you need to model about an entity? If
> you aren't sure what the key is then you probably haven't answered that
> question properly.
>
> Now let's reconsider your example. You define an album by the tracks
> that it contains. There may be more than one album with the same name
> and in fact with all other attributes in common and those albums will
> be distinguishable only by the fact that they represent a different
> collection of tracks. Your mistake though is to identify a one-to-one
> correspondence between rows in your Album table (information about an
> album) and the album itself (a collection of tracks). Those are two
> different things. The missing piece in your model is a table or
> attribute(s) to supply the information that relates the two entities. I
> would have expected to see a "joining" table to represent the concept
> of a many-to-many relationship between the Album table and the Tracks
> table.
>
> --
> David Portas
>
Received on Thu Jan 19 2006 - 16:41:13 CET

Original text of this message