Re: Trying to define Surrogates

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Thu, 17 Aug 2006 14:38:38 GMT
Message-ID: <Ob%Eg.50001$pu3.587561_at_ursa-nb00s0.nbnet.nb.ca>


JOG wrote:

> [apologies for the cross posting].
>
> Bob Badour wrote:
>

>>I disagree that the concept of surrogate vs. natural is useful.

>
> Ok, I've had time to digest this now, and I have to say that I /do/
> believe the distinction can be important, and I think your
> interpretation is slightly awry. Let me explain:
>
> Bob Badour wrote:
>
>>It is a surrogate for whatever a surrogate key is for. Think of any
>>natural key. How is it not a surrogate?

>
> o.k., a surrogate is a substitute for something. That's agreed.
>
>
>>My name is not me. It is an arbitrary identifier chosen by my parents.
>>It is familiar because I was conditioned from an early age to respond to it.

>
> I /strongly/ contest that your name is a surrogate. Your name is a
> 'label' applied to you, it is not a 'substitute' for you. I know it is
> a subtle distinction but it is important. (n.b. these are not really my
> deductions but a regurgitation of the writings of William Kent, highly
> rated by people such as Date.)

I respectfully suggest that no important distinction exists between a label and a surrogate as used in the context of candidate keys.

>>My SSN is not me. It is an arbitrary identifier chosen by the IRS to
>>identify tax filings related to my income. It is familiar because I was
>>given a little blue card with it inscribed, and I was instructed to
>>transcribe it to a variety of documents.

>
> Again an SSN is a label applied to you just in a different context, and
> not a substitute for you. Same for the other examples supplied.
>
>
>>[snip]
>>I am not suitably represented for machine processing.

>
> Agreed, but the labels applied to you /are/ suitable for machine
> processing. Hence we don't need to provide any substitutes for them -
> they can go straight into propositions and ultimately the database.

And when we use the values of labels to identify the values of the labels, no surrogacy is required. When we use the labels to identify me, they stand as surrogates for me.

Candidate keys are labels. Values are self-identifying and self-labelling.

> However, there are some identifiers that we do not have suitably
> formatted labels for. Attributes that are currently not easy to enter
> into a proposition. Fingerprints for example.

Does it matter whether the fingerprint has been scanned and digitized and represented suitably for machine processing?

> Some attributes might not
> be easily recordable even though we know they exist. These are
> attributes not suitable for machine processing.

Define 'easily'. Fingerprints are recorded. Genomes are recorded. Feature-length films are recorded. CT scans are recorded. What is too difficult to record? The exact location and velocity of a sub-atomic particle? The only reason we cannot record that is we cannot measure it in the first place.

> Hence we 'substitute' that key with a different artificial key we have
> generated, to act as its representative. That is what surrogacy is.

How exactly does that differ from labelling? We have a fingerprint and we label it 'Defense Exhibit 117' or we label it 532673294. We then use the label to refer to the fingerprint.

This is merely a case of trading off simplicity and familiarity.
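
To make the labelling point concrete, here is a minimal sketch in Python using the standard sqlite3 module. The table name, column names, and the three-byte stand-in for a scanned print are all hypothetical and chosen only for illustration. The digitized fingerprint is itself a value that could serve as a key; the exhibit label and the number are simply two other candidate keys that refer to it.

```python
import sqlite3

# Hypothetical schema: every name here is illustrative, not from the thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE fingerprint (
        exhibit_label TEXT    NOT NULL UNIQUE,  -- familiar label
        exhibit_no    INTEGER NOT NULL UNIQUE,  -- simple generated label
        scan          BLOB    NOT NULL UNIQUE   -- the digitized print itself
    )
    """
)

scan = bytes([0x12, 0x34, 0x56])  # stand-in for real image data
conn.execute(
    "INSERT INTO fingerprint VALUES (?, ?, ?)",
    ("Defense Exhibit 117", 532673294, scan),
)


def scan_by_label(label):
    """Look up the print through the familiar label."""
    row = conn.execute(
        "SELECT scan FROM fingerprint WHERE exhibit_label = ?", (label,)
    ).fetchone()
    return row[0] if row else None


def scan_by_number(number):
    """Look up the same print through the generated number."""
    row = conn.execute(
        "SELECT scan FROM fingerprint WHERE exhibit_no = ?", (number,)
    ).fetchone()
    return row[0] if row else None
```

Both lookups return the same stored value. Which column gets used as the reference elsewhere is a choice between the familiar label and the simpler number, under the assumptions of this sketch.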

> The blur comes in that this is only really useful at design time,
> because as soon as the attribute is used externally, it becomes a new
> natural key. Hence I agree that after the fact the distinction is not
> useful, but a priori it is important to /understand/ exactly what's
> going on as it eliminates any foolhardy temptation to try and hide such
> attributes and violate the information principle.

I suggest it is more illuminating to /understand/ during design that one is making a pragmatic design tradeoff among a handful of sometimes conflicting design criteria: simplicity, familiarity, stability, irreducibility.

If one remembers that familiarity is a design criterion and why it is a design criterion, one won't feel any temptation to hide anything.

Received on Thu Aug 17 2006 - 16:38:38 CEST
