Re: Abstract identifiers, logical pointers, or foreign keys considered not enough

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: 11 Oct 2003 12:49:04 -0700
Message-ID: <e9d83568.0310111149.42ebf7c3_at_posting.google.com>


Costin Cozianu <c_cozianu_at_hotmail.com> wrote in message news:<bm6o4c$j8uu5$1_at_ID-152540.news.uni-berlin.de>...
>
> You can group several unrelated things together without giving them an
> id. Like apples and oranges that we put in a basket, we don't need to
> give them apple_or_orange_id.
>
> Let's think of CUSTOMER, who have a default payment mechanism on record,
> where the PAYMENT_MECHANISM can be
>
> Credit Card, (CardtyTYPE, Number )
>
> OR
>
> Electronic Withdrawal From Account: (Bank Number, AccountNumber)
>
> You really don't get the business user to care that we create an extra
> ID for these two entities, they are already identifiable through their data.
>
> if we create a PAYMENT_MECHANISM_ID in current databases is just becuase
> the DBMS vendor doesn't allow us to say:
>
> type PaymentMechanism = CreditCard | ElectronicWithdrawalFromAccount
>
> So database developers will create a PAYMENT_MECHANISM table and invent
> PAYMENT_MECHANISM_ID, and even then it is either impossible or extremely
> awkward to specify the needed integrity constraint.
>
> But if the end user sees
>
> PAYMENT_MECHANISM_ID: 1020303030303,
>
> where he expects a credit card number or account details, he will say
> "What ???"
>
> Sometimes, they are forced to swallow such invention and make them part
> of their business model, just because they have to use our software, but
> not because a mathematical model of the business actually needs
> surrogate keys.

In my mind the rational for using surrogates in databases is that they insulate us from cascading changes in the case that we have to change the primary key.

An example:

we have the following relations:

VEHICLE(license_number primary key, year_of_purchase, etc...) VEHICLE_INSURANCE(insurance_no primary key, license_number, etc...)

Let's suppose for the sake of argument that no two vehicles have the same license number.

Now, the user enters the data of a new insurance policy but miss spells the license number and does not notice it until he has entered quite a lot of stuff in the system. At this point the value of 'license_number' has potentially, as a foreing key, been placed in many tables in the system.

There are now two possibilities:
1) He removes the policy and re-enters everything 2) The system has implemented a function to update all foreign key values
(OK, some databases, e.g. SQLServer2000 have a 'cascade' primary key update rule, but would I trust it with a cascading update of, say 50 tables?)

Alternative 1) means extra work and frustration for user Alternative 2) means lots of extra coding; note that this 'alter' function will have to be potentially updated over time as new tables are introduced with license_number as foreign key

Now, surrogates give us some insulation from that problem, because since the license number is found only in one table it is easy to change.

Our schema with surrogates would be:

VEHICLE(vehicle_id, license_number unique, year_of_purchase, etc...) VEHICLE_INSURANCE(insurance_no primary key, vehicle_id, etc...)

So in this case the use of surrogates is purely a pragmatic question and not something that results from some mathematical theory.

Please note, that the user is not supposed to even see the surrogate value, so it does not burden him in any way.

regards,
Lauri Pietarinen Received on Sat Oct 11 2003 - 21:49:04 CEST

Original text of this message