Abstract identifiers, logical pointers, or foreign keys considered not enough

From: Costin Cozianu <c_cozianu_at_hotmail.com>
Date: Thu, 09 Oct 2003 11:40:48 -0700
Message-ID: <bm49va$io39m$1_at_ID-152540.news.uni-berlin.de>


It has been claimed that logical pointers are harmful and not necessary.

Everything can be expressed in terms of values. Well, logical pointers are values, for any reasonable definition of value. They are elements of a well defined types (or sort) in many sorted algebra, just like integers and strings.

Surrogate keys on the other hand are nothing but logical pointers, they are an explicit, programmer hand made instantiation of "logical pointers". Actually logical pointers is maybe a wrong term, a better name would be abstract identifier. Surrogate keys introduce unnecessary difficulties in expressing constraints and logical operations. They are good, insofar as the outside world (i.e. the business world) incorporates them into their model (their language), but if they are used just by the software system (as most of them are, in practice) in order to workaround technical limitation in the DBMS, then they are definitely not better than logical pointers.

I put a small example justifying the technical difficulties.

Lee Fesperman wrote:

> Jan Hidders wrote:
>

>>Seun Osewa wrote:
>>
>>>I would also like to know the classical arguments against the network
>>>model or other "pointer based" models.  The only things I know are
>>>that:
>>>
>>>** using pointers to positions in memory or disk can be messy when
>>>data has to be moved around.  But then is seems there are several
>>>simple ways to solve this, e.g. what I can only call "logical
>>>pointers".
>>
>>Correct. There's absolutely no reason to believe that you cannot have
>>data independence with logical pointer or references. I would however
>>argue that allowing entities without representable keys is not a good idea.

>
>
> Incorrect. Even though you call them 'logical' pointers, they are still physical
> artifacts and have no place in a truly logical view of the database. Databases are about
> data, and pointers are not data (or meta-data).
>
> This is very old news.
>
> The OP needs to increase his knowledge of database concepts before he tries to 'fix'
> things. A newsgroup is the wrong place to get any real depth of understanding.
>

Here's a very basic problem for you, that you won't get by foreign keys:

So you have a table Orders where you keep and each Order has to be paid in full in one payment. So a payment identifier has to stay in the Orders table.

Now payments can be of different types:

     Credit Card payments. Those will be identified and tracked by a special number we get from the clearing house, and will have a whole range of other data, typical to credit card.

     Check payments: this has another idefntifying number and an entirely different record (tuple) type.

     Paypal payments. Yet another identifier, yet another tuple type, with different information.

     Cryptographic money. Cryptographic money is a self contained information token that doesn't need a tuple type, they can just stay in line where they pay for something (for example in the ORDERS table).

And other, unknown at this time, payments have to be added in the future (for example prepaid coupons or gift cards ), without any impact whatsoever (like table reorganization) on the existing tables.

For security and data integrity reasons all the identifiers (primary keys) for different entities have different datatype, because there's different algorithms to check a credit card payment identifier versus a check payment identifier, vs. a paypal.

The main requirement is for user to query if the order is paid, when it was paid, etc, and to display the details of the order together with the relevant payment information to the user, to resolve payment disputes, etc.

So how do you model this scenario in the ideal relational database without using abstract identifiers ?

One solution might be to introduce an artificial (surrogate key) payment_id. I reject this solution, as it introduces an unnecessary entity, not existent in the business model (users know of "The credit card payment identified by 10241024, or the check payment 10000009, or the cryptographic money 0xABCDEFABCDEF.... -- doesn't matter the representation as long as the software verifies their validity). Plus the constraints are difficult to express and they have to be reformulated once we introduce different types of payments.

The solution using abstract identifiers is very easy. Let there be the abstract datatype PAYMENT, then the ORDERS table just needs to contain a REF PAYMENT column. New subtypes of payment may be added at any time, each in their own tables, without anything in ORDERS table being touched. When the programmer is concerned with the ORDERS table he doesn't need to worry of the multiple joins involved with a lot of potential different tables to get the details for credit card, checks, etc. He just needs to use a simple operator, like the "->" (or DEREF can be useful if very verbose) to get access to a PAYMENT value.

For example he can use orders.payment->toTextDescription() to get a string of character to display to the user.

In any case, the mechanism is simple and doesn't pose any logical problem. Contrary to the claim that logical pointers are illogical. On the contrary, under the model proposed say in The Third Manifesto, this very simple problem is difficult to handle. Received on Thu Oct 09 2003 - 20:40:48 CEST

Original text of this message