Re: foreign key constraint versus referential integrity constraint

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Thu, 22 Oct 2009 04:34:22 -0700 (PDT)
Message-ID: <4bb2d0b1-48c7-45b9-8ff8-065ad1203950_at_p36g2000vbn.googlegroups.com>


> Thanks to everyone. I think I understand more clearly now the
> inherent asymmetry of foreign key constraints. But now then my
> question is what is the common name for the semantic constraint
> that "attribute RV1:X and RV2:Y are sematincally joinable"?

Hmm. Usually we talk about constraints when they're syntactic, because that's what makes it possible to enforce them automatically, simply by manipulating the symbols in the database. So what you actually mean is, what syntactic construct is appropriate for enforcing the semantics you describe. I'd argue that that's what domains were invented for.

> Because simply having the same type is not enough for example
> both OrderID and CouponID might both have type GenericID and
> yet we may want to state that it doesn't make sense to join
> them.

I see at least two ways to view this. The first would be that you just made a modelling error -- you assigned the same syntactic type to two things that are actually quite different. What you should have done is to assign them two different types/domains, which would then stop them from being joined.

The second way would be to say that you've now implicitly created a union type (Order|Coupon), and its key, GenericID, has to be such that OrderID's and CouponID's never overlap. That way you can join willy nilly, but the results will be identical to the case where you've isolated the ID's into two separate types. Carried out in full this approach eventually leads to a generalization hierarchy converging towards a fully generic Object, and thus database wide object/unique identifiers.

It is well known that this is rather a contentious and dangerous design choice. In fact we can already see the first signs of trouble in your minimal example: what on earth do Orders and Coupons have to do with each other? What precisely are you trying to express by creating the union type? In order for that generalization to make sense, at the very minimum you should be able to point to some hypothetical field that could be shared between the two types, i.e. data belonging uniquely to the union type. What would that be, precisely?

If you have the means of keeping such ID's private -- this would necessitate making the ID's fully opaque from an outside user's point of view, which cannot be done in any DBMS I know of -- you might just be able to justify their existence as surrogates. But this is still stuff that requires tremendous discipline and insight into the entity integrity issues that need to be addressed in addition to the referential integrity between the surrogates. I wouldn't recommend going down this slippery slope unless you're *absolutely* sure you know what you're doing -- I haven't seen a single wider scale deployment which got this right and didn't suffer integrity issues in the long run.

> Likewise attribute names for joinable attributes may be
> different in different relations (ie needing rename).

Yes. Ideally the attributes wouldn't carry just a name and a physical level type (e.g. string, integer) but also a semantic type/domain. That way the name would indicate the semantics, or role, of the attribute in this given relation, the domain would indicate what it can be joined with and where to look for the set of permissible values, and the syntacting type associated with the domain would tell how to actually store the data.

--
Sampo
Received on Thu Oct 22 2009 - 13:34:22 CEST

Original text of this message