Re: foreign key constraint versus referential integrity constraint

From: Keith H Duggar <duggar_at_alum.mit.edu>
Date: Sat, 24 Oct 2009 10:53:07 -0700 (PDT)
Message-ID: <14cbdc28-4b31-4cf8-954b-d7784dcf0092_at_f10g2000vbl.googlegroups.com>


On Oct 22, 7:34 am, Sampo Syreeni <de..._at_iki.fi> wrote:
> > Thanks to everyone. I think I understand more clearly now the
> > inherent asymmetry of foreign key constraints. But now then my
> > question is what is the common name for the semantic constraint
> > that "attribute RV1:X and RV2:Y are sematincally joinable"?
>
> Hmm. Usually we talk about constraints when they're syntactic, because
> that's what makes it possible to enforce them automatically, simply by
> manipulating the symbols in the database. So what you actually mean
> is, what syntactic construct is appropriate for enforcing the
> semantics you describe. I'd argue that that's what domains were
> invented for.

I agree that it is possible to enforce such join semantics with domains but I'm not sure it is entirely appropriate. I say that because to the greatest extent the RM is orthogonal to domains. To quote CJ Date "Databases In Depth" Chapter 2 Summary:

"It's a very common misconception that the relational model deals only with rather simple types: numbers, strings, perhaps dates and times, and not much else. In this chapter, I've tried to show that this is indeed a misconception. Rather, relations can have attributes of /any type whatsoever/ -- the relational model nowhere predicates what those types must be, and in fact they can be as complex as we like ... In other words, the question as to what types are supported is orthogonal to the question of support for the relational model itself. Or (less precisely but more catchily): /types are orthogonal to tables/."

To put this another way, the RM should be just as complete and effective for a universe with a single domain as it is for a universe with a rich domain set (such as you propose to handle my GenericID join constraint example). And placing constraints on /relational expressions/ seems as fundamental to the RM as placing constraints on /relational values/.

By pushing such expression constraints off to domains, we are admitting that the RM itself has no support for constraining relational expressions and I'm not prepared to admit that limitation. Especially since it does provide support for constraining the values of relational variables.

I will also note that both Codd and Date proposed RM mechanisms to operate the other way around ie to allow designers to force cross-domain operations. Codd with "Domain Check Overrides" and date with the THE_ operators that provide type coercion.

> > Because simply having the same type is not enough for example
> > both OrderID and CouponID might both have type GenericID and
> > yet we may want to state that it doesn't make sense to join
> > them.
>
> I see at least two ways to view this. The first would be that you just
> made a modelling error -- you assigned the same syntactic type to two
> things that are actually quite different. What you should have done is
> to assign them two different types/domains, which would then stop them
> from being joined.
>
> The second way would be to say that you've now implicitly created a
> union type (Order|Coupon), and its key, GenericID, has to be such that
> OrderID's and CouponID's never overlap. That way you can join willy
> nilly, but the results will be identical to the case where you've
> isolated the ID's into two separate types. Carried out in full this
> approach eventually leads to a generalization hierarchy converging
> towards a fully generic Object, and thus database wide object/unique
> identifiers.

I think there are other views, see above. And I don't think either of the views is accurate in this case and certainly may not apply to every case one can imagine. In short, we should not let our lack of imagination guide our design principles.

> It is well known that this is rather a contentious and dangerous
> design choice. In fact we can already see the first signs of trouble
> in your minimal example: what on earth do Orders and Coupons have to
> do with each other? What precisely are you trying to express by
> creating the union type? In order for that generalization to make
> sense, at the very minimum you should be able to point to some
> hypothetical field that could be shared between the two types, i.e.
> data belonging uniquely to the union type. What would that be,
> precisely?

It's not for us to wonder such whys because our imagination is usually quite limited. For example, GenericID seems an entirely reasonable thing to me if I labelled every item I sent out (even the coupons) with a barcode and have other relations expressing facts about those barcodes apart from what they label. But arguing about such specific design semantics I think is irrelevant to basic questions of what constraint capabilities the relational model supports.

Anyhow, the question here is not one of our imagination but rather simply this: if it makes sense for the RM to support constraints on relational /values/ (taken on by variables) why does it not make sense to support constraints on relational /expressions/? That is a question of general principle not specific design.

> If you have the means of keeping such ID's private -- this would
> necessitate making the ID's fully opaque from an outside user's point
> of view, which cannot be done in any DBMS I know of -- you might just
> be able to justify their existence as surrogates. But this is still
> stuff that requires tremendous discipline and insight into the entity
> integrity issues that need to be addressed in addition to the
> referential integrity between the surrogates. I wouldn't recommend
> going down this slippery slope unless you're *absolutely* sure you
> know what you're doing -- I haven't seen a single wider scale
> deployment which got this right and didn't suffer integrity issues in
> the long run.

Frankly I don't understand the point of the above paragraph at all. Also I don't see how entity integrity enters into this at all; I'm assuming we follow the sage advice of having no NULLs to begin with; so entity integrity is non-issue.

> > Likewise attribute names for joinable attributes may be
> > different in different relations (ie needing rename).
>
> Yes. Ideally the attributes wouldn't carry just a name and a physical
> level type (e.g. string, integer) but also a semantic type/domain.
> That way the name would indicate the semantics, or role, of the
> attribute in this given relation, the domain would indicate what it
> can be joined with and where to look for the set of permissible
> values, and the syntacting type associated with the domain would tell
> how to actually store the data.

I don't think I'm talking about "physical level" types (posreps) at all. And again, I'm not convinced of your conclusion that the responsibility of constraining /relational/ expressions should be shoved off to domains.

Consider Date's point that RM is orthogonal to type support. Then what argument from principle do you have for prohibiting the RM from defining constraints on relational expressions by pushing that responsibility off to domain support? How would we express the constraints in a universe with one domain?

Thanks!

KHD Received on Sat Oct 24 2009 - 19:53:07 CEST

Original text of this message