Re: foreign key constraint versus referential integrity constraint

From: Sampo Syreeni <decoy_at_iki.fi>
Date: Thu, 22 Oct 2009 12:16:53 -0700 (PDT)
Message-ID: <c36a42de-e892-4f21-acd7-77a9f8dc23b7_at_a6g2000vbp.googlegroups.com>


> Tempts me to use a new term (at least I'm guessing it's new) - exclusion
> dependency, even if there is nothing new about what it connotes.

I've seen that term being used, although far less commonly than is the case for inclusion. There is not a whole lot of theory on that sort of thing. (Which is why I forgot referential integrity can take that form as well; Thanks, Mr. Scott!)

I've never seen a real life schema where this sort of constraint was being enforced. Quite possibly because people don't come to think of the logical presence, and thus the necessity of the constraint. The main context in which I have in fact seen these used is a theoretical, data modelling one. There the commonest case is to enforce the distinction between plain and disjoint union types.

In practical, relational use that however mostly translates into a data model which places the separate pieces of a disjoint union into separate tables, and so makes the enforcement implicit. (Cf. dependency preservation and all that.) The other kind of union then tends to yield tables with lots of correlatedly null columns, or a constellation of tables with a shared key (often a surrogate/OID/ whatever in case the participants in the union are different enough; unsurprisingly the pattern often also leads to redundant columns as well).

> Personally, given that any

> dbms is likely to have a number of practical limitations, I don't see
> why a dbms couldn't allow restricted use of negation so that a foreign
> 'reference'/exclusion dependency might be read as "A{attr} = A{attr} AND
> (NOT B{attr})", which is basically of the same form as any inclusion
> dependency or what I call a reference.

Well, not quite the same form because it's more general. But no, there is no reason why that sort of functionality couldn't be available. And in fact, even SQL supports it nowadays, via assertions, eventhough just about nobody implements that part of the standard.

That's really quite a shame. Every single form of dependency I can think of could be asserted that way, and the wider use of assertions instead of more specialised forms of integrity constraint would basically for RDBMS vendors to treat constraints in a fully declarative, generalized, high level fashion. And if you then think about it, solving that problem would also pretty much solve the problem of incremental maintenance of materialized views, and vice versa.

> Also suspect that we have a number of qualified names

> for that basic form (of which 'primary key' was apparently the first)
> simply because using the terms in a dbms' language makes it easy for
> implementers to physically optimize.

Quite so. More specifically, if you special case for just a few forms of constraints (or other functionality at that), you can then special case your underlying implementation to deal particularly well with those special cases. That can considerably simplify implementation and lead to substantially higher performance in those cases which fit into your defined interface. Just like in our original example here: foreign keys as a concept have been heavily optimized by always demanding that the target is a candidate key and unconditionally creating a a unique index for it.

That optimization comes at the price of generality. So when we think about the relational model in all, such solutions don't really fit in too well. I mean, from the very start relational data management has been about making the model and its support machinery completely general and declarative. No difference between fields at the logical level. No difference between entities and relationships. No difference in the level at which data reside, unlike in hierarchical databases. No difference between pointers and the data pointed to, because pointers are explicitly forbidden. No difference between table and view. And so on.

Thus what the relational model would like is purely declarative, algebraically complete assertions as the means of guaranteeing integrity. Not specialized mechanisms like foreign keys -- those belong in the same era as pointers to data do.

> IMHO the implementation artifacts
> don't really contribute to, nor involve, any essential theory except for
> the usually ignored theory of optimization.

In this regard, there is a slight complication, though. If you think about foreign keys, they have one extra aspect in addition to the inclusion dependency they declare has to hold. That is the procedural rule used to deal with impending violation.

Roughly there are two general approaches to this, and SQL allows a limited form of both wrt foreign keys: 1) deny and roll back the transaction if a violation is about to happen, or the hazier, far more complicated option of 2) making it so that the update takes effect, and then using whatever means to automatically propagate the logical consequences of the update to once again make the integrity to fully and globally check out.

As I said, even SQL has the latter in the limited form of "on delete cascade". But in the fully general form, those operational semantics would be exceedingly complex, and evenmore complex to specify exactly if full generality was to be expected. So in that sense, we do have some very serious theory going on in here, and the kind of theory that hasn't been fully fleshed out yet even in the best of DB literature.

--
Sampo
Received on Thu Oct 22 2009 - 21:16:53 CEST

Original text of this message