Let me first say that this is certainly not a big mistake to make.
There is a straightforward generalization of the concept of FD over a
single relation to one that holds over different relations: it might
mean that the FD holds over the relation that you get if you join all
the tables in the schema into a single relation by using natural joins.
In fact many textbooks assume this without saying so. But note that
this assumes we can/should indeed join all the tables with natural
joins, which in practice is not alwayes true because some column
renaming is sometimes required. This is however true if you built your
schema by starting from a single relation and then normalizing. So even
then there is somehow still the idea in the background that the whole
schema is somehow still a single relation. For normal discussions of
normalization this is something that can be easily ignored, but in this
case it was at the core of the discussion.
So what to call them then? In advanced literature on normalization you
will find the notions of EGDs (equality generating dependencies) and
TGDs (tuple generating dependencies). I'l try to give a very brief
introduction:
An FD can be considered as a special case of a first-order logic
statement. If for a relation R(A,B,C) we have the FD A->B we could also
state this as:
ForAll x, y, z : R(A : x, B : y) & R(A : x, B : z) -> y=z
Or, if we are moving to the unlabeled perspective of the relational
model with ordered columns:
ForAll x, y, z, u, v : R(x, y, u) & R(x, z, v) -> y=z
The class of EGDs is now defined by generalizing this a little. We
stick to the general form of the formula, so it should still be:
ForAll .. : .. & .. & .. -> .. = ..
but we allow any number of variables, and any number of conjuctions
over any number of different relations. So you can write for example:
ForAll x, y, z, u, v : R1(x, y) & R2(y, z) & R1(x, u) & R2(u, v) -> z=v
An that of course expresses what the FD A->C becomes if we split R(A,
B, C) into R1(A, B) and R2(B, C).
Although the class of EGDs is quite big and generalizes FDs, it does
not allow you to express MVDs and JDs, and also not INDs. Those are
generalized in a different class called TGDs, tuple generating
dependencies, which is defined in a very simillar way. Look at how we
would express the IND R1[B] => R2[B] in first order logic:
ForAll x, y, z : R1(x, y) -> Exists z : R2(y, z)
Or the MVD A ->> B in R(A, B, C):
ForAll x, y, z : R(x, y, z) & R(x, u, v) -> R(x, y, v) & R(x, u, z)
So the class of TGDs is defined by formulas of the form:
ForAll .. : .. & .. & .. -> Exists .. : .. & .. & ..
This class indeed can express all MVDs, JDs and INDs, plus the
dependencies they turn into if we start splitting tables.
I could tell more, but need to get back to work now. :-)
Received on Thu May 30 2013 - 13:28:00 CEST