Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Thu, 30 May 2013 13:28:00 +0200
Message-ID: <51a737c0$0$6090$e4fe514c_at_dreader36.news.xs4all.nl>

On 2013-05-29 14:20:06 +0000, hugokornelis_at_gmail.com said:

> On Thursday, May 23, 2013 6:43:42 PM UTC+2, Jan Hidders wrote:
>
> (snip)

>> It's to some extent nitpicking, but FDs only make sense in the context> 
>> of a single table, so in that sense you have assumed the existence of 
>> a> table.

>
> Okay, at the risk of goinig off-topic (and of showing my lack of formal
> education), but I have to ask. Is this a terminology issue? Suppose I
> find a FD: Student --> Mentor, and later find that this is a transitive
> FD because I also find Student --> Class and Class --> Mentor. For 3NF
> compliance, I would obviously split the table. This of course has no
> influence at all on the relationship between Student and Mentor. I have
> no problem accepting that this is now formally no longer called a
> functional dependency, but then what other term should I use instead?

Let me first say that this is certainly not a big mistake to make. There is a straightforward generalization of the concept of FD over a single relation to one that holds over different relations: it might mean that the FD holds over the relation that you get if you join all the tables in the schema into a single relation by using natural joins. In fact many textbooks assume this without saying so. But note that this assumes we can/should indeed join all the tables with natural joins, which in practice is not alwayes true because some column renaming is sometimes required. This is however true if you built your schema by starting from a single relation and then normalizing. So even then there is somehow still the idea in the background that the whole schema is somehow still a single relation. For normal discussions of normalization this is something that can be easily ignored, but in this case it was at the core of the discussion.

So what to call them then? In advanced literature on normalization you will find the notions of EGDs (equality generating dependencies) and TGDs (tuple generating dependencies). I'l try to give a very brief introduction:

An FD can be considered as a special case of a first-order logic statement. If for a relation R(A,B,C) we have the FD A->B we could also state this as:

ForAll x, y, z : R(A : x, B : y) & R(A : x, B : z) -> y=z

Or, if we are moving to the unlabeled perspective of the relational model with ordered columns:

ForAll x, y, z, u, v : R(x, y, u) & R(x, z, v) -> y=z

The class of EGDs is now defined by generalizing this a little. We stick to the general form of the formula, so it should still be:

ForAll .. : .. & .. & .. -> .. = ..

but we allow any number of variables, and any number of conjuctions over any number of different relations. So you can write for example:

ForAll x, y, z, u, v : R1(x, y) & R2(y, z) & R1(x, u) & R2(u, v) -> z=v

An that of course expresses what the FD A->C becomes if we split R(A, B, C) into R1(A, B) and R2(B, C).

Although the class of EGDs is quite big and generalizes FDs, it does not allow you to express MVDs and JDs, and also not INDs. Those are generalized in a different class called TGDs, tuple generating dependencies, which is defined in a very simillar way. Look at how we would express the IND R1[B] => R2[B] in first order logic: