Re: How to normalize this?

From: Jan Hidders <hidders_at_gmail.com>
Date: Fri, 31 May 2013 10:53:20 +0200
Message-ID: <51a86500$0$6360$e4fe514c_at_dreader35.news.xs4all.nl>


On 2013-05-30 22:02:16 +0000, Erwin said:

> Op donderdag 30 mei 2013 13:28:00 UTC+2 schreef Jan Hidders het volgende:

>> On 2013-05-29 14:20:06 +0000, ... said:
>>
>>
>>
>>> On Thursday, May 23, 2013 6:43:42 PM UTC+2, Jan Hidders wrote:

>>
>>> 

>>
>>> (snip)

>>
>>>> It's to some extent nitpicking, but FDs only make sense in the context>

>>
>>>> of a single table, so in that sense you have assumed the existence of

>>
>>>> a> table.

>>
>>> 

>>
>>> Okay, at the risk of goinig off-topic (and of showing my lack of formal

>>
>>> education), but I have to ask. Is this a terminology issue? Suppose I

>>
>>> find a FD: Student --> Mentor, and later find that this is a transitive

>>
>>> FD because I also find Student --> Class and Class --> Mentor. For 3NF

>>
>>> compliance, I would obviously split the table. This of course has no

>>
>>> influence at all on the relationship between Student and Mentor. I have

>>
>>> no problem accepting that this is now formally no longer called a

>>
>>> functional dependency, but then what other term should I use instead?

>>
>>
>>
>> Let me first say that this is certainly not a big mistake to make.
>>
>> There is a straightforward generalization of the concept of FD over a
>>
>> single relation to one that holds over different relations: it might
>>
>> mean that the FD holds over the relation that you get if you join all
>>
>> the tables in the schema into a single relation by using natural joins.
>>
>> In fact many textbooks assume this without saying so. But note that
>>
>> this assumes we can/should indeed join all the tables with natural
>>
>> joins, which in practice is not alwayes true because some column
>>
>> renaming is sometimes required. This is however true if you built your
>>
>> schema by starting from a single relation and then normalizing. So even
>>
>> then there is somehow still the idea in the background that the whole
>>
>> schema is somehow still a single relation. For normal discussions of
>>
>> normalization this is something that can be easily ignored, but in this
>>
>> case it was at the core of the discussion.
>>
>>
>>
>> So what to call them then? In advanced literature on normalization you
>>
>> will find the notions of EGDs (equality generating dependencies) and
>>
>> TGDs (tuple generating dependencies). I'l try to give a very brief
>>
>> introduction:
>>
>>
>>
>> An FD can be considered as a special case of a first-order logic
>>
>> statement. If for a relation R(A,B,C) we have the FD A->B we could also
>>
>> state this as:
>>
>>
>>
>> ForAll x, y, z : R(A : x, B : y) & R(A : x, B : z) -> y=z
>>
>>
>>
>> Or, if we are moving to the unlabeled perspective of the relational
>>
>> model with ordered columns:
>>
>>
>>
>> ForAll x, y, z, u, v : R(x, y, u) & R(x, z, v) -> y=z
>>
>>
>>
>> The class of EGDs is now defined by generalizing this a little. We
>>
>> stick to the general form of the formula, so it should still be:
>>
>>
>>
>> ForAll .. : .. & .. & .. -> .. = ..
>>
>>
>>
>> but we allow any number of variables, and any number of conjuctions
>>
>> over any number of different relations. So you can write for example:
>>
>>
>>
>> ForAll x, y, z, u, v : R1(x, y) & R2(y, z) & R1(x, u) & R2(u, v) -> z=v
>>
>>
>>
>> An that of course expresses what the FD A->C becomes if we split R(A,
>>
>> B, C) into R1(A, B) and R2(B, C).
>>
>>
>>
>> Although the class of EGDs is quite big and generalizes FDs, it does
>>
>> not allow you to express MVDs and JDs, and also not INDs. Those are
>>
>> generalized in a different class called TGDs, tuple generating
>>
>> dependencies, which is defined in a very simillar way. Look at how we
>>
>> would express the IND R1[B] => R2[B] in first order logic:
>>
>>
>>
>> ForAll x, y, z : R1(x, y) -> Exists z : R2(y, z)
>>
>>
>>
>> Or the MVD A ->> B in R(A, B, C):
>>
>>
>>
>> ForAll x, y, z : R(x, y, z) & R(x, u, v) -> R(x, y, v) & R(x, u, z)
>>
>>
>>
>> So the class of TGDs is defined by formulas of the form:
>>
>>
>>
>> ForAll .. : .. & .. & .. -> Exists .. : .. & .. & ..
>>
>>
>>
>> This class indeed can express all MVDs, JDs and INDs, plus the
>>
>> dependencies they turn into if we start splitting tables.
>>
>>
>>
>> I could tell more, but need to get back to work now. :-)
>>
>>
>>
>> -- Jan Hidders
>
> Are EGD's in any way related to DKNF ?

Not besides the obvious connection that they generalize FDs and therefore are connected to DKNF.

> Are the individual conjuncts in an EGD required to be a rel(ation/var) > of the schema ?

Yes.

> Does the same apply to TGD's ?

Yes.

> If all and any individual conjuncts of a TGD are allowed to be just any 
> arbitrary relational expression defined on the relvars, do you then 
> think that TGD's are sufficient to express all and any possibly 
> conceivable database constraint ?

Depends on how you define the language of "any arbitrary relational expression". If that language can express any computable query, then yes. If you mean bu that language the classical relational algbra, then no, since you would still be within first-order logic.

  • Jan Hidders
Received on Fri May 31 2013 - 10:53:20 CEST

Original text of this message