# Re: Principle of Orthogonal Design

Date: Tue, 22 Jan 2008 04:49:26 -0800 (PST)

Message-ID: <0712c556-9d28-4e87-95dd-48eff9219ab8_at_q39g2000hsf.googlegroups.com>

On 21 jan, 23:10, mAsterdam <mAster..._at_vrijdag.org> wrote:

*> Jan Hidders schreef:
**>
**> > mAsterdam wrote:
*

> >> Jan Hidders wrote:

*> >>> ..., you can always trivially satisfy
**> >>> it by renaming R(a,b) to R(r_a,r_b).
**> >>> How exactly does that improve the design?
**> >> Not. But just renaming the attributes does not change the
**> >> tuple-type (which I earlier referred to as signature of
**> >> the relation), so, while it indeed does not improve the
**> >> design, it also has no bearing on satisfying PoOD as
**> >> I understand it from EE.
**>
**> > As far as I can see their definition of both tuple type and tuple
**> > includes the attribute names, so then it matters.
**>
**> Yes, depending on 'it matters', because , as you said,
**> compliance by renaming becomes trivial.
**>
**> >> I could not find your definition, BTW.
**>
**> > That's probably because I didn't explicitly state it as a definition.
**> > Apologies for making you go through the thread again. Here it is
**> > again:
**>
**> >>>>> What they probably should have said is: R and S are said to have
**> >>>>> overlapping meaning if it does not follow from the constraints /
**> >>>>> dependencies that the intersection of R and S is always empty.
**>
**> Thank you.
**>
**>
**>
**> > For the record. I think this definition is better than Erki Eesaar's,
**> > but as I will argue later on I also think that it is still too crude
**> > and that there is a better definition possible.
**>
**> >> It is clear from JOG's OP that renaming the attribute doesn't
**> >> PoOdify the design - which made Darwen reject it.
**>
**> >> From JOG's OP:
**>
**> >> OP> Darwen rejected the original POOD paper outright given that
**> >> OP> McGovern posits that:
**> >> OP>
**> >> OP> R1 { X INTEGER, Y INTEGER }
**> >> OP> R2 { A INTEGER, B INTEGER }
**> >> OP>
**> >> OP> violates the principle, whatever the relations' attribute names.
**>
**> > Interesting. I'm missing the context here, so I'm not sure about their
**> > positions, but I suspect that to some extent both are right. Darwen is
**> > right that McGoverns' definition is probably too strict, but McGovern
**> > is right that the attribute names shouldn't matter. I'll come back to
**> > this later, because I think there is a solution that might be
**> > acceptable for both. Well, acceptable for me, anyway. :-)
**>
**> >>>>> What they probably should have said is: R and S are said to have
**> >>>>> overlapping meaning if it does not follow from the constraints /
**> >>>>> dependencies that the intersection of R and S is always empty.
**> >>>> Maybe, but it would not take away my objection.
**> >>>> As long as R\S and S\R are allowed to be non-empty,
**> >>>> R and S are independent, regardless of their heading.
**> >>> In that case you still might have redundancy.
**> >> Redundancy in what sense?
**>
**> > The same fact being represented in more than one place. Note the
**> > "might have" in the sentence. It could very well be that there is in
**> > fact no such redundancy, even if it does have overlapping meaning
**> > according to my definition. In that sense my definition of overlapping
**> > meaning is still too crude because it is a sufficient condition but
**> > not a necessary condition. This gets even worse if we start ignoring
**> > the attribute names.
**>
**> > Can this be solved with a more refined notion of "overlapping meaning"
**> > that still is defined in terms of dependencies? I think it can. For
**> > that I need a new notion for a certain restricted class of
**> > dependencies: a qualified inclusion dependency. An example of such a
**> > dependency is a constraint like "if R(a=x,b=y) and x > 5 then
**> > S(c=x,d=y)". If such a constraint holds then certain facts represented
**> > in R will also be represented in S, and so there will very probably be
**> > redundancy and update anomalies.
**>
**> The constraint only applies to a subset of R
**> I.o.w. R is a mix of constrained and unconstrained data.
**> We are talking a /design/ principle here, so my question
**> would be: how did these different sets end up disguised
**> as one in the first place?
*

Very good point. But note that just horizontally splitting R turns the qualified inclusion dependency into a normal inclusion dependency, which is a good sign, but does not remove the redundancy.

> But indeed, redundancy.

*>
**> > In general a qualified inclusion dependency is of the form "if Q1 then
**> > Q2" where Q1 is a conjunction of atoms and simple equations, Q2 is a
**> > single atom, and there is at least one atom in Q1 with the same free
**> > variables as Q2. Such an inclusion dependency is said to hold between
**> > R and S if at least one atom in Q1 that has the same free variables as
**> > Q2 concerns R and the atom in Q2 concerns S.
**> > Our new definition of "overlapping meaning" might then be as follows:
**> > R and S are said to have overlapping meaning if there is a qualified
**> > dependency between R and S or between S and R that does not follow
**> > from the dependencies at relation level.
**>
**> An inclusion dependency on a proper subset would be
**> a tell, right?
*

Close, but not exactly. *Every* inclusion dependency in some sense causes redundancy, but if the set of attributes in which it ends is also a component of non-trivial join dependency, then you can remove this redundancy. So that is why the rule should say that only then there is a problem.

> > As you can see it also deals with the case where R and S have

*> > differently named attributes, and at the same time arguably does not
**> > see overlapping meaning where there actually is none, so it is more
**> > refined then my preceding definition. Darwen and McGovern could
**> > perhaps both be happy with this. :-)
**>
**> > What would really establish its correctness is a theorem that says
**> > that it characterizes exactly if there is a certain type of redundancy
**> > (defined for example a la Libkin) at schema level or not. Much like we
**> > also know is the case for 5NF. I think that's possible, although we
**> > should probably take all full, single-head dependencies into account
**> > for that and redefine the normal form accordingly, but unfortunately I
**> > don't have time right now.
**>
**> >>> If you decompose in to the following three,
**> >>> R' = R/S,
**> >>> RS' = R intersect S,
**> >> RS' = R ⋂ S (long live Unicode!)
**>
**> > Mmm. I'm not sure if all clients support this (although Google groups
**> > does), or even if all relay servers relay it well. AFAIK the usenet
**> > standards do not require that all 8 bits of every byte in a message
**> > are relayed.
**>
**> Last time I checked, with レ (re) and ル (ru),
**> there were no complaints.
**>
**> ID: R(a) レ S(b) japanese re, for 'references'
**> (mnemonic: check)
**> FK: R(a) ル S(b) japanese ru, for 'references unique'
**> (mnemonic: check one)
*

:-) FWIW, that works for me also.

- Jan Hidders