Re: Principle of Orthogonal Design

From: Jan Hidders <hidders_at_gmail.com>
Date: Mon, 21 Jan 2008 08:03:59 -0800 (PST)
Message-ID: <61940c72-ee9a-452e-8e00-117fc86f0505_at_k39g2000hsf.googlegroups.com>


On 21 Jan, 00:35, mAsterdam <mAster..._at_vrijdag.org> wrote:
> Jan Hidders wrote:
> > mAsterdam wrote:
> >> Jan Hidders wrote:
> >>> mAsterdam wrote:
> >>>> Jan Hidders wrote:
> >>>>> mAsterdam wrote:
> >>>>>> Jan Hidders wrote:
> >>>> ...
> >>>>>>> Anyway, the stronger POOD that requires that headers are
> >>>>>>> distinct sounds like nonsense to me. Why would R(A, B)
> >>>>>>>  be a worse design than R(R_A, R_B)?
> >>>>>>> The weaker POOD looks more interesting to me.
> >>>>>>>  I even  found a published paper about it:
> >>>>>>> http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf
> >>>>>> Unfortunately the link times out.
> >>>>> Hmm, not for me. But to help you out:
> >>>>> http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf
> >>>> Thank you for helping me out by providing the document.
> >>>> I started reading "Extended Principle of
> >>>> Orthogonal Design" by Erki Eessaar
> >> below: EE
> >>>> User defined datatypes (UDT) give the designer of a database
> >>>> more decisions, more room for wrong decisions.
> >>>> Row-, array-, reference-, multiset collection types
> >>>> - choices, choices, choices.
> >>>> How to deal with that freedom?
> >>>> Enter the Principle of Orthogonal Design (POD):
> >>>> If you have a tuple-type, make sure to have only one base
> >>>> relvar for recording that tuple-type.
> >>> Hmm, I would call that the strong POOD (or strong POD), and
> >>> that, I would agree, seems to make no sense.
> >> We agree on that, that is clear.
> >> So ok, but - from the huge category of easily asked but
> >> hard to answer questions - why?
>
> >> Why doesn't the (strong) PoOD make sense to you?
>
> > It is at the same time too strong and too weak. It is too
> > strong because it forbids cases where there is in fact no
> > redundancy, and too weak because it still allows cases where
> > there is redundancy. Moreover, you can always trivially satisfy
> > it by renaming R(a,b) to R(r_a,r_b).
> > How exactly does that improve the design?
>
> It doesn't. But just renaming the attributes does not change
> the tuple-type (which I earlier referred to as the signature
> of the relation), so while it indeed does not improve the
> design, it also has no bearing on satisfying PoOD as
> I understand it from EE.

As far as I can see, their definitions of both tuple type and tuple include the attribute names, so the renaming does matter.

> I could not find your definition, BTW.

That's probably because I didn't explicitly state it as a definition. Apologies for making you go through the thread again. Here it is again:

> >>> What they probably should have said is: R and S are said to have
> >>> overlapping meaning if it does not follow from the constraints /
> >>> dependencies that the intersection of R and S is always empty.

For the record: I think this definition is better than Erki Eessaar's, but, as I will argue later on, it is still too crude and a better definition is possible.

> It is clear from JOG's OP that renaming the attributes doesn't
> PoOdify the design - which made Darwen reject it.
>
> From JOG's OP:
>
> OP> Darwen rejected the original POOD paper outright given that
> OP> McGovern posits that:
> OP>
> OP> R1 { X INTEGER, Y INTEGER }
> OP> R2 { A INTEGER, B INTEGER }
> OP>
> OP> violates the principle, whatever the relations' attribute names.

Interesting. I'm missing the context here, so I'm not sure about their positions, but I suspect that to some extent both are right. Darwen is right that McGovern's definition is probably too strict, but McGovern is right that the attribute names shouldn't matter. I'll come back to this later, because I think there is a solution that might be acceptable for both. Well, acceptable for me, anyway. :-)

> >>> What they probably should have said is: R and S are said to have
> >>> overlapping meaning if it does not follow from the constraints /
> >>> dependencies that the intersection of R and S is always empty.
> >> Maybe, but it would not take away my objection.
> >> As long as R\S and S\R are allowed to be non-empty,
> >> R and S are independent, regardless of their heading.
>
> > In that case you still might have redundancy.
>
> Redundancy in what sense?

The same fact being represented in more than one place. Note the "might have" in the sentence. It could very well be that there is in fact no such redundancy, even if R and S do have overlapping meaning according to my definition. In that sense my definition of overlapping meaning is still too crude: it is a necessary condition for redundancy, but not a sufficient one. And it gets even worse if we start ignoring the attribute names.

Can this be solved with a more refined notion of "overlapping meaning" that still is defined in terms of dependencies? I think it can. For that I need a new notion for a certain restricted class of dependencies: a qualified inclusion dependency. An example of such a dependency is a constraint like "if R(a=x,b=y) and x > 5 then S(c=x,d=y)". If such a constraint holds then certain facts represented in R will also be represented in S, and so there will very probably be redundancy and update anomalies.
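
To make that concrete, here is a small sketch of my own (relations modelled as Python sets of tuples, with made-up data) that checks this example dependency:

    # Relations as sets of tuples; positions stand for the
    # attributes (a, b) of R and (c, d) of S. The data is made up.
    R = {(3, 10), (7, 20), (9, 30)}
    S = {(7, 20), (9, 30)}

    # "if R(a=x, b=y) and x > 5 then S(c=x, d=y)"
    def dependency_holds(r, s):
        return all((x, y) in s for (x, y) in r if x > 5)

    print(dependency_holds(R, S))  # True

Here the facts (7, 20) and (9, 30) are indeed stored in both places, which is exactly the kind of redundancy I mean.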

In general, a qualified inclusion dependency is of the form "if Q1 then Q2", where Q1 is a conjunction of atoms and simple equations, Q2 is a single atom, and there is at least one atom in Q1 with the same free variables as Q2. Such a dependency is said to hold between R and S if at least one atom in Q1 that has the same free variables as Q2 concerns R, and Q2 itself concerns S.
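
To spell out a made-up instance of the general form: in

    if R(a=x, b=y) and T(e=y, f=z) and x > 5 then S(c=x, d=y)

Q1 is the conjunction of the atoms R(a=x, b=y) and T(e=y, f=z) with the equation x > 5, and Q2 is the single atom S(c=x, d=y). The atom R(a=x, b=y) has the same free variables {x, y} as Q2, so this dependency holds between R and S, but not between T and S (the T atom's free variables are {y, z}).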

Our new definition of "overlapping meaning" might then be as follows: R and S are said to have overlapping meaning if there is a qualified inclusion dependency between R and S, or between S and R, that does not follow from the dependencies at the relation level.

As you can see, it also deals with the case where R and S have differently named attributes, and at the same time it arguably does not see overlapping meaning where there actually is none, so it is more refined than my preceding definition. Darwen and McGovern could perhaps both be happy with this. :-)
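
To illustrate with JOG's example above: suppose, hypothetically, that the constraint

    if R1(X=x, Y=y) then R2(A=x, B=y)

holds and does not follow from the dependencies at the relation level. Then R1 { X INTEGER, Y INTEGER } and R2 { A INTEGER, B INTEGER } have overlapping meaning under the new definition, even though their headers share no attribute names; and in the absence of any such dependency they do not overlap, no matter what the attributes are called.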

What would really establish its correctness is a theorem saying that it characterizes exactly whether or not there is a certain type of redundancy (defined, for example, a la Libkin) at the schema level, much as we know is the case for 5NF. I think that's possible, although we should probably take all full, single-head dependencies into account for that and redefine the normal form accordingly, but unfortunately I don't have time right now.

> > If you decompose into the following three,
> > R'  = R \ S,
> > RS' = R intersect S,
>
> RS' =  R ⋂ S     (long live Unicode!)

Mmm. I'm not sure if all clients support this (although Google Groups does), or even if all relay servers relay it well. AFAIK the Usenet standards do not require that all 8 bits of every byte in a message are relayed.

> > S'  = S \ R,
> > then you have removed that redundancy.
>
> I can see that it is a lossless decomposition.
> What I don't see is which update anomaly is prevented by it.
> If I apply it to less abstract relations (say Brian's example),
> I find myself having much more difficulty coming up with
> sensible predicates for R', S', and especially for RS'.
> This makes me suspicious.

Yes, me too. But I strongly suspect my new definition doesn't have that problem. It's hard for me to see how there could be a qualified inclusion dependency without some corresponding real overlap in meaning.
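
For completeness, the decomposition itself as a small sketch of my own (made-up data, relations again as Python sets of tuples), showing the losslessness you mention:

    R = {(1, 2), (3, 4)}
    S = {(3, 4), (5, 6)}

    Rp  = R - S   # R'  = R \ S
    RSp = R & S   # RS' = R intersect S
    Sp  = S - R   # S'  = S \ R

    # Lossless: both originals come back as unions of the parts.
    assert R == Rp | RSp
    assert S == Sp | RSp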

-- Jan Hidders
Received on Mon Jan 21 2008 - 17:03:59 CET
