Re: Principle of Orthogonal Design
Date: Mon, 21 Jan 2008 08:03:59 -0800 (PST)
Message-ID: <61940c72-ee9a-452e-8e00-117fc86f0505_at_k39g2000hsf.googlegroups.com>
On 21 jan, 00:35, mAsterdam <mAster..._at_vrijdag.org> wrote:
> Jan Hidders wrote:
> > mAsterdam wrote:
> >> Jan Hidders wrote:
> >>> mAsterdam wrote:
> >>>> Jan Hidders wrote:
> >>>>> mAsterdam wrote:
> >>>>>> Jan Hidders wrote:
> >>>> ...
> >>>>>>> Anyway, the stronger POOD that requires that headers are
> >>>>>>> distinct sounds like nonsense to me. Why would R(A, B)
> >>>>>>> be a worse design than R(R_A, R_B)?
> >>>>>>> The weaker POOD looks more interesting to me.
> >>>>>>> I even found a published paper about it:
> >>>>>>>http://www.wseas.us/e-library/conferences/2006madrid/papers/512-451.pdf
> >>>>>> Unfortunately the link times out.
> >>>>> Hmm, not for me. But to help you out:
> >>>>>http://www.adrem.ua.ac.be/bibrem/pubs/pood.pdf
> >>>> Thank you for helping me out by providing the document.
> >>>> I started reading "Extended Principle of
> >>>> Orthogonal Design" by Erki Eessaar
> >> [below: a summary of EE's paper]
> >>>> User defined datatypes (UDT) give the designer of a database
> >>>> more decisions, more room for wrong decisions.
> >>>> Row-, array-, reference-, multiset collection types
> >>>> - choices, choices, choices.
> >>>> How to deal with that freedom?
> >>>> Enter the Principle of Orthogonal Design (POD):
> >>>> If you have a tuple-type, make sure to have only one base
> >>>> relvar for recording that tuple-type.
> >>> Hmm, I would call that the strong POOD (or strong POD), and
> >>> that, I would agree, seems to make no sense.
> >> We agree on that, that is clear.
> >> So ok, but - from the huge category of easily asked but
> >> hard to answer questions: - why?
>
> >> Why doesn't the (strong) PoOD make sense to you?
> > It is at the same time too strong and too weak. It is too
> > strong because it forbids cases where there is in fact no
> > redundancy, and too weak because it still allows cases
> > where there is redundancy. Moreover, you can always trivially
> > satisfy it by renaming R(a,b) to R(r_a,r_b).
> > How exactly does that improve the design?
>
> Not. But just renaming the attributes does not change the
> tuple-type (which I earlier referred to as signature of
> the relation), so, while it indeed does not improve the
> design, it also has no bearing on satisfying PoOD as
> I understand it from EE.
> I could not find your definition, BTW.
> It is clear from JOG's OP that renaming the attribute doesn't
> PoOdify the design - which made Darwen reject it.
>
> From JOG's OP:
> OP> Darwen rejected the original POOD paper outright given that
> OP> McGovern posits that:
> OP>
> OP> R1 { X INTEGER, Y INTEGER }
> OP> R2 { A INTEGER, B INTEGER }
> OP> violates the principle, whatever the relations' attribute names.
Interesting. I'm missing the context here, so I'm not sure about their positions, but I suspect that to some extent both are right. Darwen is right that McGovern's definition is probably too strict, but McGovern is right that the attribute names shouldn't matter. I'll come back to this later, because I think there is a solution that might be acceptable for both. Well, acceptable for me, anyway. :-)
> >>> What they probably should have said is: R and S are said to have
> >>> overlapping meaning if it does not follow from the constraints /
> >>> dependencies that the intersection of R and S is always empty.
> >> Maybe, but it would not take away my objection.
> >> As long as R\S and S\R are allowed to be non-empty,
> >> R and S are independent, regardless of their heading.
>
> > In that case you still might have redundancy.
>
> Redundancy in what sense?
The same fact being represented in more than one place. Note the
"might have" in the sentence. It could very well be that there is in
fact no such redundancy, even if it does have overlapping meaning
according to my definition. In that sense my definition of overlapping
meaning is still too crude because it is a sufficient condition but
not a necessary condition. This gets even worse if we start ignoring
the attribute names.
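To make the kind of redundancy I mean concrete, here is a toy sketch (relvar names and contents are all invented for illustration, with relvars modelled simply as Python sets of tuples) in which the same fact is recorded in two places:

```python
# Toy illustration (all names and data invented): the same fact
# recorded in two relvars with overlapping meaning.
employees = {("alice", "sales")}    # tuples (name, dept)
sales_staff = {("alice", "sales")}  # the same fact, a second time

# Updating one copy but not the other yields the classic anomaly:
employees.discard(("alice", "sales"))
print(("alice", "sales") in sales_staff)  # True: a stale copy remains
```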
Can this be solved with a more refined notion of "overlapping meaning"
that still is defined in terms of dependencies? I think it can. For
that I need a new notion for a certain restricted class of
dependencies: a qualified inclusion dependency. An example of such a
dependency is a constraint like "if R(a=x,b=y) and x > 5 then
S(c=x,d=y)". If such a constraint holds then certain facts represented
in R will also be represented in S, and so there will very probably be
redundancy.
In general a qualified inclusion dependency is of the form "if Q1 then Q2" where Q1 is a conjunction of atoms and simple equations, Q2 is a single atom, and there is at least one atom in Q1 with the same free variables as Q2. Such an inclusion dependency is said to hold between R and S if at least one atom in Q1 that has the same free variables as Q2 concerns R and the atom in Q2 concerns S.
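As a minimal sketch (relation instances invented, relations modelled simply as Python sets of tuples), checking whether the example dependency "if R(a=x,b=y) and x > 5 then S(c=x,d=y)" holds on a given instance looks like this:

```python
# Invented relation instances, modelled as sets of tuples.
R = {(3, 10), (7, 20), (9, 30)}   # tuples (a, b)
S = {(7, 20), (9, 30), (1, 99)}   # tuples (c, d)

def qid_holds(r, s):
    """True iff every (x, y) in r with x > 5 also appears in s,
    i.e. the qualified inclusion dependency holds on this instance."""
    return all((x, y) in s for (x, y) in r if x > 5)

print(qid_holds(R, S))  # True: every R-tuple with a > 5 is also in S
```

If the dependency holds, the facts of R with a > 5 are recorded in S as well, which is exactly the overlap in meaning the definition is after.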
As you can see, it also deals with the case where R and S have differently named attributes, and at the same time arguably does not see overlapping meaning where there actually is none, so it is more refined than my preceding definition. Darwen and McGovern could perhaps both be happy with this. :-)
What would really establish its correctness is a theorem that says that it characterizes exactly whether there is a certain type of redundancy (defined, for example, a la Libkin) at the schema level, much as we know to be the case for 5NF. I think that's possible, although we should probably take all full, single-head dependencies into account for that and redefine the normal form accordingly, but unfortunately I don't have time right now.
> > If you decompose into the following three,
> > R' = R \ S,
> > RS' = R intersect S,
>
> RS' = R ⋂ S (long live Unicode!)
> > S' = S \ R,
> > then you have removed that redundancy.
>
> I can see that it is a lossless decomposition.
> What I don't see is which update anomaly is prevented by it.
> If I apply it to less abstract relations (say Brians example),
> I find myself having much more difficulty coming up with
> sensible predicates for R', S', and especially for RS'.
> This makes me suspicious.
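The decomposition itself is easy to check on a toy instance (data invented for illustration; set difference and intersection stand in for the relational operators):

```python
# Invented toy instances for R and S with compatible headings.
R = {(1, "x"), (2, "y"), (3, "z")}
S = {(2, "y"), (3, "z"), (4, "w")}

r_prime  = R - S   # R'  = tuples only in R
rs_prime = R & S   # RS' = tuples in both
s_prime  = S - R   # S'  = tuples only in S

# The decomposition is lossless: both originals are
# recoverable by union, and each shared fact now lives
# in exactly one relvar (RS').
assert R == r_prime | rs_prime
assert S == s_prime | rs_prime
print(sorted(r_prime), sorted(rs_prime), sorted(s_prime))
```

Whether sensible predicates exist for the three components is, as mAsterdam notes, a separate question from losslessness.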
- Jan Hidders