Re: Principle of Orthogonal Design

From: Jan Hidders <hidders_at_gmail.com>
Date: Fri, 25 Jan 2008 08:25:44 -0800 (PST)
Message-ID: <8e2c55d4-2a37-4ce8-9f49-37d4617ebe07_at_x69g2000hsx.googlegroups.com>


On 25 jan, 09:18, mAsterdam <mAster..._at_vrijdag.org> wrote:
> Jan Hidders wrote:
> > ... I wanted to start simple ...
>
> Even though this suggests otherwise, I'll risk assuming
> that you understood the coffee analogy, and snipped it
> for focus.

I *think* I understood it, but I didn't want to get into a debate about it's appropriateness. My position is that in this case we don't really know precisely what this "coffee" thing is anyway. It is a rather vague and informal notion, so we can take the liberty to redefined in a way that seems to most practical for the time being. What matters is if we end up with a normal form that makes sense, not if we have precisely established what meaning overlap is. The latter is more a philosophical question anyway. But if this confuses you, then I have no problem with calling it something else.

> Let's do some coffee-machine refactoring,..
>
> > DEFINITION: Two relations R and S are said to have overlap in meaning
> > if there is a dependency that requires that sometimes some tuples of R
> > are also in S after renaming the attribute names, or vice versa.
>
> ..and start simple ;-)
>
> 1. The 'renaming etc...' is only there to exclude trivialities,
> caused by a clumsy (at least for this coffee-machine)
> definition of tuple-types, clumsy because it includes attribute names.
> We'll have a definiton without attribute names (to be formulated
> later if necessary).

No. I'm sorry, but that is not acceptable. Not allowing renaming in the tuples changes the properties of the normal form. If you want we can change from the named to the unnamed perspective where tuples are ordered unlabeled tuples, but then you will need to replace the end with "after reordering the elements of the tuple", so not much would be gained. Note that the definition is talking about tuples, not tuple types. Redefining the notion of tuple type would nog change anything. Redefining the notion of tuple would, but as I already said, then there is not much simplification.

Btw. not allowing relabeling / reordering in inclusion dependencies provably restricts their expressive power in ways that cannot be solved by picking better attribute names in your schema. So I would not say that these are "trivialities that are caused by a clumsy definition of tuple types".

> 2. It is not at all clear a priori what if anything this
> has to with meaning. To avoid distraction by connotation
> we'll substitute the definiendum by one with less connotation,
> and see if we can substitute it back when the conclusions are
> clear.
> My first hunch was 'symbolic redundancy' (thinking of Neo) but
> that would be loaded as well. It is the chaff to be filtered out
> by the PoOD, so PoOD-chaff.

Not *all* overlap has to be removed, so I wouldn't always want to call it chaff. It would also be nicer if the name reflected that there is some kind of overlap or that the relations partially coincide and that this is caused by dependencies. Could we settle on "two relations have dependency-induced overlap" for the time being?

> 1. 'renaming etc...' -- out
> 2. PoOD-chaff
>
> DEFINITION: Two relations R and S are said to have PoOD-chaff
> if there is a dependency that requires that sometimes some
> tuples of R are also in S.
>
> > It's a bit hand-wavy for my taste, because it doesn't define the type
> > of dependencies but it will probably do for the moment.
>
> Ok.
>
> Note: The isomorphy of R and S is implicit. The tuples
> can be in both R and S so R-tuples and S-tuples have to be
> of the same type.

You mean that the headers are isomorphic, not the relations themselves, right? But what do you mean by isomorphic? Can I replace attribute names or domains or both? Actually, now that I think about it, whatever your definition, this is only true if you assume that different domains are always disjoint.

> > ...additional terminology:
>
> > DEFINITION: A join dependency is said to be a proper if it does not
> > hold that any of its components is a subset of another component.
>
> Proper is a nicer than non-trivial.

Note that it's semantically different. Not all proper join dependencies are non-trivial, and not all non-trivial join dependencies are proper.

> > Now the rule:
>
> > DEFINITION: A schema is said to violate POOD if it contains relations
> > R and S such that a component C of a proper join dependency of R
> > overlaps in meaning with a component D of a proper join dependency of
> > S, and either R and S are different, C and D are different, or both.
>
> >>> So far so good?
> >> Does PoODling flush bathwater, baby, or part of both?
> >> The jury isn't even out yet.
>
> > I suspect with the new POOD the baby is quite safe. :-)
>
> DEFINITION: A schema is said to violate POOD if it contains relations
> R and S such that a component C of a proper join dependency of R
> has PoOD-chaff with a component D of a proper join dependency of
> S, and either R and S are different, C and D are different, or both.
>
> Now let's see if there is meaning overlap, that is, is it ok to
> s/PoOD-chaff/meaning overlap/ back? To check that, I'll
> first have to make sure I understand the definition.

Ok.

> It looks to me that in order to have this problem,
> we need to have two relations R and S, where 5NF is
> relevant in both, with isomorphy between components
> of R and components of S
> (which is quite imaginable when integrating databases
> in the same business domain, e.g. after a merger, or
> with similar but distinct types of complex contracts).
> Am I correct thusfar?

Roughly. The "5NF is relevant" is a bit misleading. All that is required is that there are proper join dependencies, but these are always present. For example, any relation R(a,b,c) will have always the trivial join dependency JD [{a,b,c}], i.e., the join dependency with one component that contains all attribute names in the header. Although trivial, it is also proper. So you can violate the rule, even if there is no non-trivial join dependency. A simple example would be as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates the rule because you also have for S the trivial and proper JD [{c,d}]. So do you think that there is redundancy here that can be removed?

Btw. the "with isomorphism between components" can be simplified to "with components of equal size". After all, components are just sets of attribute names, so they have equal size iff they are isomorphic.

> I don't have much experience with reasoning about 5NF.

I actually don't think that is necessary. Whether the relations are in 5NF or not is not really irrelevant. But you understand what lossless decompositions are, no? And I assume you also understand how they are implied by functional dependencies. So then you might start first with considering cases where all the JDs are implied by FDs. For example, say we have R(a,b,c) and S(d,e,f) with POOD chaff / dependency-induced overlap between R[a,b] and S[d,e], for example because we have ID R[a,b] -> S[d,e]. We would then violate the rule if there is for R a JD [{a,b}, {a,c}] and for S a JD [{d,e}, {d,f}]. These JDs are implied if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in that case, do you think that there is redundancy that can be removed?

  • Jan Hidders
Received on Fri Jan 25 2008 - 17:25:44 CET

Original text of this message