# Re: Principle of Orthogonal Design

Date: Sun, 27 Jan 2008 04:20:54 -0800 (PST)

Message-ID: <c4a59c20-5686-4f4a-aa43-512cdee1df64_at_d70g2000hsb.googlegroups.com>

mAsterdam schreef:

> This is all way over my head. Jan, you may be used to

*> sailing these waters; I am not. I think it is interesting,
**> but very time-consuming.
**> Am I talking rubbish? I try not to, but I don't know - nobody
**> else chimes in. You keep replying, but hey, you are a nice guy :-)
**>
**> I'll make some bold statements here anyway, but I can't keep this up.
*

Your non-bold statements are also be highly appreciated. :-)

Too keep things manageable I'll try to trim the discussion heavily and keep what I think is really essential. So if I snip any of your comments it is not because I don't think they are interesting, but just that at the moment I think they are not at the core of the issue.

> Jan Hidders wrote:

*> > mAsterdam wrote:
**> >> Jan Hidders wrote:
**> >>> ... I wanted to start simple ...
**> >> Even though this suggests otherwise, I'll risk assuming
**> >> that you understood the coffee analogy, and snipped it
**> >> for focus.
**> >
**> > I *think* I understood it, but I didn't want to get into a debate
**> > about it's appropriateness. My position is that in this case we don't
**> > really know precisely what this "coffee" thing is anyway. It is a
**> > rather vague and informal notion, so we can take the liberty to
**> > redefined in a way that seems to most practical for the time being...
**>
**> Informal as far as most of its nature is concerned, sure.
**> Its existence, however, is neither vague nor informal.
**> Without it, the coffee-machine loses its purpose.
**> It is because of this, that pragmatical redefinitions
**> must indeed be temporary and tracked.
*

Sure, we agree on that. It's just that I think it is sufficient to say at the beginning something like "for this discussion we are going to define 'overlapping meaning' as .." and you want to be more careful and really use another word. As I already said, I'm fine with that, so I don't think more discussion is needed here.

[... very big and painful snip ...]

> DEFINITION: Two relations R and S are said to have dependency-induced

*> overlap if there is a dependency that requires that sometimes some
**> tuples(1) of R are also in S.
**>
**> (1) for some definition of tuples that allows restricted
**> reshuffling of its values. To do.
*

The only way to achieve (1) so that it also takes all normal inclusion dependencies into account is to define tuples as something equivalent to bags of values. Such an operation on its internal organs is going to change the relational model beyond recognition. I'm going to strongly insist that we stick to the classical definitions of the named perspective and state in the definition that we are talking about inclusion up to relabeling:

DEFINITION: Two relations R and S are said to have dependency-induced overlap if there is a dependency that requires that sometimes some tuples of R are also in S after renaming the attribute names.

[... another big and painful snip ...]

> >>> ...additional terminology:

*> >>> DEFINITION: A join dependency is said to be a proper if it does not
**> >>> hold that any of its components is a subset of another component.
**> >> Proper is a nicer than non-trivial.
**> > Note that it's semantically different. Not all proper join
**> > dependencies are non-trivial, and not all non-trivial join
**> > dependencies are proper.
**>
**> Thanks, I have to read more carefully.
**>
**> Trivial dependencies are the ones implied by the
**> candidate keys.
*

Not precisely. Trivial dependencies are those that hold in any schema. For example the FK a,b -> a is always true as long as you have attributes 'a' and 'b' in the header. Another example is the JD [ {a,b}, {b,c}, {a,b,c} ] if the attributes in the header are {a,b,c}. This is always a lossless decomposition, but also a very redundant one. Another is the JD [ {a,b,c} ].

> You include some of those in the problem space

*> while leaving some non-trivial dependencies out based on wether
**> any of its components is a subset of another component?
**>
**> Let's see. This is because we are dealing with two relations
**> S and R, and we have to make sure that we get all
**> dependency-overlap, including the overlap of the trivial JD's.
*

Yes. The intuition is that if there is a JD the relation is in fact a combination of one or more predicates namely the ones that correspond to each component of the JD. It is for these predicates that we want to check overlap. We want to explicitly include the case where it is actually just one, so we also want to consider the trivial JD [ {a, b, c} ] for the header {a, b, c}.

In that terminology you can also perhaps understand why we only care about proper JDs. Suppose we have JD [ {a, b}, {b, c} ]. Then it follows that the following JD also holds: JD [ {a}, {a, b}, {b, c} ]. I think you'll understand that if the first defines a lossless decomposition, then so does the second for the simple reason that it contains at least as much information as the first. But it is not true that the component {a} also necessarily corresponds to some meaningful predicate. In fact, it probably doesn't. So that is why we want to exclude those.

Having said that, there are actually more JD's that need to be excluded than I initially thought. For example, if the JD [ {a, b}, {b, c} ] holds then also the JD [ {a, c}, {a, b}, {b, c} ] holds. Also here there is not necessarily a meaningful intuitive predicate associated with the component {a, c}. So, at the risk of causing you a serious migraine I'm going to have to tweak my definitions a little.

DEFINITION: A join dependency is said to be minimal if there is not another join dependency with a set of components that is a proper subset of the set of components of the first.

Examples: If JD [ {a, b}, {b, c} ] holds then JD [ {a, b}, {b, c}, {a, c} ] is not minimal. The JD [ {a}, {a, b} ] is never minimal because the trivial JD [ {a, b} ] always holds.

Note that a minimal join dependency is always also a proper join dependency.

The new rule then becomes:

DEFINITION: A schema is said to violate POOD if it contains relations R and S such that a component C of a minimal join dependency of R overlaps in meaning with a component D of a minimal join dependency of S, and either R and S are different, C and D are different, or both.

I just realized by rereading your comments that there might be some confusion on the notion of "component". From now on I'll strictly use the word component for a set of attribute names in a join dependency. So I would like to make the wording in the rule a bit more explicit: (changed parts are emphasized with "_")

DEFINITION: A schema is said to violate POOD if it contains relations R and S such that _the projection of R_ on a component C of a minimal join dependency of R overlaps in meaning with _the projection of S_ on a component D of a minimal join dependency of S, and either R and S are different, C and D are different, or both.

> >> It looks to me that in order to have this problem,

*> >> we need to have two relations R and S, where 5NF is
**> >> relevant in both, with isomorphy between components
**> >> of R and components of S
**> >> (which is quite imaginable when integrating databases
**> >> in the same business domain, e.g. after a merger, or
**> >> with similar but distinct types of complex contracts).
**> >> Am I correct thusfar?
**> >
**> > Roughly. The "5NF is relevant" is a bit misleading. All that is
**> > required is that there are proper join dependencies, but these are
**> > always present. For example, any relation R(a,b,c) will have always
**> > the trivial join dependency JD [{a,b,c}], i.e., the join dependency
**> > with one component that contains all attribute names in the header.
**> > Although trivial, it is also proper.
**>
**> Yep, I think I get that.
**>
**> > So you can violate the rule, even
**> > if there is no non-trivial join dependency. A simple example would be
**> > as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates
**> > the rule because you also have for S the trivial and proper JD
**> > [{c,d}]. So do you think that there is redundancy here that can be
**> > removed?
**> >
**> > Btw. the "with isomorphism between components" can be simplified to
**> > "with components of equal size". After all, components are just sets
**> > of attribute names, so they have equal size iff they are isomorphic.
**>
**> Now I'm lost again.
*

My fault. I thought you were talking about components of JDs.

> >> I don't have much experience with reasoning about 5NF.

*> >
**> > I actually don't think that is necessary. Whether the relations are in
**> > 5NF or not is not really irrelevant. But you understand what lossless
**> > decompositions are, no? And I assume you also understand how they are
**> > implied by functional dependencies.
**>
**> Up to BCNF.
*

Good. Let's for the moment restrict ourselves to just CKs. So say we have a relation R(a,b,c,d) with a CK {a,b}. What are the minimal join dependencies that hold?

And what if we have the CKs {a,b} and {b,c}?

So do you see for which projections we need to check overlap with other projections of other relations?

- Jan Hidders