Re: Principle of Orthogonal Design

From: mAsterdam <mAsterdam_at_vrijdag.org>
Date: Sat, 26 Jan 2008 22:06:00 +0100
Message-ID: <479b9fbd$0$85794$e4fe514c_at_news.xs4all.nl>


This is all way over my head. Jan, you may be used to sailing these waters; I am not. I think it is interesting, but very time-consuming.
Am I talking rubbish? I try not to, but I don't know - nobody else chimes in. You keep replying, but hey, you are a nice guy :-)

I'll make some bold statements here anyway, but I can't keep this up.

Jan Hidders wrote:
> mAsterdam wrote:

>> Jan Hidders wrote:
>>> ... I wanted to start simple ...
>> Even though this suggests otherwise, I'll risk assuming
>> that you understood the coffee analogy, and snipped it
>> for focus.

>
> I *think* I understood it, but I didn't want to get into a debate
> about it's appropriateness. My position is that in this case we don't
> really know precisely what this "coffee" thing is anyway. It is a
> rather vague and informal notion, so we can take the liberty to
> redefined in a way that seems to most practical for the time being...

Informal as far as most of its nature is concerned, sure. Its existence, however, is neither vague nor informal. Without it, the coffee-machine loses its purpose. It is because of this, that pragmatical redefinitions must indeed be temporary and tracked.

 From D&M's draft, ch 5 'The question of meaning', ( http://www.dbdebunk.com/page/page/622331.htm ): "... every table, be it a base table, a view, a query result, or whatever certainly does have an associated meaning. And, of course, users must be aware of those meanings if they are to use the database correctly and effectively."

>> Let's do some coffee-machine refactoring,..
>>
>>> DEFINITION: Two relations R and S are said to have overlap in meaning
>>> if there is a dependency that requires that sometimes some tuples of R
>>> are also in S after renaming the attribute names, or vice versa.
>> ..and start simple ;-)
>>
>> 1. The 'renaming etc...' is only there to exclude trivialities,
>> caused by a clumsy (at least for this coffee-machine)
>> definition of tuple-types, clumsy because it includes attribute names.
>> We'll have a definiton without attribute names (to be formulated
>> later if necessary).

>
> No. I'm sorry, but that is not acceptable. Not allowing renaming in
> the tuples changes the properties of the normal form. If you want we
> can change from the named to the unnamed perspective where tuples are
> ordered unlabeled tuples, but then you will need to replace the end
> with "after reordering the elements of the tuple", so not much would
> be gained.

Wether the relation between heading and tuples goes via names or ordering is relevant or not. If it is not I want it out of scope. Remarks by others make me doubt wether it is relevant or not.

Very early in this thread,



JOG wrote:
> DM Unseen wrote:
>> My understanding is that the debate centers around typing.
>>
>> Date advocates strong typing: ie X is of type "shop number" and A is a
>> "roll number", both represent different aspects of the UOD so they
>> have a different type (I disregard subtyping here for simplicity),
>> *even if* the actual underlying representations (e.g. natural numbers)
>> would be the same. Having different types, means the type engine can
>> distinguish them.

>
> Hmmm, I see. I have always viewed a type as a set of values and a
> collection of allowable operations. If there exist two data types
> where these values and operations are the same, well as far as I can
> logically see, that makes them the same thing (even if they have two
> references).
>
> Now I could follow the concept more if Date & McGovern weren't
> conflating the concept of attribute and type, and defined an attribute
> as a (role-name, type) pair, but as it stands is seems very odd to me.

Jan Hidders wrote:
> Note that the definition is talking about tuples, not tuple
> types. Redefining the notion of tuple type would nog change anything.

That is simply not true.
It just would not change the /wording/ of the definition. Redefining the notion of tuple type may change the notion of tuple.

> Redefining the notion of tuple would, but as I already said, then
> there is not much simplification.
>
> Btw. not allowing relabeling / reordering in inclusion dependencies
> provably restricts their expressive power in ways that cannot be
> solved by picking better attribute names in your schema. So I would
> not say that these are "trivialities that are caused by a clumsy
> definition of tuple types".

And the remedy is to include the attribute names in the tuple-types? So it seems. D&M draft: "...type-compatibility as we define it requires the two tables to have identical column names." I am trying to be careful about the side-effects. The example below shows that they may not be nice.

>> 2. It is not at all clear a priori what if anything this
>> has to with meaning. To avoid distraction by connotation
>> we'll substitute the definiendum by one with less connotation,
>> and see if we can substitute it back when the conclusions are
>> clear.
>> My first hunch was 'symbolic redundancy' (thinking of Neo) but
>> that would be loaded as well. It is the chaff to be filtered out
>> by the PoOD, so PoOD-chaff.

>
> Not *all* overlap has to be removed, so I wouldn't always want to call
> it chaff. It would also be nicer if the name reflected that there is
> some kind of overlap or that the relations partially coincide and that
> this is caused by dependencies. Could we settle on "two relations have
> dependency-induced overlap" for the time being? ...

Ok.

DEFINITION: Two relations R and S are said to have dependency-induced overlap if there is a dependency that requires that sometimes some tuples(1) of R are also in S.

(1) for some definition of tuples that allows restricted reshuffling of its values. To do.

>> Note: The isomorphy of R and S is implicit. The tuples
>> can be in both R and S so R-tuples and S-tuples have to be
>> of the same type.

>
> You mean that the headers are isomorphic, not the relations
> themselves, right?

No, that is not what I meant. In the vocabulary I am more or less used to: when the headers are isomorph, the relations are. Iff, even.

> But what do you mean by isomorphic?
> Can I replace attribute names or domains or both?

Attribute name replacement is irrelevant in this.

Maybe an example helps.

Three relations, BOM, MOB and WIRE.

+---------------------------------------------+
|Name: BOM                                    |
|Predicate: Part PART# is part of IS_PART_OF# |
|Key: (PART#, IS_PART_OF#)                    |
+---------------------------------------------+
|Domain:    |PTNUM  | PTNUM                   |
|Attribute: |PART#  | IS_PART_OF#             |
+---------------------------------------------+
|Values:        2        1                    |
|               3        1                    |
|               4        1                    |
|               4        5                    |
+---------------------------------------------+

MOB is a view on BOM with the attributes switched.

+--------------------------------------------------------+
|Name: WIRES                                             |
|Predicate: We have a wire to connect L_PART# to R_PART# |
|Key: (L_PART#, R_PART#)                                 |
+--------------------------------------------------------+
|Domain:    |PTNUM    | PTNUM                            |
|Attribute: |L_PART#  | R_PART#                          |
+--------------------------------------------------------+
|Values:        2        3                               |
|               3        2                               |
|               4        3                               |
|               4        5                               |
+--------------------------------------------------------+

BOM and WIRES are isomorph,
BOM and MOB are isomorph,
WIRES and MOB are isomorph.
The attributenames are irrelevant to this. Their isomorphism is determined by
their tupletypes.
All tuples of BOM, MOB and WIRES are of
type (PTNUM, PTNUM).

This seems pretty clear and straightfoward to me, but it is not compliant with D&M's
"...type-compatibility as we define it requires the two tables to have identical column names."
Making it compliant I'll lose clarity for what? For /maybe/ a feature that can determine which tuples to touch based on their type alone.

> Actually, now that I think about
> it, whatever your definition, this is only true if you assume that
> different domains are always disjoint.

How so?

>>> ...additional terminology:
>>> DEFINITION: A join dependency is said to be a proper if it does not
>>> hold that any of its components is a subset of another component.
>> Proper is a nicer than non-trivial.

> Note that it's semantically different. Not all proper join
> dependencies are non-trivial, and not all non-trivial join
> dependencies are proper.

Thanks, I have to read more carefully.

Trivial dependencies are the ones implied by the candidate keys. You include some of those in the problem space while leaving some non-trivial dependencies out based on wether any of its components is a subset of another component?

Let's see. This is because we are dealing with two relations S and R, and we have to make sure that we get all dependency-overlap, including the overlap of the trivial JD's.

Ok. I missed that.

>>> Now the rule:
>>> DEFINITION: A schema is said to violate POOD if it contains relations
>>> R and S such that a component C of a proper join dependency of R
>>> overlaps in meaning with a component D of a proper join dependency of
>>> S, and either R and S are different, C and D are different, or both.
>>>>> So far so good?
>>>> Does PoODling flush bathwater, baby, or part of both?
>>>> The jury isn't even out yet.
>>> I suspect with the new POOD the baby is quite safe. :-)
>> DEFINITION: A schema is said to violate POOD if it contains relations
>> R and S such that a component C of a proper join dependency of R
>> has PoOD-chaff with a component D of a proper join dependency of
>> S, and either R and S are different, C and D are different, or both.
>>
>> Now let's see if there is meaning overlap, that is, is it ok to
>> s/PoOD-chaff/meaning overlap/ back? To check that, I'll
>> first have to make sure I understand the definition.

>
> Ok.
>
>> It looks to me that in order to have this problem,
>> we need to have two relations R and S, where 5NF is
>> relevant in both, with isomorphy between components
>> of R and components of S
>> (which is quite imaginable when integrating databases
>> in the same business domain, e.g. after a merger, or
>> with similar but distinct types of complex contracts).
>> Am I correct thusfar?

>
> Roughly. The "5NF is relevant" is a bit misleading. All that is
> required is that there are proper join dependencies, but these are
> always present. For example, any relation R(a,b,c) will have always
> the trivial join dependency JD [{a,b,c}], i.e., the join dependency
> with one component that contains all attribute names in the header.
> Although trivial, it is also proper.

Yep, I think I get that.

> So you can violate the rule, even
> if there is no non-trivial join dependency. A simple example would be
> as follows: R(a,b) and S(c,d) with ID R[a,b] -> S[c,d], It violates
> the rule because you also have for S the trivial and proper JD
> [{c,d}]. So do you think that there is redundancy here that can be
> removed?
>
> Btw. the "with isomorphism between components" can be simplified to
> "with components of equal size". After all, components are just sets
> of attribute names, so they have equal size iff they are isomorphic.

Now I'm lost again.

>> I don't have much experience with reasoning about 5NF.

>
> I actually don't think that is necessary. Whether the relations are in
> 5NF or not is not really irrelevant. But you understand what lossless
> decompositions are, no? And I assume you also understand how they are
> implied by functional dependencies.

Up to BCNF.

> So then you might start first with
> considering cases where all the JDs are implied by FDs. For example,
> say we have R(a,b,c) and S(d,e,f) with POOD chaff / dependency-induced
> overlap between R[a,b] and S[d,e], for example because we have ID
> R[a,b] -> S[d,e]. We would then violate the rule if there is for R a
> JD [{a,b}, {a,c}] and for S a JD [{d,e}, {d,f}]. These JDs are implied
> if we have for R the FD a -> b,c and for S the FD d -> e,f. So, in
> that case, do you think that there is redundancy that can be removed?
Received on Sat Jan 26 2008 - 22:06:00 CET

Original text of this message