Re: Canonical DB

From: mAsterdam <>
Date: Fri, 23 Jun 2006 15:37:54 +0200
Message-ID: <449bee2f$0$31642$>

Hi Dmitry,

The posts are getting way to long for my taste, so I snipped (... and [snip]), maybe oversnipped - feel free to unsnip :-)

Dmitry A. Kazakov wrote:
> mAsterdam wrote:

>>Dmitry A. Kazakov wrote:
>>>mAsterdam wrote:
>>>> ...Which (or which types of) computations are 
>>>> easier with a navigational structure?
> The world isn't relational, it just isn't! (:-))

Heh. Agreed.

>>>... ordered structure induces some distance.

>>...The keys in the table on paper even have a distance in cm.
>>On the keys in the SQL table distance is undefined.

> Let you have some set of keys: K={k1,k2,...} (a countable set). If K is
> linearly ordered: forall k1,k2,k3 in K
> k1>k2 V k1<k2 V k1=k2
> not (k1<k2 & k2<k1)
> not (k1<k2 & k1=k2)
> k1<k2 & k2<k3 => k1<k3

Yes, that is clear.
These requirements establish K as a clean point type. Aside: With the last condition deleted one can make a circular K, having distance(k1, k2) <> distance(k2, k1).

> Then you can enumerate all K preserving the order. I.e. if k# denotes the
> ordinal number of k, then k1#<k2# <=> k1<k2.
> So you *can* define a distance on rows as
> d(r1,r2) = | key(r1)# - key(r2)# |
> (which will satisfy axioms of distance)
> It can be shown, that this apparently arbitrary distance, plays a
> sufficient role in some cases.

K is now a point type, so distance(k1, k2) is defined.

Aside: That is, indeed, assuming we have agreed on some algorithm to calculate the distance, given two points. See for instance

I'm still missing a (IMO crucial) step:
What does it mean - or how to interpret
this distance between two values in K as something in the problem space?

The reason I had to state the above in a backward way is: Normally when modeling
this isn't a problem, because at some time we introduced K into the model for
some purpose - but here we have a

partial solution wanting a problem space: the computations we are looking for.

>>Common sense (as in: I have no theory to back
>>this up) tells me that for k2-k1 to denote a
>>distance, both k1 and k2 should denote a point
>>in some space.

> Yes.

Now I do (have some theory to back it up). Thanks :-)

> The space is K, and, equivalently, the space of rows R,
> because key:R->K is a bijection.

Well, yes.

In order to have a distance between keys, the keys must be of a point type.

Now I think you are suggesting that this bijection implies that there is also a distance defined between the tuples.

I reject that. There maybe other point-typed attributes in the relation and hence several valid distances between any two tuples.
Which one is *the* distance?
The crucial step here is:
What does it mean: how to interpret
this distance between tuples as
something in the problem space?
(Again, necessarily still phrased backwards. I think that if I had a concrete problem example it would be easier to continue).

>>... To the problem space ... tree-distance is meaningless.

> As any artifact of any model is. Models are meaningless. We give them
> meaning. We could fail to give a meaning or choose a wrong one. That would
> make a model inadequate.

I'd say until there is meaning there is no model of the UOD (maybe a template, a pattern, a proto-model, but no model of the UOD). This may be a slight terminological difference, this may be important - I don't know yet.

> So if the distance immanent to our model does not
> map the problem space distance, and, at the same time, this model distance
> is directly or indirectly used in a reasoning based on the model, then we
> are in a deep trouble. Consider a model: "the skin color determines the
> intellectual level of a person." I.e. d(White, Person) --> IQ(Person). It
> is an inadequate model.

I reject this model because of its meaning.

>>It may say something about something like a
>>computational distance for a (navigational)
>>computation, but nothing at all about the UOD.

> Yes. It is our belief which does.

The modeler comes up with artifacts
and interpretations.
It is the modelers responsibility
to explore whether the UOD interpretations and the model's limitations are understood.

>>>Which might have an immense computational impact. 
>>How so? We haven't any way yet to compute problem space distance.
>>That's it. No computations to compare yet.

> A problem can be stated in terms of the problem space distances. Most of
> classification/optimization/approximation problems this or that way are.

Could you please give some problem space examples?

>>>Example: kd-trees vs. relational tables.
>>Nodes on a kd-tree *do* denote a (carefully chosen)
>>point/line/plane (k - 1 d thingy) in problem space.

> Yes. This makes the tree distance close to the Euclidean one.
>>>[Euclidean (any 3D) distance cannot be mapped to *any* linear order. It is
>>>a hard fact. So there is no key which could work. This was the thing which
>>>killed Prolog in machine learning (nobody would even try RA.)]
>>Using something completely meaningless to compute
>>something meaningful kills nothing but the idea
>>to do so in the first place. It is a bad idea (TM).
>>Nodes on a kd-tree denote a point/.. in problem
>>space, so it is a candidate navigational structure
>>for the question at hand - if we have some computations
>>to compare.
>>A kd-tree is built from a set of points.
>>We don't start from the kd-tree, so it
>>can only support /part/ of a solution.

> I feel you still have the idea: let's feed the monster with "facts" and see
> what it will come from, hmm, the other end. No need to guess, what it will
> be! (:-))

Heh. Sorry you feel that way. I just thought, hey where do all these data come from?
They must have been captured somehow.

I'd welcome ideas from other posters as to which types of computations are easier with a navigational structure, to be honest preferably a bit more tangible than kd-trees (which to me are a solution space template).

> Unfortunately, it does not work this way.

I am not surprised.

Somebody who would "feed the monster with "facts" and see what it will come from, hmm, the other end." would be disappointed, I imagine.

(Same scarequotes, yes?)

> So far it is impossible to do
> without substantial re-wiring the monster for each new problem. It is also
> a great controversy, if that were fundamentally possible to do. There are
> strong arguments against (Goedel) and for (existence of Goedel himself and
> other intelligent beings). Let's leave that to philosophers.

>>>Another advantage of graphs and other "non-linear" structures, is that you
>>>might do things which would otherwise illegal. For example,
>>>self-referential structures. Self-containment is illegal, but a thing which
>>>behaves as if it contained itself is possible through indirection. There
>>>are many cases where you have to use such models, because other models
>>>would be complex/inadequate/illegal. 
>>While I agree that recursion can be very convenient in order
>>to reduce complexity and to enhance adequacy, I don't see
>>how it makes possible (or legal) what otherwise isn't.

> Indirection does. You cannot put a set into itself because that would be
> illegal. But you can put some other object into it, which resembles the
> enclosing set. It is almost the set. To put it differently, you more the
> recursion from the language to the objects. You talk about recursion
> instead of recursively talking.

Sorry, I do not understand this. Is it crucial to the topic?


>>One candidate type/class of computations is:
>>	those computations that benefit
>>	from a representation of the problem space
>>	as (nodes on) a kd-tree:
>>	with distance mappings in a kd-tree
>>	there is a definable computational
>>	distance, which relates to the problem
>>	space distance in a definable way.
>>Am I understanding correctly?

> Yes. "Navigational" presumes existence of some context. Your actions depend
> on it. It is the focus of attention. You value things depending on how
> close are they to the focus. This allows you to reduce resources needed, by
> ignoring anything out of context. It also allows using adaptive techniques,
> i.e. to vary your strategy from point to point or/and from time to time.
> But it also may lead to aberrations, dead ends etc, which are quite common
> for "local" methods as opposed to "global" ones.
> I think that local vs. global is a better dichotomy than navigational vs.
> not. We could find it everywhere.

It is another dichotomy.
local vs. global is better for what?
ISTM local vs. global is within the navigational realm, so inappropriate for comparisons of computations with navigational vs. relational structures.

Aside: I replaced 'not' by relational.
Is non-navigational synonymous to relational? Some are reluctant to even consider that. Received on Fri Jun 23 2006 - 15:37:54 CEST

Original text of this message