Re: Canonical DB

From: Dmitry A. Kazakov <mailbox_at_dmitry-kazakov.de>
Date: Fri, 23 Jun 2006 12:08:41 +0200
Message-ID: <kwk2qm82eg95.ffan9h0fm1se$.dlg_at_40tude.net>


On Fri, 23 Jun 2006 00:05:42 +0200, mAsterdam wrote:

> Dmitry A. Kazakov wrote:

>> mAsterdam wrote:
>>>Robert Martin wrote:
>> 
>>>>Things like tree searches, graph walks, etc.
>>>
>>>No, tree searches and graph walks are things
>>>you *need* to do (and specify) when all you have
>>>is navigational structures. They are part of their cost.
>>>
>>>Now where is the benefit - what are you computing:
>>>Which (or which types of) computations are easier
>>>with a navigational structure?
>>  
>> That's easy. 

>
> Thank you for your reply. I can go along with
> the direction of it as a good start.
> The nit-picks below are not to be taken as
> disrespect, I hope they help to get things sharp.
>
>> The benefit is in reducing the
>> cardinality of the problem space. 

>
> That is the benefit of modeling in general, not a
> benefit of any particular way of modeling, in this
> case modeling with navigational structures.

Sure

>> Talking about cases where usage of the "original" relational structure is
>> possible, but computationally inefficient, consider distance mappings.

>
> Ok. Why the scarequotes?

Because we are talking about models.

The world isn't relational, it just isn't! (:-))

I wished to consider cases where relational models are thinkable, but not optimal.

>> Technically each ordered structure induces some distance. 

>
> Yes.
>
>> For example, a table with ordered keys induces one 
>> between rows d(k1,k2)=k2-k1. 

>
> That depends on whether you are talking about tables drawn
> on paper (or some other media) or about 'table' as an SQL
> reserved word which (to avoid confusing implementation
> issues with modeling issues) are to be treated as relation
> variables. They have, by definition, no order.
>
> The keys in the table on paper even have a distance in cm.
> On the keys in the SQL table distance is undefined.
 

Let you have some set of keys: K={k1,k2,...} (a countable set). If K is linearly ordered: forall k1,k2,k3 in K

k1>k2 V k1<k2 V k1=k2
not (k1<k2 & k2<k1)
not (k1<k2 & k1=k2)
k1<k2 & k2<k3 => k1<k3

Then you can enumerate all K preserving the order. I.e. if k# denotes the ordinal number of k, then k1#<k2# <=> k1<k2.

So you *can* define a distance on rows as

d(r1,r2) = | key(r1)# - key(r2)# |

(which will satisfy axioms of distance)

It can be shown, that this apparently arbitrary distance, plays a sufficient role in some cases.

> Common sense (as in: I have no theory to back
> this up) tells me that for k2-k1 to denote a
> distance, both k1 and k2 should denote a point
> in some space.

Yes.

The space is K, and, equivalently, the space of rows R, because key:R->K is a bijection.

>> A tree structure does it as well, it is so-called tree distance.
>> This induced distance is obviously not the problem space distance.

>
> Indeed. To the problem space this tree-distance is meaningless.

As any artifact of any model is. Models are meaningless. We give them meaning. We could fail to give a meaning or choose a wrong one. That would make a model inadequate. So if the distance immanent to our model does not map the problem space distance, and, at the same time, this model distance is directly or indirectly used in a reasoning based on the model, then we are in a deep trouble. Consider a model: "the skin color determines the intellectual level of a person." I.e. d(White, Person) --> IQ(Person). It is an inadequate model.

> It may say something about something like a
> computational distance for a (navigational)
> computation, but nothing at all about the UOD.

Yes. It is our belief which does.

>> Which might have an immense computational impact. 

>
> How so? We haven't any way yet to compute problem space distance.
> That's it. No computations to compare yet.

A problem can be stated in terms of the problem space distances. Most of classification/optimization/approximation problems this or that way are.

>> Example: kd-trees vs. relational tables.

>
> Nodes on a kd-tree *do* denote a (carefully chosen)
> point/line/plane (k - 1 d thingy) in problem space.

Yes. This makes the tree distance close to the Euclidean one.

>> [Euclidean (any 3D) distance cannot be mapped to *any* linear order. It is
>> a hard fact. So there is no key which could work. This was the thing which
>> killed Prolog in machine learning (nobody would even try RA.)]

>
> Using something completely meaningless to compute
> something meaningful kills nothing but the idea
> to do so in the first place. It is a bad idea (TM).
>
> Nodes on a kd-tree denote a point/.. in problem
> space, so it is a candidate navigational structure
> for the question at hand - if we have some computations
> to compare.
>
> A kd-tree is built from a set of points.
> We don't start from the kd-tree, so it
> can only support /part/ of a solution.

I feel you still have the idea: let's feed the monster with "facts" and see what it will come from, hmm, the other end. No need to guess, what it will be! (:-))

Unfortunately, it does not work this way. So far it is impossible to do without substantial re-wiring the monster for each new problem. It is also a great controversy, if that were fundamentally possible to do. There are strong arguments against (Goedel) and for (existence of Goedel himself and other intelligent beings). Let's leave that to philosophers.

>> Another advantage of graphs and other "non-linear" structures, is that you
>> might do things which would otherwise illegal. For example,
>> self-referential structures. Self-containment is illegal, but a thing which
>> behaves as if it contained itself is possible through indirection. There
>> are many cases where you have to use such models, because other models
>> would be complex/inadequate/illegal. 

>
> While I agree that recursion can be very convenient in order
> to reduce complexity and to enhance adequacy, I don't see
> how it makes possible (or legal) what otherwise isn't.

Indirection does. You cannot put a set into itself because that would be illegal. But you can put some other object into it, which resembles the enclosing set. It is almost the set. To put it differently, you more the recursion from the language to the objects. You talk about recursion instead of recursively talking.

>> If you want a generalized description,
>> it is the problems where local descriptions are still "good-behaving",
>> linear, but global ones are a nightmare. Graph is locally is almost as good
>> as a table.

>
> A kd-tree is a graph, so again we have sets of nodes
> and links to maintain. However,
>
> http://en.wikipedia.org/wiki/Kd-tree:
>
> "Balancing a kd-tree requires care. Because kd-trees are sorted in
> multiple dimensions, the tree rotation technique cannot be used to
> balance them ¡X this may break the invariant."

Right

> ISTM this means that kd-trees are suitable for
> data-rendering, but not for capturing or
> maintenance of the same data these rendering
> computations build on.

They aren't good when you want to add and remove many points. Balancing is a problem even when nodes are only added and never removed. There are cases when you might get a very bad tree. Higher dimensions k>10 are also problematic.

> My question was:
>
> "Which (or which types of) computations are easier
> with a navigational structure?".
>
> So far, your answer appears to me as:
>
> One candidate type/class of computations is:
> those computations that benefit
> from a representation of the problem space
> as (nodes on) a kd-tree:
> with distance mappings in a kd-tree
> there is a definable computational
> distance, which relates to the problem
> space distance in a definable way.
>
> Am I understanding correctly?

Yes. "Navigational" presumes existence of some context. Your actions depend on it. It is the focus of attention. You value things depending on how close are they to the focus. This allows you to reduce resources needed, by ignoring anything out of context. It also allows using adaptive techniques, i.e. to vary your strategy from point to point or/and from time to time. But it also may lead to aberrations, dead ends etc, which are quite common for "local" methods as opposed to "global" ones.

I think that local vs. global is a better dichotomy than navigational vs. not. We could find it everywhere.

-- 
Regards,
Dmitry A. Kazakov
http://www.dmitry-kazakov.de
Received on Fri Jun 23 2006 - 12:08:41 CEST

Original text of this message