Re: Canonical DB

From: Dmitry A. Kazakov <>
Date: Fri, 23 Jun 2006 21:59:52 +0200
Message-ID: <>

On Fri, 23 Jun 2006 15:37:54 +0200, mAsterdam wrote:

> The posts are getting way to long for my taste,
> so I snipped (... and [snip]), maybe oversnipped
> - feel free to unsnip :-)

It is OK (nobody reads us anyway (:-)), so we can silently omit things on which we agree.  

> These requirements establish K as a clean point type.
> Aside: With the last condition deleted one can make a
> circular K, having distance(k1, k2) <> distance(k2, k1).

That won't be formally a distance, which is required to be symmetric.

>> Then you can enumerate all K preserving the order. I.e. if k# denotes the
>> ordinal number of k, then k1#<k2# <=> k1<k2.
>> So you *can* define a distance on rows as
>> d(r1,r2) = | key(r1)# - key(r2)# |
>> (which will satisfy axioms of distance)
>> It can be shown, that this apparently arbitrary distance, plays a
>> sufficient role in some cases.

> K is now a point type, so distance(k1, k2) is defined.
> Aside: That is, indeed, assuming we have
> agreed on some algorithm to calculate the
> distance, given two points. See for instance

L1. Yes, different distances (metrics) have a huge impact on the methods based on them. Some "natural" metrics (like C-norm) are algorithmically very difficult.

> I'm still missing a (IMO crucial) step:
> What does it mean - or how to interpret
> this distance between two values in K as
> something in the problem space?

There is often a very natural interpretation. For example, least squares based on C2 (Euclidean) are reasonable in many technical domains, because squares of velocity, current, potential characterize energy, and there is a principle of minimal work.

> The reason I had to state the above in a
> backward way is: Normally when modeling
> this isn't a problem, because at some time
> we introduced K into the model for
> some purpose - but here we have a
> partial solution wanting a problem space:
> the computations we are looking for.

But the question is rightful. It is one addressed to the model. When you use some methods which are traced down to metrical spaces, for example, a regression analysis, then the meaning of a regression is *derived* form the meaning of the metric. If the metric does not map or has no meaning, the very method is illegal and rubbish.  

>> The space is K, and, equivalently, the space of rows R, 
>> because key:R->K is a bijection.

> Well, yes.
> In order to have a distance between keys,
> the keys must be of a point type.
> Now I think you are suggesting that this
> bijection implies that there is also a distance
> defined between the tuples.
> I reject that. There maybe other point-typed
> attributes in the relation and hence several valid
> distances between any two tuples.
> Which one is *the* distance?

There is no *the* distance. Even in the problem space there could be many. Consider the Manhattan distance, when a crossing is closed for cars, but open for pedestrians.

On the other hand, if you use a metrical method then *the* distance is one the method uses. For example, linear regression uses L2.

> The crucial step here is:
> What does it mean: how to interpret
> this distance between tuples as
> something in the problem space?


> (Again, necessarily still phrased backwards.
> I think that if I had a concrete problem
> example it would be easier to continue).

Imagine that the tuple is (X,Y) where X are streets and Y are avenues. Your problem space has Manhattan distance. Linear sorting this table fundamentally cannot help you to get out of O(n) pitfall, because whatever order on the tuples you define it will not be a distance in 2D. The fact that (X,Y) are indeed facts (no pun), does not help us here.

>>>>Which might have an immense computational impact. 
>>>How so? We haven't any way yet to compute problem space distance.
>>>That's it. No computations to compare yet.
>> A problem can be stated in terms of the problem space distances. Most of
>> classification/optimization/approximation problems this or that way are.

> Could you please give some problem space examples?

Calculate sine (approximation in C-norm) Nearest cafe (Manhattan distance)
Is this car good? (some multidimensional space of attributes) Route a road (approximation in some class of functions [curves]) Filter anything


>> Yes. "Navigational" presumes existence of some context. Your actions depend
>> on it. It is the focus of attention. You value things depending on how
>> close are they to the focus. This allows you to reduce resources needed, by
>> ignoring anything out of context. It also allows using adaptive techniques,
>> i.e. to vary your strategy from point to point or/and from time to time.
>> But it also may lead to aberrations, dead ends etc, which are quite common
>> for "local" methods as opposed to "global" ones.
>> I think that local vs. global is a better dichotomy than navigational vs.
>> not. We could find it everywhere.

> It is another dichotomy.
> local vs. global is better for what?

To express the meta-ideas behind relational and what some RM guys call "ad-hoc b**t."

> ISTM local vs. global is within the navigational realm, so
> inappropriate for comparisons of computations with navigational
> vs. relational structures.

I didn't meant scopes of variables. Though they are a case. I meant global in a more general context: scope, effect, visibility. Many of the operations in RA are global, like join. They involve complete tables. So RA tend to be global. But global is not always better than local. How often you lift your car? Is opening a car door bad, even if does not involve all atoms of the car? Yet when you drive the car, you may well wish that nothing of the car will left on the road behind...

> Aside: I replaced 'not' by relational.
> Is non-navigational synonymous to relational?
> Some are reluctant to even consider that.

Hmm, I can't tell.

Dmitry A. Kazakov
Received on Fri Jun 23 2006 - 21:59:52 CEST

Original text of this message