Re: naive questions about nested intervals with Farey fractions

From: Vadim Tropashko <vadimtro_invalid_at_yahoo.com>
Date: 18 May 2007 09:10:24 -0700
Message-ID: <1179504624.638717.32690_at_e65g2000hsc.googlegroups.com>


On May 18, 1:20 am, blacksqr <stephen.hunt..._at_alum.mit.edu> wrote:
> Since the start and end point of each node's interval are each stored
> in a table as the numerator and denominator of a rational number, a
> fair amount of mathematical calculation and calling of functions is
> necessary to execute useful queries, such as finding descendants of a
> node. Doesn't execution of these calculations and functions take an
> undesireably long time?

The Farey fractions technique has been evolved, as I gradually get more understanding what is going on. Farey fractions, continued fractions, mobius transformatins, algebra of 2x2 matrices of detrminant 1, are different facets of the same phenomenon.

Here is a query of finding all the descendants (copied and pasted from http://www.bookpool.com/sm/0977671542). First, the schema contains 4 integers, not 2

table MatrixTreeNodes (

   a11 integer,
   a12 integer,
   a21 integer,

   a22 integer
);

(The fraction a11/a12 is roughly corresponding to farey fraction in the paper you cited). First how do we find parent and child:


Find all the employees who report directly to Jones. select child.name

from MatrixTreeNodes parent, MatrixTreeNodes child where parent.name = 'Jones'
and child.a11 = parent.a12 and child.a21 = parent.a22

Who is Jones' manager?

select parent.name
from MatrixTreeNodes parent, MatrixTreeNodes child where child.name = 'Jones'
and child.a11 = parent.a12 and child.a21 = parent.a22


This might look not intuitive at first, but this is because I skipped about 10 pages of explanation.

Now the relevant snippet about finding node's descendants:



Now that we have nested intervals structure, we can move on to querying a node's descendants. As a first approximation, let's take the descendants query in terms of nested sets as a template and rewrite it in terms of nested intervals

select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node where descendant.a11/descendant.a21 between node.a11/node.a21

                   and (node.a11-node.a12)/(node.a21-node.a22)
and node.name = ... -- predicate uniquely identifying a node

Unfortunately, this query would produce a wrong result. None of the database vendors support rational number datatype . The ratios of integers would be silently casted into float numbers with accompanying errors due to lack of precision. We have to rewrite all the expressions with divisions within the means of integer arithmetic

select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node

where descendant.a11*node.a21 >= descendant.a21*node.a11
and   descendant.a11*node.a22 >= descendant.a21*node.a12
and node.name = ...  -- predicate identifying a node uniquely

When we discussed descendant query performance in the context of nested sets, we emphasized index range scan as an efficient way to extract all the descendant nodes. This idea generalizes to nested intervals, although we have to index interval boundaries. Let's enhance our tree encoding schema design with two function-based indexes:

table MatrixTreeNodes (

   a11 integer,
   a12 integer,
   a21 integer,

   a22 integer
);

CREATE INDEX idx_left ON MatrixTreeNodes(a11/a21); CREATE INDEX idx_right ON MatrixTreeNodes((a11-a12)/(a21-a22));

We have to rewrite the query in such a way that optimizer can leverage these indexes

select descendant.*
from MatrixTreeNodes descendant, MatrixTreeNodes node

where descendant.a11*node.a21 >= descendant.a21*node.a11
and   descendant.a11*node.a22 >= descendant.a21*node.a12
and   descendant.a11/descendant.a21
      between node.a11/node.a21 - 0.0000001
      and (node.a11-node.a12)/(node.a21-node.a22) + 0.0000001
and node.name = ... -- predicate uniquely identifying a node

The constant 0.0000001 is designed to compensate for floating point arithmetic rounding errors. It is essentially is the minimal supported mantissa. Please refer to your favorite database SQL manual in order to find out the exact value. This way index range scan would capture all the nodes in the interval and, possibly, some extra . Then, the (small) list of nodes is filtered with the exact condition.


There is no heavy calculations there! I hope this answers your question. Received on Fri May 18 2007 - 18:10:24 CEST

Original text of this message