naive questions about nested intervals with Farey fractions

From: blacksqr <stephen.huntley_at_alum.mit.edu>
Date: 18 May 2007 01:20:54 -0700
Message-ID: <1179476453.983053.274960_at_o5g2000hsb.googlegroups.com>



I have been studying Tropashko's technique for encoding tree structures via nested intervals generated with Farey fractions. Although I am strictly an amateur in the field of databases and SQL, I am interested in finding efficient means for representing hierarchies in a database.

After studying the Farey fractions approach, I have some naive responses/questions:

Since the start and end point of each node's interval are each stored in a table as the numerator and denominator of a rational number, a fair amount of mathematical calculation and calling of functions is necessary to execute useful queries, such as finding descendants of a node. Doesn't execution of these calculations and functions take an undesireably long time?

Why not compute and store the rational number itself rather than the numerator and denominator? Then descendants could be found via simple boolean comparisons.

And before you say, "because using numerator and denominator lets you store more nodes within available precision limits," wouldn't it be possible to calculate the rational numbers to double precision, split the values up and store them in single precision cells, then execute queries via a chain of boolean comparisons? For example:

if the start and end points for node 1.2 are the rational numbers 3/5 and 2/3, then the first 32 bits of 3/5 can be represented as 0.6 and the second 32 bits as 0, and the first 32 bits of 2/3 can be represented as 0.6666666666 and the second 32 bits as 666666666. If these values were stored as left_1=0.6 and left_2=0, right_1=0.6666666666 right_2=666666666, then the descendants of node 1.2 could be found with a statement something like:

SELECT node WHERE (left_1>0.6 OR (left_1=0.6 AND left_2>0)) AND
(left_1<0.6666666666 OR (left_1=0.6666666666 AND left_2<666666666))

In this case we have 64 bits of precision within a 32 bit storage system, and a query that is slightly unwieldy but nevertheless requires no calculations beyond the booleans within the SELECT statement and no extra function calls or user-defined data types.

(Clearly in the example I have sacrificed any pretense of good SQL
style for hopefully brief and clear illustration of the point.)

One could add more precision by calculating and storing more 32-bit chunks of the rational numbers at the price of a few more ungainly clauses in the SELECT statement.

My naive impression is that the above approach would result in faster execution of queries compared to the series of calculations which must be done for each node in the course of the search in the technique outlined by Tropashko, without sacrifice of storage capacity.

Is this impression incorrect? If not, what is the advantage of all the apparatus that goes along with storing the numerator and denominator separately? Received on Fri May 18 2007 - 10:20:54 CEST

Original text of this message