Re: More pain and sufferring with Tropashko's materialized path...
Date: Mon, 20 Oct 2003 16:47:41 -0700
Message-ID: <OD_kb.30$Rt2.207_at_news.oracle.com>
VC wrote:
> Would you be kind enough to provide more detail about the new dense
Nested Intervals Model (http://www.dbazine.com/tropashko4.html) labeled
each node with binary rational number, which is essentially a pair of
integer numbers. Given that a set of all pairs of integer numbers can be
> encoding schema ?
We'll explore a different approach. We employ a triangular navigation method which is similar to the enumeration mentioned in the previous paragraph, but we not refer to numerator and denominator values explicitly.
We start with node <x=1,y=1/2> which we label with 1, move diagonally left down along the "red" line (picture included into forthcoming dbazine article), find no more nodes. Next, we navigate along the red diagonally aligned line just above it. Our starting point is <x=1,y=3/4> which we label with the next integer 2. On the next step we move diagonally left down along the "red" line, find node <x=1/2,y=1/4> and label it with integer 3. On the following step we'd navigate the next red diagonal just above our last one, and label nodes 4,5,6, and 7.
Here is labeling of the first 14 nodes in the tree:
N PATH
-- ----------
1 .1 2 .1.1 3 .2 4 .1.1.1 5 .1.2 6 .2.1 7 .3 8 .1.1.1.1 9 .1.1.2 10 .1.2.1 11 .1.3 12 .2.1.1
13 .2.2
14 .3.1
In other words numbers at every diagonally aligned line form an arithmetic progression. Hence, we'll call them arithmetic siblings. Reader shouldn't confuse those with real siblings, which are, as you might recall from the introductory article, aligned along the lines that meet at breadth-first convergence point. The other distinction is that each line drawn trough the sibling nodes has a slope less than that of the diagonal.
Reader might have already noticed that every first (or left-most) child is twice as big as its parent. Therefore, we call all nodes on each vertical line as geometric descendants.
An immediate property of the above labeling schema is density: we have all integer positive numbers utilized. Since some database implementations have integers of limited range only, density might become an attractive feature, because one would be able to store and manipulate bigger trees before hitting arithmetic overflow exception.
Partial Order Relation
Single integer labeling allows very simple definition of transitive closure relation. Given 2 nodes i and j such that i <= j let k be the maximal integer satisfying i*2^k <= j. For example, if i=1 and j=11, then k=3 and i*2^k=8. Geometrically, the i*2^k is the intersection of arithmetic siblings line originated in j and geometric descendants line originated in i.
Consider binary relation ancestor of satisfying the following condition:
i ancestor of j <=> j-i*2^k < 2^k-1
Informally, we require j to be closer to i*2^k than i to k, if we adjust for "geometric" distance between i and i*2^k. For example, let i=1, j=12, then k=3 and i*2^k=8. We have j-i*2^k = 4 and 2^k-1 = 4, therefore 12 is not an ancestor of 1. Indeed, 12 = 3*22 is an ancestor of node 3, which is a sibling of node 1. (If we allowed to consider the roots of the forest as siblings).
<proof that ancestor of is indeed a transitive relation snipped>
Distance function
Distance function is a straightforward implementation of ancestor relation defined in the previous section
function dist ( child integer, parent integer ) RETURN integer IS
ret integer;
arifgeom integer;
pwr integer;
BEGIN
if parent > child then
RETURN -1;
end if;
if parent = child then
RETURN 0;
end if;
arifgeom := parent;
pwr := 0;
while arifgeom*2 <= child loop
arifgeom := arifgeom*2; pwr := pwr+1;
end loop;
if child-arifgeom < arifgeom/(parent*2) then
RETURN pwr;
end if;
RETURN -1;
END;
select dist(11,3), dist(12,3), dist(13,3), dist(14,3) from dual
-1 2 2 -1
In theory, distance function is all that is required in order to answer hierarchical queries. In practice, however, query with distance function implies scanning the whole hierarchy table and applying ancestor of predicate to every row. With this method we obviously couldn't advance beyond small hierarchies.
Querying node's Ancestors
A typical query, that doesn't require scanning the whole hierarchy might ask to find the list of node's ancestors. Similar to materialized path, nested intervals, and binary rational labelings answering such a query doesn't involve querying hierarchy relation at all. Parent label calculation in our new integer labeling schema is simplicity itself. If the child number is even, then, child divided by 2 gives us the parent. Otherwise we don't jump to the parent right away, but instead, navigate to adjacent (real) sibling with smaller encoding. Technically this means, subtracting 1, and dividing the result by 2:
function immediate_parent ( child integer ) RETURN integer IS
ret integer;
BEGIN
ret := child;
while mod(ret,2)=1 loop
ret := (ret-1)/2;
end loop;
RETURN ret/2;
END;
/
select immediate_parent(21) from dual
5
Therefore, queries involving ancestors can be answered efficiently by iterative navigation through the ancestors chain. Technically, this iterating calls to the immediate_parent function can be wrapped inside a table function. (Another common name for table function is virtual view).
If a list of ancestors is used in as a part of more complex report, then, it would make sense creating index on a node number column. (Primary key on the node number might be a good idea, anyway). Assuming that the list of ancestors is small compared to total number of rows in the hierarchy table, then it would be more efficient to access each individual node on the ancestors list by unique index scan . Received on Tue Oct 21 2003 - 01:47:41 CEST