Re: Nested Sets vs. Nested Intervals

From: Vadim Tropashko <vadimtro_invalid_at_yahoo.com>
Date: 7 Nov 2005 15:07:27 -0800
Message-ID: <1131404846.648403.77530_at_o13g2000cwo.googlegroups.com>


asdf wrote:
> Hi!,
>
> I am building a web directory similar to the dmoz and the yahoo
> directory. The categories get updated often.
>
> How do I find the subcategories from just the names of the ancester and
> the current categories? For example, both the dmoz and yahoo
> directories have filesystem-like URLs instead of category IDs. How do I
> do that with nested sets or nested intervals? If not, how do they do
> it?
>
> There should be a count for how many listings in each subcategory. It
> is much more efficient to count how many listings are in each
> subcategory with nested sets than the adjacency list. Is there a way to
> count subcategories efficently with nested intervals with farey
> fractions?
>
> Thank you very much!

Subcategories can be counted "semi-efficiently", that is you have to find all the intervals (or just all the left boundary points) that are contained within the ancestor interval. The speed is proportional to the size of the subtree.

Nested sets is often quoted as a solution for static hierarchies. It is rarely mentioned that one can't query ancestors path efficiently. Admittedly, nested sets counts the descendants nicely as rgt-lft-1, but "count" is the only hierarchical total aggregate function that nested sets can do efficiently.

The implementation of ancestor query in nested intervals encoding is somewhat messy. If we ignore the efficiency, then

select count(1)
from tree t
where compare(rational(a,b), lft) > 0 and compare(rgt(lft),rational(c,d)) >= 0

should be good enough. The [a/b, c/d] is the encoding of your node, below which you count the descendants. This is full table scan of the tree table. This could be improved by adding explicit predicate to turn the access path to index range scan

select count(1)
from tree t
where compare(rational(a,b), lft) > 0 and compare(rgt(lft),rational(c,d)) >= 0
and t.lft.num/t.lft.den between
a/b-0.00000001 and c/d+0.00000001

where the magic constant 0.00000001 is designed to offset floating point numerical imprecision. Received on Tue Nov 08 2005 - 00:07:27 CET

Original text of this message