Re: Nested sets in SQL - inventor?

From: Mikito Harakiri <mikharakiri_nospaum_at_yahoo.com>
Date: 27 Jul 2005 11:36:41 -0700
Message-ID: <1122489401.656825.63900_at_g14g2000cwa.googlegroups.com>


Mikito Harakiri wrote:
> Appeo Allkam wrote:
> > http://www.dbpd.com/vault/9811/kamfn.shtml
>
> I don't quite understand the following passage in Kamfonas article
>
> <quote>
> You may ask: "Why don't we use a hierarchical keyword to achieve the
> same result?" For example, an IP address-like scheme enumerates the
> nodes of a tree as 1,1.1,1.2,1.3,1.1.1, and so on. The problem with
> this scheme is that you have to start all qualifications from the top.
> Otherwise, you'll have to scan the whole index or table. With the L and
> R enumeration however, you can anchor your search at any level, relate
> any node to any other one, and still employ matching index scans.
> </quote>
>
> Any interpretations?

Well, I'm reading further and the next paragraphs are partially true and completely wrong:

<quote>
Descendent-seeking queries, such as the one in Listing 1's Case 2, are the most common and the most efficient. The optimizer will use a matching index scan to find the qualifying D.L values that lie between the A.L and A.R constants. With this query plan, the cost of descendent- seeking queries is one sweep through the index reading in a number of contiguous pages proportional to the answer set's size. In ancestor-seeking queries, such as Listing 1's case 1, the between predicate restricts a constant, D.L, between the two columns A.L and A.R. A combined index on L (descending) and R (ascending) helps these ancestor searches. The best plan we can expect for these queries is a matching lookup of D.L on the combined index, and a scan to the end of the index using index only access. Consequently, the average cost for ancestor-seeking queries is half an index-only scan. </quote>
The cost estimation for descendant looking queries is correct. What is the efficiency of ancestor search? My understanding is that with combined index, or bitmap, or even spatial it still sucks.

With Matrialized Path/Nested Intervals you *calculate* the chain of ancestors (doesn't really matter on the client, or server) and construct a dynamic SQL query

select * from tree where path in ('1','1.2', 1.2.1')

which as a concatenation uf 3 index unique scans is extremely fast.

<quote>
The L and R method has a level of magnitude performance advantage over any recursive SQL method, such as the SQL recursive union or Connect By clause. The node enumeration captures the nodes' topological ordering once, thus enabling transitive closure in one simple step, as opposed to multiple invocations. Detecting whether two nodes have an ancestor-descendent relationship normally requires the path traversal from one node to the other. Using the L and R numbers, however, you can test any two nodes using a simple between clause--without traversing the graph. Because the L and R method doesn't need to traverse the structure, more selective predicates may filter down the qualifying rows before applying the ancestor-descendent qualification. In recursive SQL, path traversal has to happen on unconstrained node sets, postponing highly filtering predicates until after the paths are traversed exhaustively.
</quote>

This is entirely wrong. Traversing adjacency list is fast. Each next node is found by index unique scan. Multiple invokations are not evil, as long as their calls are not exposed (over a network connection between client and server). This is why it makes sence supporting recursive SQL on server, instead of client querying hierarchy in multiple dynamically generated nonrecursive SQL queries. Received on Wed Jul 27 2005 - 20:36:41 CEST

Original text of this message