Re: Nested sets in SQL - inventor?
Date: 27 Jul 2005 11:36:41 -0700
Message-ID: <1122489401.656825.63900_at_g14g2000cwa.googlegroups.com>
Mikito Harakiri wrote:
> Appeo Allkam wrote:
> > http://www.dbpd.com/vault/9811/kamfn.shtml
>
> I don't quite understand the following passage in Kamfonas article
>
> <quote>
> You may ask: "Why don't we use a hierarchical keyword to achieve the
> same result?" For example, an IP address-like scheme enumerates the
> nodes of a tree as 1,1.1,1.2,1.3,1.1.1, and so on. The problem with
> this scheme is that you have to start all qualifications from the top.
> Otherwise, you'll have to scan the whole index or table. With the L and
> R enumeration however, you can anchor your search at any level, relate
> any node to any other one, and still employ matching index scans.
> </quote>
>
> Any interpretations?
Well, I'm reading further and the next paragraphs are partially true and completely wrong:
<quote>
Descendent-seeking queries, such as the one in Listing 1's Case 2, are
the most common and the most efficient. The optimizer will use a
matching index scan to find the qualifying D.L values that lie between
the A.L and A.R constants. With this query plan, the cost of
descendent- seeking queries is one sweep through the index reading in a
number of contiguous pages proportional to the answer set's size. In
ancestor-seeking queries, such as Listing 1's case 1, the between
predicate restricts a constant, D.L, between the two columns A.L and
A.R. A combined index on L (descending) and R (ascending) helps these
ancestor searches. The best plan we can expect for these queries is a
matching lookup of D.L on the combined index, and a scan to the end of
the index using index only access. Consequently, the average cost for
ancestor-seeking queries is half an index-only scan.
</quote>
The cost estimation for descendant looking queries is correct. What is
the efficiency of ancestor search? My understanding is that with
combined index, or bitmap, or even spatial it still sucks.
With Matrialized Path/Nested Intervals you *calculate* the chain of ancestors (doesn't really matter on the client, or server) and construct a dynamic SQL query
select * from tree where path in ('1','1.2', 1.2.1')
which as a concatenation uf 3 index unique scans is extremely fast.
<quote>
The L and R method has a level of magnitude performance advantage over
any recursive SQL method, such as the SQL recursive union or Connect By
clause. The node enumeration captures the nodes' topological ordering
once, thus enabling transitive closure in one simple step, as opposed
to multiple invocations. Detecting whether two nodes have an
ancestor-descendent relationship normally requires the path traversal
from one node to the other. Using the L and R numbers, however, you can
test any two nodes using a simple between clause--without traversing
the graph. Because the L and R method doesn't need to traverse the
structure, more selective predicates may filter down the qualifying
rows before applying the ancestor-descendent qualification. In
recursive SQL, path traversal has to happen on unconstrained node sets,
postponing highly filtering predicates until after the paths are
traversed exhaustively.
</quote>
This is entirely wrong. Traversing adjacency list is fast. Each next node is found by index unique scan. Multiple invokations are not evil, as long as their calls are not exposed (over a network connection between client and server). This is why it makes sence supporting recursive SQL on server, instead of client querying hierarchy in multiple dynamically generated nonrecursive SQL queries. Received on Wed Jul 27 2005 - 20:36:41 CEST