Approximate structural search in trees

From: Kai Großjohann <Kai.Grossjohann_at_CS.Uni-Dortmund.DE>
Date: Fri, 05 Jul 2002 19:03:34 +0200
Message-ID: <vaflm8q2jkp.fsf_at_lucy.cs.uni-dortmund.de>



Suppose I have a collection of ordered labeled trees (vulgo XML documents). Suppose I have a query which finds nodes based on structural conditions. Further suppose that I can't expect queries to express the exact structural conditions, so I would like to treat them in a vague way.

As an example, suppose the query is looking for a node labeled A which has a sibling labeled B. Now suppose there is no tree which has this, but there is a tree where a node labeled A has a cousin labeled B. Then I'd like the query to find this tree.

I could find some work computing the similarity between trees, based on a tree-edit distance.

This work has a couple of problems:

  • The query is usually a very small tree, whereas the document is a big tree, so the similarity will be small.
  • Some queries are not expressible as trees. Consider a query searching for a node labeled A which has either a B-labeled child or a C-labeled grandchild, or both. This disjunction cannot be expressed as a tree.
  • It is not clear to me how does the tree-edit distance relate to the idea of relevance to a user. Suppose the system finds t1 and t2 for a query q, but similarity(t1,q) > similarity(t2,q). How do I know that the user will also prefer t1 over t2?

But I'm sure I'm not the first person to think of all this stuff. So, is there some literature?

kai

-- 
A large number of young women don't trust men with beards.  (BFBS Radio)
Received on Fri Jul 05 2002 - 19:03:34 CEST

Original text of this message