Approximate structural search in trees
From: Kai Großjohann <Kai.Grossjohann_at_CS.Uni-Dortmund.DE>
Date: Fri, 05 Jul 2002 19:03:34 +0200
Message-ID: <vaflm8q2jkp.fsf_at_lucy.cs.uni-dortmund.de>
Suppose I have a collection of ordered labeled trees (vulgo XML documents). Suppose I have a query which finds nodes based on structural conditions. Further suppose that I can't expect queries to express the exact structural conditions, so I would like to treat them in a vague way.
Date: Fri, 05 Jul 2002 19:03:34 +0200
Message-ID: <vaflm8q2jkp.fsf_at_lucy.cs.uni-dortmund.de>
Suppose I have a collection of ordered labeled trees (vulgo XML documents). Suppose I have a query which finds nodes based on structural conditions. Further suppose that I can't expect queries to express the exact structural conditions, so I would like to treat them in a vague way.
I could find some work computing the similarity between trees, based on a tree-edit distance.
This work has a couple of problems:
- The query is usually a very small tree, whereas the document is a big tree, so the similarity will be small.
- Some queries are not expressible as trees. Consider a query searching for a node labeled A which has either a B-labeled child or a C-labeled grandchild, or both. This disjunction cannot be expressed as a tree.
- It is not clear to me how does the tree-edit distance relate to the idea of relevance to a user. Suppose the system finds t1 and t2 for a query q, but similarity(t1,q) > similarity(t2,q). How do I know that the user will also prefer t1 over t2?
But I'm sure I'm not the first person to think of all this stuff. So, is there some literature?
kai
-- A large number of young women don't trust men with beards. (BFBS Radio)Received on Fri Jul 05 2002 - 19:03:34 CEST