Re: Approximate structural conditions in trees

From: Jan.Hidders <hidders_at_hcoss.uia.ac.be>
Date: 10 Jul 2002 08:45:06 -0500
Message-ID: <3d2bffed$1_at_news.uia.ac.be>


In article <vafelejctqg.fsf_at_lucy.cs.uni-dortmund.de>,  <Kai.Grossjohann_at_CS.Uni-Dortmund.DE> wrote:
>
>I'm aware that there is some work on similarity between trees (one
>approach is the tree editing distance), but some things are not clear
>to me:
>
>(1) A query is not the same as a complete tree. For instance, a
> query is much smaller than a complete tree.

That's not a problem. You simply change the query from "look for documents similar to this pattern" to "look for subgraphs/trees in document that are similar to this pattern".

> Also, a query could contain other kinds of conditions. For
> example, what if the query says to search for A elements which
> have either a B child or a C child? It's clear that this does
> not map to a simple tree.

If the user wants to ask a complicated question then he or she should not use a fuzzy pattern matching query mechanism.

> So it is problematic to apply a similarity measure between trees
> if one tree could be very different from the other one, or maybe
> one tree is not a tree at all!

No problem, you then take "graph edit distance".

>(2) Is the tree edit distance a good measure of the users' idea of
> relevance? If one tree has a low distance and the other has a
> high distance, will the user really prefer the tree with the low
> distance?

I don't think "the user" exists. :-) That means that you shouldn't try to come up with the ultimate tool that does exactly what "the user" wants. In stead try to come up with different tools with a well-defined and predictable behavior such that different users can select the tools that suit them best.

>Is there work on this? What are the keywords to feed to Google for
>this?

"query semistructured data by example"

  • Jan hidders
Received on Wed Jul 10 2002 - 15:45:06 CEST

Original text of this message