Re: Hierarchical query

From: Marshall <marshall.spight_at_gmail.com>
Date: Wed, 13 Jun 2007 23:47:15 -0000
Message-ID: <1181778435.230571.100170_at_z28g2000prd.googlegroups.com>


On Jun 13, 2:25 pm, Jan Hidders <hidd..._at_gmail.com> wrote:
> On 13 jun, 21:10, Vadim Tropashko <vadimtro_inva..._at_yahoo.com> wrote:
> > On Jun 13, 9:19 am, Jan Hidders <hidd..._at_gmail.com> wrote:
>
> > > Ok. I'm going to assume the following DTD (in a notation of my own
> > > making to make it a bit more readable) for the syntax tree. (It uses
> > > no attributes to keep things simple):
>
> > > <stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
> > > <semicol_kw> <end-kw>
> > > <decl_kw> --> EMPTY
> > > <begin_kw> --> EMPTY
> > > <end_kw> --> EMPTY
> > > <semicol_kw> --> EMPTY
> > > <statements> --> ( <stat_bl> | <assignment> )+
> > > <assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
> > > <var> --> PCDATA
> > > <assign_kw> --> EMPTY
> > > <number> --> PCDATA
> > > <decl_item_list> --> ( <var> <type> <semicol_kw> )+
> > > <type> --> PCDATA
>
> > All right, the DTD is a bastardized [context free?] grammar describing
> > a language that XML document is an element of. Although the adjectives
> > "cumbersome" and "ugly" still apply, I grudgingly admit that the idea
> > that a grammar fits into DTD effortlessly is quite powerful (so I'm
> > removing the "XML sucks" image from my homepage:-)
>
> Of course, it combines ugliness with great power. Welcome to the
> Dark Side. :-)

This raises some questions for me. Does the above mean we should consider comparing XML and its associated ... whadayacallits? XPath, XSL, etc.?, against more traditional parsing techniques? *** What, if anything, would that leave out? ***

It seems clear enough, the idea of regarding an XML document as a syntax tree, and the DTD as a grammar. In which case, *XML*parsing is then the analog of ... what? Converting an ascii version of a syntax tree into a binary syntax tree? Sort of like parsing s-expressions?

Then XML querying and updating is comparable to operations we would do once we had a syntax tree, which seems to be multipass traversal and transformation. Think of an interpreter, which traverses the tree, carrying out instructions as it does so. The more complex case is a compiler with multiple analysis, optimizer, and code generator passes. These involve generating additional data structures which may or may not be trees as well. A symbol table in a lexically scoped language likely would be; an object file likely would not be.

If that analysis is valid, then I'd be interested to continue to a comparison with other techniques for accomplishing the same thing, particularly from the standpoint of expressive power, but also from the standpoint of convenience and generality.

Other techniques I can think of are:

  1. ad hoc traversal of an object graph in an OO language
  2. pattern matching over abstract data types, as introduced by SML. (The gold standard.)
  3. "Tree walking" a la Terrance Parr / Antlr (the upstart.)
  4. is what I am most familiar with. It isn't particularly pleasant, but I suppose it's no more objectionable than anything else in an OOPL. One ends up using the Visitor pattern a lot, and one necessarily finds oneself having to defeat the type system, since OOPLs are organized around extensible structures w/ fixed operations, whereas trees work best with fixed structures w/ extensible operations.
  5. I would be very interested to hear a comparison with pattern matching from someone who is familiar with both techniques. One thing I can see immediately is that XPath expressions allow one to address only some specific part of a tree that one is interested in, whereas pattern matching requires whole-tree analysis. How much of an issue is this in practice?
  6. is interesting. I admit that when I first heard about the idea I was extremely skeptical. The idea that one would write a grammar to abstract operating on a tree one had generated by using a grammar to abstract over a language seemed perverse and unnecessarily complex at first. However in practice it's turned out to be the opposite: it is extremely simple. Tree grammars are a fraction of the size of the original grammar, possible similarly to the size of an XPath expression to a DTD.

I suppose there are also term rewriting languages but I'm not as familiar with them. I am also vaguely aware of "strategies" which are abstractions of data structure traversal techniques. I suppose further there are the various tree encoding techniques of SQL, but I haven't found these to be very exciting. Perhaps a more complete relational language ...

> Btw., if you forget about sequence order, document order and value
> comparisons for a moment, the core of XPath is actually a relatively
> neat calculus of binary relations:
> - p_1/p_2 is the concatenation of binary relations
> - p_1[p_2] is the selection of pairs in p_2 whose right-hand side
> matches a left-hand side in p_2 (semijoin anyone?)
> - p_1 union p_2 is the set union of two binary relations
> - p_1 intersect p_2 is the set intersection of two binary relations
> - p_1 except p_2 is the set difference between binary relations
>
> Did I mention Tarski already? :-) Of course we had to wait until
> XPath2.0 until the three set operations were all allowed everywhere,
> but we have them now.

I guess we're talking about Relation Algebra here? I don't know enough about it to know if this is just an interesting comparison of if there's some specific result being alluded to.

My hope is that this post will stimulate an educated and informative discussion of the relative merits of different approaches to tree traversal, not neglecting to differentiate among the needs of diverse use cases.

Hey, it could happen.

Marshall Received on Thu Jun 14 2007 - 01:47:15 CEST

Original text of this message