Re: Hierarchical query

From: Jan Hidders <hidders_at_gmail.com>
Date: Wed, 13 Jun 2007 10:19:21 -0700
Message-ID: <1181755161.954560.222940_at_i38g2000prf.googlegroups.com>


On 13 jun, 18:11, Vadim Tropashko <vadimtro_inva..._at_yahoo.com> wrote:
> Reposting with more clarification (as Jan asked).
>
> Suppose I have a BNFgrammar and a source text parsed into a tree. How
> would I query an
> identifier declaration?
>
> All the XQuery tutorials (where I went to gather some ideas) start
> with simpleminded examples like browsing all the descendants of /
> bookstore/book (and the bookstore XML design is such wrong idea to
> boot).
>
> Perhaps some example is needed. A simplified grammar:
>
> statement_block:
> 'declare'
> declaration_item_list
> 'begin'
> statements
> 'end'
> ;
>
> statements:
> (statement_block | assignment) statements |
> ;
>
> assignment:
> identifier ':=' (identifier | number) ';'
> ;
>
> declaration_item_list:
> identifier 'integer' ';'
> ;
>
> Suppose we parse the following text
>
> declare -- token #1
> i integer; -- tokens 2,3,4
> begin
> i := 1; -- tokens 6,7,8,9
> end;
>
> So that we get the derivation tree like this:
>
> 1 statement_block
> 1.1 'declare' (matches token #1)
> 1.2 declaration_item_list
> 1.2.1 identifier (matches token #2)
> 1.2.2 'integer' (matches token #3)
> 1.2.3 ';' (matches token #4)
> 1.3 'begin' (matches token #5)
> 1.4 statements
> 1.4.1 assignment
> 1.4.1.1 identifier
> 1.4.1.2 ':='
> 1.4.1.3 number
> 1.4.1.4 ';'
> 1.4.2 statements (matches empty token)
> 1.5 'end';
>
> Now, given a derivation tree and the leaf node identifier 1.4.1.1
> corresponding to the token #6 which is variable i, how do we find the
> node 1.2.1 that declares it?

Ok. I'm going to assume the following DTD (in a notation of my own making to make it a bit more readable) for the syntax tree. (It uses no attributes to keep things simple):

<stat_bl> --> <decl_kw> <decl_item_list> <begin_kw> <statements>
<semicol_kw> <end-kw>
<decl_kw> --> EMPTY
<begin_kw> --> EMPTY
<end_kw> --> EMPTY
<semicol_kw> --> EMPTY
<statements> --> ( <stat_bl> | <assignment> )+
<assignment> --> <var> <assign_kw> ( <var> | <number> ) <semicol_kw>
<var> --> PCDATA
<assign_kw> --> EMPTY
<number> --> PCDATA
<decl_item_list> --> ( <var> <type> <semicol_kw> )+
<type> --> PCDATA

For starters I'll first do the reverse query, so I will assume there is a variable $dvar that contains a <var> element that describes a variable in a declaration. The XPath expression that walks to all the <var> nodes in an assignment that are in the scope of $dvar is as follows:

(1)  $dvar/(
(2)    ../../statements//assignment/var[string() = $dvar/string()]
(3)        minus
(4)    ../../statements//stat_bl[decl_item_list/var/string() = $dvar/
string()]/statements//var
(5) )

The idea is quite simple: the path expression in line (2) walks to all variables that are nested within the statement block of the declaration $dvar, and the path expression in line (4) subtracts those for which there is another intermediate declaration that supersedes the one of $dvar.

Now the query that you asked for, which in some sense simply the reverse. We assume now that $uvar contains the <var> element that is used in an assignment and the path expression walks to the <var> elements in a declaration that binds the <var> element in $uvar:

(1) $uvar/(
(2) ancestor::stat_bl/decl_item_list/var[string() = $uvar/ string()]

(3)      minus
(4)     ancestor::stat_bl[decl_item_list/var/string() = $uvar/
string()]/ancestor::stat_bl/decl_item/var (5) )

Of course there are many many different ways of expressing these queries, but these are the most compact ones I could think of. Note that I actually didn't have to take into account the order of the declarations, just how they are nested. But this is because of the way you defined the syntax and scoping rules. However, if you would extend the syntax and the scoping rules such that order also maters then extending the queries to take that into account should not be a big problem.

  • Jan Hidders
Received on Wed Jun 13 2007 - 19:19:21 CEST

Original text of this message