Re: Three Kinds of Logical Trees

From: dawn <dawnwolthuis_at_gmail.com>
Date: 22 Jul 2005 07:41:14 -0700
Message-ID: <1122043274.726940.222980_at_z14g2000cwz.googlegroups.com>


Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:
> > > I've been thinking about trees in the abstract lately, and
> > > trying to classify them. I am not talking about trees as
> > > a physical data structure, such as BTrees or Red-Black, but
> > > rather trees as logical data structures. In other words,
> > > the *interface* to a tree.
> > >
> > > I've identified three distinct kinds.
> > > "homogeneous tree"
> > > All nodes are the same type, tree has varying structure
> > >
> > > "dynamic heterogenous"
> > > Nodes are varying types, tree has varying structure
> > >
> > > "static heterogeneous"
> > > Nodes are varying type, tree has fixed structure.
> > >
> > > Examples
> > >
> > > homogeneous: org chart. Every node is a person record, but the
> > > structure of the organization may be of whatever form.
> >
> > Or a simpler example of a tree with an interger on each node.
>
> Yes. But I'm not sure how realistic an example that is. Note
> that I'm trying to classify *logical* trees, so the structure
> of the tree needs to have some meaning apart from just being
> a data structure. Usually when we talk about a tree of ints,
> we've got some data structure going because we want logn lookup.
> That I would call a set, even if the underlying physical structure
> is a tree.

hmmm. I'll process, but I figured that putting a person record on each node when trying to classify logical trees was muddying the example. I don't think I'm fully tapped into what you are thinking yet, but this gives me more clues.

>
> > > dynamic heterogeneous: parse tree. There are specific kinds of
> > > nodes, but the structure of the tree is relatively unconstrained.
> >
> > There are people here much more knowledgeable about trees than I (but
> > that has not stopped me before, eh?) but even with the lose definition,
> > I wouldn't say "relatively unconstrained" but perhaps "either
> > unconstrained or constrained by a grammar." The grammar constraint
> > might not be fully encoded into the tree, due to the complexity of the
> > grammar.
>
> Yes, exactly. In other words, one has, say, 20 different kinds of
> nodes; each node type may have a fixed or variable number of children;
> each specific child may be constrained to be of a particular type:
> a parse tree. I'm looking for a term that captures the fact that
> a node has a specific structure. Each node has fixed *local* structure,
> but the tree's *overall* structure is not fixed. (Hence "dynamic
> heterogeneous.")

OK, so the tree structure is not the only variable here -- you are also looking at the structure within a node. Got it.

>
> > > static heterogeneous: customer/account/invoice/line item. Each
> > > level in the tree is fixed, with fixed relationships.
> > >
> > > Here "dynamic" and "static" refer to the structure of the tree.
> > >
> > > SQL handles the third kind of tree extremely well.
> >
> > I'm not so sure. If your tree extends from metadata down to data
> > values, you might have to add in that less-than-satisfying term "scalar
> > value" for your leaf nodes.
>
> (I am unclear why you use the term "metadata" here.

If you look at a DOM tree for an XML document (or just zero in on an XHTML document if easier), you see that you move from tag to tag in a path until you get to a value that has no children. So, you step down the tree from <html> to <body> to <div> to <p> to a value that is the text in the paragraph. The leaf nodes have values and others have metadata.

> To me that means
> the system catalog; one doesn't usually join from the catalog to
> user-define tables, yes?)

With the topic of trees, I flipped out of "relational". So, I would say that it is the case when I work with a tree or graph data structure (which for me are typically old data structures rather than new) I might step from metadata, such as the name space (by whatever name -- this might be a data source or schema), down the tree to a file (for example), to a record based on a key value, to a to a field, to a data value. If that data value is the value of a key in another file, then I might step to a record in another file, then to a field, and then finally to the target data value. Everything above the leaf value is really just data about the data I'm after -- it is metadata, although only the names might be considered such -- the name of the name space, of the two files and of the two fields.

But the point is that in my mind the data trees do have metadata and paths through the tree lead to data values.

> I agree "scalar" is an unsatisfying term. I have in mind a better one,
> but it requires a type system that's a lot different from SQL's.
> (Which I assert is an "opportunity." :-)

I'm planning to start, uh, blogging, this Fall and I decided to test out various tools this past spring, so I have a first entry (which likely looks like I abandoned the cause) at my not-exactly-perfect web site (they say that women apologize more than men) about the types I have at the top of my types tree (under the "type" type) in my ideal LOGICAL system.

http://tincatgroup.com/mewsings

I'm guessing yours are different?

>
> > Otherwise, if you have such a structure
> > and you have a value that is either multipart or multivalued within
> > this static structure, you still don't have SQL-92 support and most
> > SQL-DBMS's don't make it easy to have such structures, even if they are
> > supported in some way.
>
> I'm not saying that this is necessarily the best way to go, but you
> can certainly handle this case in a 1NF way.

Sure, but that doesn't mean that if you have a tree that matches your definition then it IS in a SQL-DBMS structure. If so, then, yes, obviously SQL handles that type of tree well, but if you permit other instances of such a type then, no, SQL doesn't handle all trees of this type well, so this partitioning of types of trees did not isolate information about SQL and trees. Did that make sense?

> I think this kind of structure is what one often ends up with in SQL.
> And I do think it works really well. Consider the customer/account/
> invoice/invoice line item sort of case. The structure is rigid, but
> any of the queries you want to ask against this sort of structure
> are quite easy.

If you are saying that you can put data structures that could be implemented in a SQL-DBMS into such a structure, OK, but if you are saying that SQL handles such tree structures well, then NO -- it only handles the SQL-like flavors well.

>
> > Interesting topic, but I not quite sure about this partitioning as yet.
> > Are you trying to carve out how the relational model fits within a
> > tree model by giving a term to such trees?
>
> Yes!

You have possibly made it to a category that is a superset. It might be necessary to use such a structure (I'm not quite fully tapped into it, so I'll hedge on that), but it is not sufficient unless you add a translator to take a tree with multipart values or multivalue and explode it (that really is formally the term we use in my neck of the woods) into a tree that SQL does like.

> SQL handles the static heterogeneous case with ease,

am I correct that this is only tree for a subset of this type of tree by your definitions?

> but really
> chokes on the other two cases. The big difference seems to be
> the fixed depth vs. variable depth issue.

Yes, that is an issue, but only one.

> It's impossible to
> handle variable depth without something more powerful than the
> basic RM; you need something at least as powerful as transitive
> closure; recursive queries or (ugh) iteration will also work.

yes & yes and iteration is precisely how I drive from my house to my parents' house, among other things. How do you think relational operators are implemented? ;-)

> So the area I'm exploring is, what kinds of operations do we want
> to do on the dynamic tree types, and what smallest bit of power
> to we have to add to the RM to handle that well?

That is precisely the wrong question ;-) if you are talking about the logical data model IMO. If a human is interacting with the data model, then that person can apply either the metaphor of a graph (tree or otherwise) or of sets and benefit from the use of both metaphors, rather than being restricted by one. If we can model the idea of "travel" by showing a picture of a person on a bicycle or by showing an airplane, then how can we extend the picture of the bicycle minimally so that it gets at the idea of being able to go over water? That question is as relevant in my mind as the one you asked (OK, that is almost the case -- I used a metaphor, so it is necessarily flawed and also limited).

>
> > If so and if I understand
> > your terms, then there are at least NF2 (non-first normal form) models
> > that would fit your "static" heterogeneous term.
>
> The issue for me is not the model so much, as it is what operators
> do we need to work on them.

Give freedom to the functions -- they can be relational operators or functions to move down a path or whatever is useful.

> Modeling structure is relatively easy;
> the hard part is querying, updating, and constraining.

Yup. I check in on XQuery on occasion just for the entertainment. smiles. --dawn

>
> Marshall
Received on Fri Jul 22 2005 - 16:41:14 CEST

Original text of this message