Re: Three Kinds of Logical Trees

From: dawn <dawnwolthuis_at_gmail.com>
Date: 24 Jul 2005 16:01:47 -0700
Message-ID: <1122246107.054344.299320_at_g47g2000cwa.googlegroups.com>


Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:

> > > (I am unclear why you use the term "metadata" here.
> >
> > If you look at a DOM tree for an XML document (or just zero in on an
> > XHTML document if easier), you see that you move from tag to tag in a
> > path until you get to a value that has no children. So, you step down
> > the tree from <html> to <body> to <div> to <p> to a value that is the
> > text in the paragraph. The leaf nodes have values and others have
> > metadata.
>
> That strikes me as a nonstardard definition of the use of metadata,
> but no matter.

I used html tags, but these could have been metadata from any structure, so a path could be <Person> then <lastName> then a value. I haven't done a lot of work with XML compared to others, but unless I am misunderstanding something, this would be a common way of shaping metadata & data from an RDBMS into an xml dom tree.

>
> > > I agree "scalar" is an unsatisfying term. I have in mind a better one,
> > > but it requires a type system that's a lot different from SQL's.
> > > (Which I assert is an "opportunity." :-)
> >
> > I'm planning to start, uh, blogging, this Fall and I decided to test
> > out various tools this past spring, so I have a first entry (which
> > likely looks like I abandoned the cause) at my not-exactly-perfect web
> > site (they say that women apologize more than men) about the types I
> > have at the top of my types tree (under the "type" type) in my ideal
> > LOGICAL system.
> >
> > http://tincatgroup.com/mewsings
>
> The pun in the URL is awful. I like it!

Yes it is and thanks.

>
> > I'm guessing yours are different?
>
> Well, that column seemed to be mostly about document management
> and not data management.

but if you knew me, you would know that I don't talk about document management, I talk about data management. But at the logical level, we only care about a value such as 4 if we have some context, semantics. We model propositions and retrieve the same. I used to say that I just need "String" and "Binary" as my data types, with other types inheriting from those. When I type in a 13 as a number it is a string of two integers, the same as if I type it in as a string. The computer can do what it wants to store it in a smaller number of 1's & 0's, I just want to enter strings and get them back. I used the term "Document" because I am not interfacing with the devices via voice, so it seems like I wan to enter data from a document and get documents back. I would be OK with the term "Word" or "Sentence Fragment" or even "String" or even "Page" as this type, but I thought Document fit well with how a person thinks. Am I working with a computer related to music files, video files, picture files, document files?

> I'm really starting to see them as two
> entirely different disciplines,

just when they are coming so much closer together. Maybe it is easier for you to see a comment attribute as a component in a document than another tag and related value, but at least you can think of data processed reports as documents, right? Can you then make the leap to see the input forms as documents too? And what is the interface -- the input to and the output from, right?

> and I believe that most of what's
> going on in XML is about document management and not data
> management.

I tell people who either a) fear xml or b) worship xml that it is simply a minor advancement over comma-quote format. If you buy that, then you should be able to see that xml is all about data. So I suspect your concerns are with the "management" word. I will grant that managing data using xml and related documents such as dtd or xsd is not the same as managing data with an RDBMS. However, from a model perspective, managing data as trees (and using xlink as other graph structures) is an alternative to managing data as relations. Then some of the same data management features as a typical DBMS can be added to this model.

> I admit that text documents are an important data type, but
> it's *just one* datatype among millions.

It is the type of data that I am entering when I stream in numbers from a temperature device or type in my name and address again. If you want to think of it as String instead of Document, feel free. The average person might not feel as in touch with Strings as with Documents, however.

> Limiting ourselves
> to looking at only a single datatype doesn't put us in a good
> position to think about datatypes in general.

It is not a limit, but the next level down from "Type" or "Object" as the type from which all types flow. So, Document is-a Type and Mime is-a Type. Date is a Document, HTML is-a Document, Integer is-a Document (replace Document with String and it will sound better to you). It is only when you look at the interface between the software and the machine that you would want to think of a number as not being a type of string. If you look at the input to and the output from the database, you will see it is all strings of data. Mime types are also strings, but they differ in that they do not build up to documents, but to songs or videos or object code or ...

So, all non-binary data that I use when working with a computer are in terms of strings and I could call all such forms for getting these strings in and out of the computer "documents".

>
> > > > Otherwise, if you have such a structure
> > > > and you have a value that is either multipart or multivalued within
> > > > this static structure, you still don't have SQL-92 support and most
> > > > SQL-DBMS's don't make it easy to have such structures, even if they are
> > > > supported in some way.
> > >
> > > I'm not saying that this is necessarily the best way to go, but you
> > > can certainly handle this case in a 1NF way.
> >
> > Sure, but that doesn't mean that if you have a tree that matches your
> > definition then it IS in a SQL-DBMS structure. If so, then, yes,
> > obviously SQL handles that type of tree well, but if you permit other
> > instances of such a type then, no, SQL doesn't handle all trees of this
> > type well, so this partitioning of types of trees did not isolate
> > information about SQL and trees. Did that make sense?
>
> Sorry, but it didn't. At least not enough to tell you whether I agree
> or not.

I disagree with your statement that SQL handles a particular type of tree well because there are trees that match your def of that type that SQL does not handle well. They could be reshaped into trees that SQL does handle, but I did not think that was your point.

> I'll try rephrasing: If you have data that conceptually matches
> what I'm calling "static heterogeneous", (specifically, the data
> hierarchy is fixed, as in Customer/Account/Invoice/InvoiceLineItem)
> then you will be able to model, query, update, and constrain this
> data very well in SQL.

I suspect that I disagree, but I might not be understanding what your tree looks like.

For example, if InvoiceLineItem has an attribute "discount" which has a value that is a list, such as

10.0
20.0

would your tree look the same as if this attribute were constrained to be a single value? If the two trees would look the same, then SQL can handle one and not the other (without first changing it to a different tree). So, I think you have to get into restricting data types before you can say that SQL will handle a tree in this shape.

I will grant that I might not be seeing the same tree that you are seeing. Mine would have a path like this:

namespace:datasource/schema identifier(s) such as a URI or host:port:datasourcename
|
Customer customerID=12345
|
Account accountNbr=12345-01
|
Invoice invoiceNbr=3828193
|
InvoiceLineItem lineNbr=3
/
InvoiceLineItem discount=10.0

      \
and a second child node of the InvoiceLineItem lineNbr=3 would be discount=20.0

  1. Does this meet your criteria for what type of tree it is?
  2. Does your tree look similar?
  3. Do you agree that SQL doesn't query this tree very well as it gets hung up on having two values for the discount attribute of the InvoiceLineItem at lineNbr=3?

> In constrast, if you have an org chart,
> in which the structure is not fixed, then you will have a harder
> time, particularly with querying and constraining, and maybe also
> updating.
>
>
> > If you are saying that you can put data structures that could be
> > implemented in a SQL-DBMS into such a structure, OK, but if you are
> > saying that SQL handles such tree structures well, then NO -- it only
> > handles the SQL-like flavors well.
>
> I don't really get it. *Any* data structure can be implemented in SQL.
> I'm trying to get at the question of what specific ones are a good
> match for sql, and I'm saying static heterogeneous is, and the others
> aren't.

It might simply be that I'm not catching on to your terms here and if so, I apologize. If I am understanding, then I can agree that you can say that the trees that SQL handles well are static heterogeneous, but you cannot say that SQL handles all such trees well.

>
> > > > Interesting topic, but I not quite sure about this partitioning as yet.
> > > > Are you trying to carve out how the relational model fits within a
> > > > tree model by giving a term to such trees?
> > >
> > > Yes!
> >
> > You have possibly made it to a category that is a superset. It might
> > be necessary to use such a structure (I'm not quite fully tapped into
> > it, so I'll hedge on that), but it is not sufficient unless you add a
> > translator to take a tree with multipart values or multivalue and
> > explode it (that really is formally the term we use in my neck of the
> > woods) into a tree that SQL does like.
> >
> > > SQL handles the static heterogeneous case with ease,
> >
> > am I correct that this is only tree for a subset of this type of tree
> > by your definitions?
>
> I'm going to assume you mean "true" when you first said "tree", because
> otherwise I can't make the sentence parse.

tree (I'm using it again as a synonym for "true" -- do you have a problem with that? I mean, oops, sorry).

> If so, then *no*, I'm
> saying it's true for all tree structures that match my definition
> of static homogeneous. (And, explicitly, *not* true for the other
> two kinds. They're hard to work with in SQL.)

Do you understand the tree that I am handing you that I think is a counter-example? Is it one?

>
> > > but really
> > > chokes on the other two cases. The big difference seems to be
> > > the fixed depth vs. variable depth issue.
> >
> > Yes, that is an issue, but only one.
>
> Okay. What are the others, as you see it?

My counter-example of a complex data type (a list in this case), would be one.

>
> > > It's impossible to
> > > handle variable depth without something more powerful than the
> > > basic RM; you need something at least as powerful as transitive
> > > closure; recursive queries or (ugh) iteration will also work.
> >
> > yes & yes and iteration is precisely how I drive from my house to my
> > parents' house, among other things.
>
> I do not agree. It is just as accurate to say that in going from
> point A to B to C to D, you invoke a pure function to transform your
> position A to B, which then invokes a function to transform your
> position from B to C, which then invokes a function to transform
> your position from C to D. If you want to prove to me that the
> universe is inherrently iterative, you're going to have to point
> out to me a loop counter somewhere in the real world. Under a rock,
> say.

I have no reason to push that one further. I'd prefer a function that beams me up to one that requires me to get 1 tank of gas and then another (2) tank of case and so on until I reach the destination.

> The fact that so many progammers favor iteration over recursion
> is something I consider odd, given that iteration is strictly
> less powerful than recursion.

Yes, but if you can iterate instead of using recursion, then you don't set yourself up for a stack overflow, for example. With recursion, you push the stack with each iteration and don't free up that memory until you are done with the entire process. If an iterative algorithm is possible, then I would only use recursion if that algorithm gives improved performance or increased maintainability (at least that is all I can think of now).

> It is possible to transform any
> iterative algorithm into a recursive one; the reverse is not true.

Yes indeed.

> > How do you think relational operators are implemented? ;-)
>
> Alas, this is irrelevant. We build software in layers, and each
> layer is implemented in whatever paradigm it chooses, but this
> does not constrain the choice of paradigm of higher layers.

OK. So, some functions can be defined using iteration, but you want to be done with the iteration in the underlying layers and not in any software you or I would write. As long as it remains an option, should I choose to use it instead of playing any games to get around using it, then I'm OK with that approach.

>
> > > So the area I'm exploring is, what kinds of operations do we want
> > > to do on the dynamic tree types, and what smallest bit of power
> > > to we have to add to the RM to handle that well?
> >
> > That is precisely the wrong question ;-) if you are talking about the
> > logical data model IMO. If a human is interacting with the data model,
> > then that person can apply either the metaphor of a graph (tree or
> > otherwise) or of sets and benefit from the use of both metaphors,
> > rather than being restricted by one.
>
> I would be interested to hear what you consider to be the right
> question.

How about this one --
How can we have an API to data that permits us to view it in the variety of ways that would be helpful? The answer might include operators for joining relations as well as for navigating from one row of data in one relation to another relation, using a foreign key.

> You regularly mention graphs. Do you have some demonstration of
> why you consider them a good choice?

I would love to have some emperical data, but I don't. I do have anecdotal data and first-hand experience of software development teams being much more productive and/or requiring many fewer developers when using a DBMS that employs a tree/graph model along with a set approach, instead of a SQL-DBMS. I have no data that proves that this is the case in general, nor that shows that the data model is key to the productivity gains. I have asked before in this forum what kind of an experiment could be set up that would demonstrate that. I cannot think of anything I could set up as an experiment for a small cost where the outcome could sway anyone to anything related to the choice of a data model.

I have only my intuition for that, which runs at about .72 (and no, I have no intuition for whether to play red or black in vegas, so don't think this implies that I'm right 72% of the time, just 72% of the time that I am running with an intuition).

> Maybe a description of some
> minimal set of operators and what you can do with them? I am quite
> pleased with the fact that the RM has its minimal set of operations
> and its correspondence with first order logic; can you describe
> anything comparable?

There are some papers along this line that Jan has pointed me to before (so they are on my stolen computer). Perhaps googling for functional-data-model or xml-data-model (I know, I know) would yield results. I'll check at some point, but let me know if you find something.

>
> > If we can model the idea of
> > "travel" by showing a picture of a person on a bicycle or by showing an
> > airplane, then how can we extend the picture of the bicycle minimally
> > so that it gets at the idea of being able to go over water? That
> > question is as relevant in my mind as the one you asked (OK, that is
> > almost the case -- I used a metaphor, so it is necessarily flawed and
> > also limited).
>
> Alas, I did not follow this metaphor at all. :-( Could you try to
> rephrase your point, perhaps without using a metaphor?

My point is that each metaphor is some partial version of the whole. I just listened to a CD of a theologian talking about how Jesus used parables and he said something like "a parable is a metaphor and a metaphor is a lie". If you know the poem about the blind men and the elephant, that makes a similar point to mine. We gain something by working with propositions viewed as relations AND as "webs" (do you prefer that to "graphs"?) rather than restricting ourselves to one of these.

>
> > > > If so and if I understand
> > > > your terms, then there are at least NF2 (non-first normal form) models
> > > > that would fit your "static" heterogeneous term.
> > >
> > > The issue for me is not the model so much, as it is what operators
> > > do we need to work on them.
> >
> > Give freedom to the functions -- they can be relational operators or
> > functions to move down a path or whatever is useful.
> >
> > > Modeling structure is relatively easy;
> > > the hard part is querying, updating, and constraining.
> >
> > Yup. I check in on XQuery on occasion just for the entertainment.
> > smiles. --dawn
>
> I hear XMLSchema is a laugh riot.

I'll check it out. smiles. --dawn Received on Mon Jul 25 2005 - 01:01:47 CEST

Original text of this message