Re: Three Kinds of Logical Trees

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 24 Jul 2005 20:06:10 -0700
Message-ID: <1122260770.457992.126230_at_z14g2000cwz.googlegroups.com>


dawn wrote:
> Marshall Spight wrote:
> >
> > That strikes me as a nonstardard definition of the use of metadata,
> > but no matter.
>
> I used html tags, but these could have been metadata from any
> structure, so a path could be <Person> then <lastName> then a value. I
> haven't done a lot of work with XML compared to others, but unless I am
> misunderstanding something, this would be a common way of shaping
> metadata & data from an RDBMS into an xml dom tree.

Again, this does not appear to be standard terminology to me. As I am used to understanding the term, "metadata" means higher-order data, not simply higher level data. It would not properly be used to describe data in an enclosing scope. In an xhtml example, for some text inside <b> tags, metadata would not be anything about the enclosing <p>, but rather the metadata would be the xhtml dtd.

Metadata is schema information, or type information.

> > > I'm guessing yours are different?
> >
> > Well, that column seemed to be mostly about document management
> > and not data management.
>
> but if you knew me, you would know that I don't talk about document
> management, I talk about data management.

I'm not sure I agree.

> But at the logical level, we
> only care about a value such as 4 if we have some context, semantics.
> We model propositions and retrieve the same. I used to say that I just
> need "String" and "Binary" as my data types, with other types
> inheriting from those. When I type in a 13 as a number it is a string
> of two integers, the same as if I type it in as a string.

Okay, you're headed down a dead end road here. Watch out!

When you type, you are typing characters. There is no way to type in 13 as a number, not even with ^M. You can only type characters. When you type '1' and then '3', you have typed two characters. You don't have a number yet.

> The computer
> can do what it wants to store it in a smaller number of 1's & 0's, I
> just want to enter strings and get them back.

I would be surprised if all you really care about is strings. If all you have is strings, you can't add, for example. The fact that some programming languages will implicitly convert a string to a number in some contexts and add the resulting numbers is simply a distraction; it does not mean that strings and numbers are the same thing.

> I used the term
> "Document" because I am not interfacing with the devices via voice, so
> it seems like I wan to enter data from a document and get documents
> back. I would be OK with the term "Word" or "Sentence Fragment" or
> even "String" or even "Page" as this type, but I thought Document fit
> well with how a person thinks. Am I working with a computer related to
> music files, video files, picture files, document files?
>
> > I'm really starting to see them as two
> > entirely different disciplines,
>
> just when they are coming so much closer together.

They aren't, though. It's just that some people with a document management background are claiming their tools do everything the data management tools do. But their claims are false.

> Maybe it is easier
> for you to see a comment attribute as a component in a document than
> another tag and related value, but at least you can think of data
> processed reports as documents, right?

Not so much. Mostly I think of reports as result sets. You might format them as an html table, pretty them up, and embed it in a page, though. But that's mere presentation.

> Can you then make the leap to
> see the input forms as documents too? And what is the interface -- the
> input to and the output from, right?

The machine interface is, we type characters on the keyboard, and we see pixels on the screen. (We also specify x,y coordinates and have a few more buttons, non-character this time, on the mouse.)

Honestly, I consider the html way of talking about the world, with input forms, hypertext, etc. as a decidedly inferior model of human-computer interaction than the one available 20 years ago. I am fond of saying that Tim Berners-Lee got one important thing right and every other important thing wrong, and in so doing set software back 20 years. The two bloodiest casualties have been UI and data management.

> > and I believe that most of what's
> > going on in XML is about document management and not data
> > management.
>
> I tell people who either a) fear xml or b) worship xml that it is
> simply a minor advancement over comma-quote format. If you buy that,
> then you should be able to see that xml is all about data.

XML is all about strings.

> So I
> suspect your concerns are with the "management" word. I will grant
> that managing data using xml and related documents such as dtd or xsd
> is not the same as managing data with an RDBMS. However, from a model
> perspective, managing data as trees (and using xlink as other graph
> structures) is an alternative to managing data as relations. Then some
> of the same data management features as a typical DBMS can be added to
> this model.

Having the tree as the only possible structure is worse than having the relation as the only possible structure, and I agree with you that the latter is too limiting. I would also propose that having the object as the only possible structure is a lose. Further, I think the random mishmash of stuff that appears in most programming languages is also not the way to go.

I would propose that the solution would support at least relations, lists, tuples, and enum types. In fact, that might be a complete set.

Also: having strings as the only datatype is inferior to having a variety of different types. The perl/tcl/html approach, where everything is a string and you have a bajillion kinds of implicit conversions might be fine if your goal is fast-and-loose, but if you want any kind of discipline, which you certainly do if you want to manage data, then it's out the door.

> > I admit that text documents are an important data type, but
> > it's *just one* datatype among millions.
>
> It is the type of data that I am entering when I stream in numbers from
> a temperature device or type in my name and address again. If you want
> to think of it as String instead of Document, feel free. The average
> person might not feel as in touch with Strings as with Documents,
> however.

By limiting yourself to how Joe Sixpack thinks about these things today, you may gain some usability benefit in applications aimed at the lowest common denominator, but you're not going to discover anything profound that way. And, If Everyone Did It, (as Mom would say) the forward progress of mankind would halt.

So I'm not much interested in how the average person might think about their data.

> > Limiting ourselves
> > to looking at only a single datatype doesn't put us in a good
> > position to think about datatypes in general.
>
> It is not a limit, but the next level down from "Type" or "Object" as
> the type from which all types flow. So, Document is-a Type and Mime
> is-a Type. Date is a Document, HTML is-a Document, Integer is-a
> Document (replace Document with String and it will sound better to
> you).

Integer is most certainly not a document.

Question: if you add two documents together, what is the result? Is addition of documents commutative?

> It is only when you look at the interface between the software
> and the machine that you would want to think of a number as not being a
> type of string.

As a mathematician who has never touched a computer (well, they used to exist) if a number is a kind of Engish text.

> If you look at the input to and the output from the
> database, you will see it is all strings of data.

You are confusing things again. I could just as well say it's all pixels, because we use graphic displays these days. Would you agree that the job of a dbms is to manage pixels? When you click the left mouse button, which character is that?

> Mime types are also
> strings, but they differ in that they do not build up to documents, but
> to songs or videos or object code or ...

I don't see how you can consider a video and a text ocument to be the same thing.

> I disagree with your statement that SQL handles a particular type of
> tree well because there are trees that match your def of that type that
> SQL does not handle well. They could be reshaped into trees that SQL
> does handle, but I did not think that was your point.
>
> > I'll try rephrasing: If you have data that conceptually matches
> > what I'm calling "static heterogeneous", (specifically, the data
> > hierarchy is fixed, as in Customer/Account/Invoice/InvoiceLineItem)
> > then you will be able to model, query, update, and constrain this
> > data very well in SQL.
>
> I suspect that I disagree, but I might not be understanding what your
> tree looks like.
>
> For example, if InvoiceLineItem has an attribute "discount" which has a
> value that is a list, such as

Ah, now I think I see what you're getting at.

First of all, I completely agree that SQL handles lists poorly. Ordered data, when the order is not derived from some ordering function on the data but is implicit, is a weak point for SQL.

However, I was attempting to isolate the question of tree structure; list data, whether embedded in a tree of whatever kind or just on its own, is a real issue, but one that I see as being orthogonal to the tree-structure taxonomy I'm working on.

And I'm sure we can both agree that SQL doesn't do nested relations or nested anything, really. I am fully signed up to the value of nested structure, and to the value of lists. So you don't have to try to convince me on either point. :-)

> a) Does this meet your criteria for what type of tree it is?
> b) Does your tree look similar?
> c) Do you agree that SQL doesn't query this tree very well as it gets
> hung up on having two values for the discount attribute of the
> InvoiceLineItem at lineNbr=3?

Yes, yes, and yes.

The part that SQL handles well, though, is queries across the node types (or levels), which is the part that is important (at least to me) in classifying these kinds of trees. SQL has a hard time with list data or multivalue data, whether it's in a tree or not, so again I consider it an orthogonal issue.

> It might simply be that I'm not catching on to your terms here and if
> so, I apologize. If I am understanding, then I can agree that you can
> say that the trees that SQL handles well are static heterogeneous, but
> you cannot say that SQL handles all such trees well.

I think we're in agreement at this point.

> > The fact that so many progammers favor iteration over recursion
> > is something I consider odd, given that iteration is strictly
> > less powerful than recursion.
>
> Yes, but if you can iterate instead of using recursion, then you don't
> set yourself up for a stack overflow, for example.

Tail call optimization can make most or all of this issue go away. (This raises a question for me, which is, is it the case that TCO can convert any iterative construct into a recursive one that uses only constant stack space? I think the answer might be yes, because I think I can see how to write a tail-recursive 'while', and I think all iterative constructs are just syntax on while.)

> With recursion, you
> push the stack with each iteration and don't free up that memory until
> you are done with the entire process. If an iterative algorithm is
> possible, then I would only use recursion if that algorithm gives
> improved performance or increased maintainability (at least that is all
> I can think of now).

If programmers put as much effort into recursion as they put into iteration, I assert it would provide increased maintainability.

> > It is possible to transform any
> > iterative algorithm into a recursive one; the reverse is not true.
>
> Yes indeed.
>
> > > How do you think relational operators are implemented? ;-)
> >
> > Alas, this is irrelevant. We build software in layers, and each
> > layer is implemented in whatever paradigm it chooses, but this
> > does not constrain the choice of paradigm of higher layers.
>
> OK. So, some functions can be defined using iteration, but you want to
> be done with the iteration in the underlying layers and not in any
> software you or I would write. As long as it remains an option, should
> I choose to use it instead of playing any games to get around using it,
> then I'm OK with that approach.

Ha ha! I think I agree.

> > > > So the area I'm exploring is, what kinds of operations do we want
> > > > to do on the dynamic tree types, and what smallest bit of power
> > > > to we have to add to the RM to handle that well?
> > >
> > > That is precisely the wrong question ;-) [...]
> >
> > I would be interested to hear what you consider to be the right
> > question.
>
> How about this one --
> How can we have an API to data that permits us to view it in the
> variety of ways that would be helpful? The answer might include
> operators for joining relations as well as for navigating from one row
> of data in one relation to another relation, using a foreign key.

Okay, sure. But trying to ask this question for all possible data structures is too big for me to go after all at once. So right now I'm focusing just on trees: trying to classify them, and trying to figure out what kinds of operations I might want to do with them.

The question in the large is: what are all the different logical data structures, how might we query them, and how might we update them and constrain them? I think mankind will be working on this for some time.

> > You regularly mention graphs. Do you have some demonstration of
> > why you consider them a good choice?
>
> I would love to have some emperical data, but I don't. [...]

Whoops! You answered a question that wasn't quite what I meant to ask. What I meant to ask was, can you show me a simple graph(s) and some simple operations on them. Do you have a framework for saying that some particular set of operations on these graphs is complete, or even just some framework for saying what the possible graph operators are?

I care about empirical data, too, but when I'm *here* I'm thinking about theory.

> > Maybe a description of some
> > minimal set of operators and what you can do with them? I am quite
> > pleased with the fact that the RM has its minimal set of operations
> > and its correspondence with first order logic; can you describe
> > anything comparable?
>
> There are some papers along this line that Jan has pointed me to before
> (so they are on my stolen computer). Perhaps googling for
> functional-data-model or xml-data-model (I know, I know) would yield
> results. I'll check at some point, but let me know if you find
> something.

I would suggest that knowing what the graph operators are is something you ought to have a good grip on if you're going to be advocating for graphs.

> My point is that each metaphor is some partial version of the whole. I
> just listened to a CD of a theologian talking about how Jesus used
> parables and he said something like "a parable is a metaphor and a
> metaphor is a lie". If you know the poem about the blind men and the
> elephant, that makes a similar point to mine. We gain something by
> working with propositions viewed as relations AND as "webs"
> (do you prefer that to "graphs"?)

I prefer "graphs."

> rather than restricting ourselves to one of these.

But one can also *gain* from restrictions, as well. This point is often missed. It's why constraints are valuable, and it's why a minimal formalism is valuable.

Marshall Received on Mon Jul 25 2005 - 05:06:10 CEST

Original text of this message