Re: Three Kinds of Logical Trees

From: dawn <dawnwolthuis_at_gmail.com>
Date: 24 Jul 2005 22:49:09 -0700
Message-ID: <1122270549.205126.313760_at_g47g2000cwa.googlegroups.com>


Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:
> > >
> > > That strikes me as a nonstardard definition of the use of metadata,
> > > but no matter.
> >
> > I used html tags, but these could have been metadata from any
> > structure, so a path could be <Person> then <lastName> then a value. I
> > haven't done a lot of work with XML compared to others, but unless I am
> > misunderstanding something, this would be a common way of shaping
> > metadata & data from an RDBMS into an xml dom tree.
>
> Again, this does not appear to be standard terminology to me.
> As I am used to understanding the term, "metadata" means higher-order
> data, not simply higher level data.

If lastName is an attribute (in RM terminology) of Person, then the names "lastName" and "Person" are both metadata for the Person relation, right?

> It would not properly be used
> to describe data in an enclosing scope.

I think this is just a representation issue. Since you are talking about representing data in a tree model, I refered to an xml representation of the metadata and data rather than a table representation.

> In an xhtml example, for some
> text inside <b> tags, metadata would not be anything about the
> enclosing <p>, but rather the metadata would be the xhtml dtd.
>
> Metadata is schema information, or type information.

Yes, indeed. Surely it includes the names of relations and header information such as column names, right?

> > > > I'm guessing yours are different?
> > >
> > > Well, that column seemed to be mostly about document management
> > > and not data management.
> >
> > but if you knew me, you would know that I don't talk about document
> > management, I talk about data management.
>
> I'm not sure I agree.

I guess I can only state it as my intent. I don't mind chatting about .mp3 data or .rm or .mov or .class data, but I'm particularly interested in data that comes from language modeled for software development purposes. This would be Words, or Sentence Fragments, or Strings, etc, and "Documents" might be too large a concept, but it at least gives the hint that there would be operators that can parse individual data values of this type and not only set operators.

>
> > But at the logical level, we
> > only care about a value such as 4 if we have some context, semantics.
> > We model propositions and retrieve the same. I used to say that I just
> > need "String" and "Binary" as my data types, with other types
> > inheriting from those. When I type in a 13 as a number it is a string
> > of two integers, the same as if I type it in as a string.
>
> Okay, you're headed down a dead end road here. Watch out!

I don't think so, but I'll bring my mace just in case.

> When you type, you are typing characters.

Precisely. And without loss of generalization we can consider my "Document" type to be that which can be typed in on a keyboard and read out loud in a language. So, under Type in my type hierarchy, I have Strings/Documents and Binary/MIME types. All other types are in the type tree below these.

> There is no way to type
> in 13 as a number, not even with ^M. You can only type characters.

Yes, yes -- you can only type
characters/strings/words/sentences/language = Documents.

> When you type '1' and then '3', you have typed two characters.
> You don't have a number yet.

I do have a number, but I am only ttreating as a string (supertype). A number is a string with some additional functions. So, once I realize this string is a number, I can apply numeric functions. But for any data of type String or Document, I can treat it as a String.

>
> > The computer
> > can do what it wants to store it in a smaller number of 1's & 0's, I
> > just want to enter strings and get them back.
>
> I would be surprised if all you really care about is strings.
> If all you have is strings, you can't add, for example.

13 is in the Integer subclass of String/Document, in my type hierarchy.

> The
> fact that some programming languages will implicitly convert
> a string to a number in some contexts and add the resulting
> numbers is simply a distraction; it does not mean that strings
> and numbers are the same thing.

The representation of a number, like the representation of a word, are Strings and that is what we are working with in our software applications. An Integer is-a String.

>
> > I used the term
> > "Document" because I am not interfacing with the devices via voice, so
> > it seems like I wan to enter data from a document and get documents
> > back. I would be OK with the term "Word" or "Sentence Fragment" or
> > even "String" or even "Page" as this type, but I thought Document fit
> > well with how a person thinks. Am I working with a computer related to
> > music files, video files, picture files, document files?
> >
> > > I'm really starting to see them as two
> > > entirely different disciplines,
> >
> > just when they are coming so much closer together.
>
> They aren't, though. It's just that some people with a document
> management background are claiming their tools do everything
> the data management tools do. But their claims are false.

I come from the data side of the house, but respect the fact that if you start marking up propositions in a consistent way, you can have a representation of structured data that moves the document in the direction of a database.

>
> > Maybe it is easier
> > for you to see a comment attribute as a component in a document than
> > another tag and related value, but at least you can think of data
> > processed reports as documents, right?
>
> Not so much. Mostly I think of reports as result sets.

If I ask you to e-mail me one of those result sets, nevermind, I know you can see it as a document if you want to.

> You might
> format them as an html table, pretty them up, and embed it in
> a page, though. But that's mere presentation.

Exactly. It is not the data, but it is how we can perceive the data -- one possible representation and the one that we use as humans both for entering data directly and for getting it back from the computer.

>
> > Can you then make the leap to
> > see the input forms as documents too? And what is the interface -- the
> > input to and the output from, right?
>
> The machine interface is, we type characters on the keyboard, and
> we see pixels on the screen. (We also specify x,y coordinates and
> have a few more buttons, non-character this time, on the mouse.)
>
> Honestly, I consider the html way of talking about the world,
> with input forms, hypertext, etc. as a decidedly inferior model
> of human-computer interaction than the one available 20 years
> ago.

Along with way-cool, hip languages and tools, I also work with database products that date back 40 years (I have a paper from 1965 I want to scan in). Every piece of data is a string, but you can also define the string as a number and apply numeric functions if you want to. I mention xml instead only because more people know that lingo. The model is very similar between the 40-year-old system (definitely about data) as it evolved and the document model from the xml folks. It is as if flared hip-hugger jeans were back in style. And if the document folks want to take the credit for this new hip idea, it's ok by me.

> I am fond of saying that Tim Berners-Lee got one important
> thing right and every other important thing wrong, and in so doing
> set software back 20 years. The two bloodiest casualties have
> been UI and data management.

Codd got some things right, Berners-Lee got some right, and I'm with those who are suggesting we put the chocolate and the peanut butter together.

>
> > > and I believe that most of what's
> > > going on in XML is about document management and not data
> > > management.
> >
> > I tell people who either a) fear xml or b) worship xml that it is
> > simply a minor advancement over comma-quote format. If you buy that,
> > then you should be able to see that xml is all about data.
>
> XML is all about strings.

now we are getting somewhere. Some of those strings are numbers, some are dates, right?

>

<snip>

> I would propose that the solution would support at least relations,
> lists, tuples, and enum types. In fact, that might be a complete set.
>
> Also: having strings as the only

not only!! It is the ANCESTOR of all non-binary, language data.

> datatype is inferior to having
> a variety of different types. The perl/tcl/html approach, where
> everything is a string and you have a bajillion kinds of implicit
> conversions might be fine if your goal is fast-and-loose,

I've never been called that! But maybe we can get bigger bang for the buck s/w development with what you termed fast-and-loose.

> but
> if you want any kind of discipline,

and I definitely do

> which you certainly do if
> you want to manage data, then it's out the door.

My type hierarchy has booleans, ints, etc, but these have an ancestor of String.

>
> > > I admit that text documents are an important data type, but
> > > it's *just one* datatype among millions.
> >
> > It is the type of data that I am entering when I stream in numbers from
> > a temperature device or type in my name and address again. If you want
> > to think of it as String instead of Document, feel free. The average
> > person might not feel as in touch with Strings as with Documents,
> > however.
>
> By limiting yourself to how Joe Sixpack thinks about these things
> today, you may gain some usability benefit in applications aimed
> at the lowest common denominator,

I'm aiming for a data model that works for humans and related software that does likewise.

> but you're not going to discover
> anything profound that way.

Profound is not my goal - I'm looking for ease and low cost in software development and maintenance.

> And, If Everyone Did It, (as Mom would
> say) the forward progress of mankind would halt.

A logical data model is about the interface between humans and the machine, even if those humans are s/w developers. It would be progress to have the computer meet the human closer to where the human lives.

> So I'm not much interested in how the average person might think
> about their data.

I'm not (currently) aiming for a non-professional to prepare a data model or write software here. But the average programmer is an average person. I want to let the computer do the work of shaping data the way a computer needs to see it and make the API between person and representation of a data model more like the human intuitively perceives it. This will (does, from my experience) improve the ability of the s/w developer to write and maintain big-bang-for-the-buck software.

> > > Limiting ourselves
> > > to looking at only a single datatype doesn't put us in a good
> > > position to think about datatypes in general.
> >
> > It is not a limit, but the next level down from "Type" or "Object" as
> > the type from which all types flow. So, Document is-a Type and Mime
> > is-a Type. Date is a Document, HTML is-a Document, Integer is-a
> > Document (replace Document with String and it will sound better to
> > you).
>
> Integer is most certainly not a document.

in the interface between me and the computer, I only pass integers as strings, in document formats, typically with some metadata about the integer visible somewhere. But if you prefer, Integer is-a String.

> Question: if you add two documents together, what is the result?
> Is addition of documents commutative?

Replace "docuemnt" with "string" and "add" with "concatenate". Addition is not a function of a String. It shows up down the type hierarchy for types that are numeric.

> > It is only when you look at the interface between the software
> > and the machine that you would want to think of a number as not being a
> > type of string.
>
> As a mathematician who has never touched a computer (well, they used
> to exist) if a number is a kind of Engish text.

For that mathematician, we could say that the number, such as 1, is something they can hear as a word.

>
> > If you look at the input to and the output from the
> > database, you will see it is all strings of data.
>
> You are confusing things again. I could just as well say it's
> all pixels, because we use graphic displays these days. Would
> you agree that the job of a dbms is to manage pixels?

Nope. That is the representation of my document, words, sentences, strings or language in the computer. I am not using the term document as a single representation.

> When you
> click the left mouse button, which character is that?

We definitely aren't hitting each other right on this one.

>
> > Mime types are also
> > strings, but they differ in that they do not build up to documents, but
> > to songs or videos or object code or ...
>
> I don't see how you can consider a video and a text ocument to be the
> same thing.

I don't, I don't!! That is why in the type hierarchy, I have two types -- binary which in my type hierarchy I'd like those to all be Mime types (otherwise I could call them all Binary) and then String language types, which I'm calling Documents.

>
> > I disagree with your statement that SQL handles a particular type of
> > tree well because there are trees that match your def of that type that
> > SQL does not handle well. They could be reshaped into trees that SQL
> > does handle, but I did not think that was your point.
> >
> > > I'll try rephrasing: If you have data that conceptually matches
> > > what I'm calling "static heterogeneous", (specifically, the data
> > > hierarchy is fixed, as in Customer/Account/Invoice/InvoiceLineItem)
> > > then you will be able to model, query, update, and constrain this
> > > data very well in SQL.
> >
> > I suspect that I disagree, but I might not be understanding what your
> > tree looks like.
> >
> > For example, if InvoiceLineItem has an attribute "discount" which has a
> > value that is a list, such as
>
> Ah, now I think I see what you're getting at.
>
> First of all, I completely agree that SQL handles lists poorly.
> Ordered data, when the order is not derived from some ordering
> function on the data but is implicit, is a weak point for SQL.
>
> However, I was attempting to isolate the question of tree structure;
> list data, whether embedded in a tree of whatever kind or just on
> its own, is a real issue, but one that I see as being orthogonal
> to the tree-structure taxonomy I'm working on.
>
> And I'm sure we can both agree that SQL doesn't do nested relations
> or nested anything, really. I am fully signed up to the value of
> nested structure, and to the value of lists. So you don't have
> to try to convince me on either point. :-)
>
> > a) Does this meet your criteria for what type of tree it is?
> > b) Does your tree look similar?
> > c) Do you agree that SQL doesn't query this tree very well as it gets
> > hung up on having two values for the discount attribute of the
> > InvoiceLineItem at lineNbr=3?
>
> Yes, yes, and yes.
>
> The part that SQL handles well, though, is queries across the node
> types (or levels), which is the part that is important (at least to
> me) in classifying these kinds of trees.

Unless I am misunderstanding, a counter-example to that statement would be a tree where instead of a list type, we have a name for a two-attribute value in our tree. So, in our relation, we have attribute A and attribute B and in our tree, we have names for A, B and also the name C for A and B together (a COBOL FD for a VSAM file just popped into my brain as clear as day, yikes). This tree with C having children of A and B doesn't look like a SQL-happy tree.

> SQL has a hard time with
> list data or multivalue data, whether it's in a tree or not, so
> again I consider it an orthogonal issue.
>
>
> > It might simply be that I'm not catching on to your terms here and if
> > so, I apologize. If I am understanding, then I can agree that you can
> > say that the trees that SQL handles well are static heterogeneous, but
> > you cannot say that SQL handles all such trees well.
>
> I think we're in agreement at this point.
>
>
> > > The fact that so many progammers favor iteration over recursion
> > > is something I consider odd, given that iteration is strictly
> > > less powerful than recursion.
> >
> > Yes, but if you can iterate instead of using recursion, then you don't
> > set yourself up for a stack overflow, for example.
>
> Tail call optimization can make most or all of this issue go away.
> (This raises a question for me, which is, is it the case that TCO
> can convert any iterative construct into a recursive one that uses
> only constant stack space? I think the answer might be yes, because
> I think I can see how to write a tail-recursive 'while', and I think
> all iterative constructs are just syntax on while.)

Over my head on that one -- TCO to me is only total-cost-of-ownership. Are you saying that if I write a recursive method in Java then the compiler has or might have a feature that mitigates this or that the run-time environment would have this feature? Have I unnecessarily dragged along a concern that I could have dropped long ago? Did I miss a memo that everyone else got that said not to worry about recursion eating memory?

>
> > With recursion, you
> > push the stack with each iteration and don't free up that memory until
> > you are done with the entire process. If an iterative algorithm is
> > possible, then I would only use recursion if that algorithm gives
> > improved performance or increased maintainability (at least that is all
> > I can think of now).
>
> If programmers put as much effort into recursion as they put into
> iteration, I assert it would provide increased maintainability.

I'll keep that in mind. If you have any "instead of doing this common iteration, try it as this recursion" examples, pass them along, even if OT.

>
> > > It is possible to transform any
> > > iterative algorithm into a recursive one; the reverse is not true.
> >
> > Yes indeed.
> >
> > > > How do you think relational operators are implemented? ;-)
> > >
> > > Alas, this is irrelevant. We build software in layers, and each
> > > layer is implemented in whatever paradigm it chooses, but this
> > > does not constrain the choice of paradigm of higher layers.
> >
> > OK. So, some functions can be defined using iteration, but you want to
> > be done with the iteration in the underlying layers and not in any
> > software you or I would write. As long as it remains an option, should
> > I choose to use it instead of playing any games to get around using it,
> > then I'm OK with that approach.
>
> Ha ha! I think I agree.
>
> > > > > So the area I'm exploring is, what kinds of operations do we want
> > > > > to do on the dynamic tree types, and what smallest bit of power
> > > > > to we have to add to the RM to handle that well?
> > > >
> > > > That is precisely the wrong question ;-) [...]
> > >
> > > I would be interested to hear what you consider to be the right
> > > question.
> >
> > How about this one --
> > How can we have an API to data that permits us to view it in the
> > variety of ways that would be helpful? The answer might include
> > operators for joining relations as well as for navigating from one row
> > of data in one relation to another relation, using a foreign key.
>
> Okay, sure. But trying to ask this question for all possible data
> structures is too big for me to go after all at once. So right now
> I'm focusing just on trees: trying to classify them, and trying
> to figure out what kinds of operations I might want to do with them.
>
> The question in the large is: what are all the different logical
> data structures, how might we query them, and how might we update
> them and constrain them? I think mankind will be working on this
> for some time.

I think it is done. They named it XQuery.

But seriously, while it isn't perfect by any stretch (I believe I called it "dog ugly" in this forum), XQuery (with the update capabilities too) does give me some hope. Unlike SQL, I think I could build the api I would want to have on top of it.

<snip>
> > > Maybe a description of some
> > > minimal set of operators and what you can do with them? I am quite
> > > pleased with the fact that the RM has its minimal set of operations
> > > and its correspondence with first order logic; can you describe
> > > anything comparable?
> >
> > There are some papers along this line that Jan has pointed me to before
> > (so they are on my stolen computer). Perhaps googling for
> > functional-data-model or xml-data-model (I know, I know) would yield
> > results. I'll check at some point, but let me know if you find
> > something.
>
> I would suggest that knowing what the graph operators are is something
> you ought to have a good grip on if you're going to be advocating for
> graphs.

I'm just a practitioner dabbling in theory (and, worse yet, maybe just a s/w dev manager dabbling in practice) but if I were told I had to enumerate the graph functions I use, I would look up the xpath functions on w3.org and start with those.

>
> > My point is that each metaphor is some partial version of the whole. I
> > just listened to a CD of a theologian talking about how Jesus used
> > parables and he said something like "a parable is a metaphor and a
> > metaphor is a lie". If you know the poem about the blind men and the
> > elephant, that makes a similar point to mine. We gain something by
> > working with propositions viewed as relations AND as "webs"
> > (do you prefer that to "graphs"?)
>
> I prefer "graphs."

of course I knew that ;-)

> > rather than restricting ourselves to one of these.
>
> But one can also *gain* from restrictions, as well. This point
> is often missed. It's why constraints are valuable, and it's
> why a minimal formalism is valuable.

I agree that such are valuable. I do have a big problem with the way we handle constraints, however, as I've mentioned in the past. The minimal formalism is good for theory and for a maintainable implementation under the covers, but not necessarily the best api.

cheers! --dawn Received on Mon Jul 25 2005 - 07:49:09 CEST

Original text of this message