Re: Three Kinds of Logical Trees

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 25 Jul 2005 08:01:11 -0700
Message-ID: <1122303671.301250.50530_at_g47g2000cwa.googlegroups.com>


dawn wrote:
> Marshall Spight wrote:
> > It would not properly be used
> > to describe data in an enclosing scope.
>
> I think this is just a representation issue. Since you are talking
> about representing data in a tree model, I refered to an xml
> representation of the metadata and data rather than a table
> representation.

It seems to me that what's happening here is that because xml spews attribute names all through its file format, you're considering this as normal. But that pretty much only happens with xml.

> > Metadata is schema information, or type information.
>
> Yes, indeed. Surely it includes the names of relations and header
> information such as column names, right?

Yes.

> > > But at the logical level, we
> > > only care about a value such as 4 if we have some context, semantics.
> > > We model propositions and retrieve the same. I used to say that I just
> > > need "String" and "Binary" as my data types, with other types
> > > inheriting from those. When I type in a 13 as a number it is a string
> > > of two integers, the same as if I type it in as a string.
> >
> > Okay, you're headed down a dead end road here. Watch out!
>
> I don't think so, but I'll bring my mace just in case.

What, are you hawkgirl now? :-)

> > When you type '1' and then '3', you have typed two characters.
> > You don't have a number yet.
>
> I do have a number, but I am only ttreating as a string (supertype).

You're conflating lexical and semantic issues. This is that dead end I warned you about, and I fear you've driven into it at 60 mph.

> A number is a string with some additional functions. So, once I realize
> this string is a number, I can apply numeric functions. But for any
> data of type String or Document, I can treat it as a String.

It's certainly possible to build a system like this, but I wouldn't want to use it. For one thing, it throws static typing out the window.

> > The
> > fact that some programming languages will implicitly convert
> > a string to a number in some contexts and add the resulting
> > numbers is simply a distraction; it does not mean that strings
> > and numbers are the same thing.
>
> The representation of a number, like the representation of a word, are
> Strings and that is what we are working with in our software
> applications. An Integer is-a String.

Again the pushing together of lexical and semantic issues. The software that I write, in statically typed languages, does not consider integer to be a subtype of string. Your source code isn't your program, any more than the word "water" is wet.

> I come from the data side of the house, but respect the fact that if
> you start marking up propositions in a consistent way, you can have a
> representation of structured data that moves the document in the
> direction of a database.

These all-string representations are intended for reading, not for processing. Human reading is an important application of code and data, but it's certainly not the only one.

> > I am fond of saying that Tim Berners-Lee got one important
> > thing right and every other important thing wrong, and in so doing
> > set software back 20 years. The two bloodiest casualties have
> > been UI and data management.
>
> Codd got some things right, Berners-Lee got some right, and I'm with
> those who are suggesting we put the chocolate and the peanut butter
> together.

Except Berners-Lee got so many things wrong, and incompatibly wrong with so many right things. In essence, he did one tiny valuable thing, which is put a GUI on a slightly updated FTP.

> > XML is all about strings.
>
> now we are getting somewhere. Some of those strings are numbers, some
> are dates, right?

No. Lexically, all my source code is a string; semantically, it is many different types. A Java source file is one big string, but there are int and classes and so forth in there that, when you execute the program, are not strings at all. You could make a system where they were strings at runtime as well, but such a system would lose many valuable properties that Java has, such as the ability to do static analysis, both by the human and by the computer, for both correctness and performance reasons. This price is too high for me; static analysis is one of the most powerful tools available.

> > datatype is inferior to having
> > a variety of different types. The perl/tcl/html approach, where
> > everything is a string and you have a bajillion kinds of implicit
> > conversions might be fine if your goal is fast-and-loose,
>
> I've never been called that! But maybe we can get bigger bang for the
> buck s/w development with what you termed fast-and-loose.

Sure; for prototyping and other small-scale applications. But not for data management applications in which the cost of corruption is high.

> > but if you want any kind of discipline,
>
> and I definitely do
>
> > which you certainly do if
> > you want to manage data, then it's out the door.
>
> My type hierarchy has booleans, ints, etc, but these have an ancestor
> of String.

So, how do two strings sort: lexicographically or numerically? I guess it depends on whether they are also numbers, right? When you sort a list
of strings, some of which are numbers, which compare function do you use? Or does it vary depending on which strings you're looking at?

Since int <: string, (<: means "is a subtype of") then I presume that int has all the string methods available? So I can have a variable of type int, with an int value in it (which is also a string) and invoke a method to prepend a ~ character to the string, right? Now my variable declared to be int does not contain an int value anymore.

> > By limiting yourself to how Joe Sixpack thinks about these things
> > today, you may gain some usability benefit in applications aimed
> > at the lowest common denominator,
>
> I'm aiming for a data model that works for humans and related software
> that does likewise.
>
> > but you're not going to discover
> > anything profound that way.
>
> Profound is not my goal - I'm looking for ease and low cost in software
> development and maintenance.

Yes, but you're trying to do it by dumbing things down. I don't think that's going to work. I think what's needed is to smarten things up. Not complicate them, mind you; make them smart and simple.

> > And, If Everyone Did It, (as Mom would
> > say) the forward progress of mankind would halt.
>
> A logical data model is about the interface between humans and the
> machine, even if those humans are s/w developers. It would be progress
> to have the computer meet the human closer to where the human lives.
> [...] I want to let the computer do the work of shaping data the way
> a computer needs to see it and make the API between person and
> representation of a data model more like the human intuitively
> perceives it.

I'm imagining you and some mathematician about a millenium ago, sitting in an ivory tower. The guy comes to you and says, "I'm thinking about this idea for a new number, which I call 'zee-row.' It represents the absence of a number. You could use it as the result of some operations that are currently considered illegal today, like subtracting X from X."
And you'd say, "But that's not how the average person intuitively perceives subtraction. Let's not pursue that approach; let's do
something more user-friendly."

> > Integer is most certainly not a document.
>
> in the interface between me and the computer, I only pass integers as
> strings, in document formats, typically with some metadata about the
> integer visible somewhere. But if you prefer, Integer is-a String.

And once you type the integer in, you can just forget about it? No; the human is also in charge of the integer as it moves around the computer, across function calls, across the network, into the database, etc. And to manage this process effectively, he needs a strong suite of tools, chief among them a type system, static analysis, and a way to structure and constrain data.

It's not the case that the only thing that matters to the human is the point of interface between the computer and the human.

> > > It is only when you look at the interface between the software
> > > and the machine that you would want to think of a number as not being a
> > > type of string.

Looking at this paragraph again, this is exactly what I disagree with.

> > > If you look at the input to and the output from the
> > > database, you will see it is all strings of data.
> >
> > You are confusing things again. I could just as well say it's
> > all pixels, because we use graphic displays these days. Would
> > you agree that the job of a dbms is to manage pixels?
>
> Nope. That is the representation of my document, words, sentences,
> strings or language in the computer. I am not using the term document
> as a single representation.

Correct. Pixels are a mere representation of strings. And in *exactly the same way* strings are a mere representation of your integers.

> > > For example, if InvoiceLineItem has an attribute "discount" which has a
> > > value that is a list, such as
> >
> > Ah, now I think I see what you're getting at.
> >
> > First of all, I completely agree that SQL handles lists poorly.
> > Ordered data, when the order is not derived from some ordering
> > function on the data but is implicit, is a weak point for SQL.
> >
> > However, I was attempting to isolate the question of tree structure;
> > list data, whether embedded in a tree of whatever kind or just on
> > its own, is a real issue, but one that I see as being orthogonal
> > to the tree-structure taxonomy I'm working on.
> >
> > And I'm sure we can both agree that SQL doesn't do nested relations
> > or nested anything, really. I am fully signed up to the value of
> > nested structure, and to the value of lists. So you don't have
> > to try to convince me on either point. :-)
> >
> > > a) Does this meet your criteria for what type of tree it is?
> > > b) Does your tree look similar?
> > > c) Do you agree that SQL doesn't query this tree very well as it gets
> > > hung up on having two values for the discount attribute of the
> > > InvoiceLineItem at lineNbr=3?
> >
> > Yes, yes, and yes.
> >
> > The part that SQL handles well, though, is queries across the node
> > types (or levels), which is the part that is important (at least to
> > me) in classifying these kinds of trees.
>
> Unless I am misunderstanding, a counter-example to that statement would
> be a tree where instead of a list type, we have a name for a
> two-attribute value in our tree. So, in our relation, we have
> attribute A and attribute B and in our tree, we have names for A, B and
> also the name C for A and B together (a COBOL FD for a VSAM file just
> popped into my brain as clear as day, yikes). This tree with C having
> children of A and B doesn't look like a SQL-happy tree.

Okay, I thought I said it pretty well the first time, but I'll try again: in thinking about trees, I'm trying *just* to think about trees. Anything that's also a problem when you have exactly one level will of course also be a problem when you have a multi-level tree. Solve the one-level case and you've solved the multi-level case, assuming you handle multi-level data. So I don't consider SQL's null problems to be a tree issue, even though those problems *also* show up when you're thinking about trees.

I also said:

> > SQL has a hard time with
> > list data or multivalue data, whether it's in a tree or not, so
> > again I consider it an orthogonal issue.

which puts in fairly well, I think.

> > > > The fact that so many progammers favor iteration over recursion
> > > > is something I consider odd, given that iteration is strictly
> > > > less powerful than recursion.
> > >
> > > Yes, but if you can iterate instead of using recursion, then you don't
> > > set yourself up for a stack overflow, for example.
> >
> > Tail call optimization can make most or all of this issue go away.
> > (This raises a question for me, which is, is it the case that TCO
> > can convert any iterative construct into a recursive one that uses
> > only constant stack space? I think the answer might be yes, because
> > I think I can see how to write a tail-recursive 'while', and I think
> > all iterative constructs are just syntax on while.)
>
> Over my head on that one -- TCO to me is only total-cost-of-ownership.

My fault; I went from the term ("tail call optimization") to the abbreviation ("TCO") too abruptly.

> Are you saying that if I write a recursive method in Java then the
> compiler has or might have a feature that mitigates this or that the
> run-time environment would have this feature?

The Java compiler probably won't, but it might. The Scheme compiler is required to. Since Java is chock-full of iterative constructs, it's not much of an issue; no one uses recursion much.

> Have I unnecessarily
> dragged along a concern that I could have dropped long ago?

Uh, yes.

> Did I miss
> a memo that everyone else got that said not to worry about recursion
> eating memory?

Well, you still have to worry if your language's implementation doesn't have TCO. But it's not a *fundamental* problem. You also have to worry if your recursive method isn't tail-recursive, but I'm proposing that a recursive translation of an iterative algorithm can be necessarily tail recursive. I'll still have to check on that.

> > If programmers put as much effort into recursion as they put into
> > iteration, I assert it would provide increased maintainability.
>
> I'll keep that in mind. If you have any "instead of doing this common
> iteration, try it as this recursion" examples, pass them along, even if
> OT.

Of course, my background is about 98% iteration. I work for a living which means I've had to use C++ or Java for most of the time. (Before that it was C and Fortran. :-)

As for cool examples, check out quicksort in Haskell: http://www.haskell.org/aboutHaskell.html

Blew my mind the first time I saw it.

> > The question in the large is: what are all the different logical
> > data structures, how might we query them, and how might we update
> > them and constrain them? I think mankind will be working on this
> > for some time.
>
> I think it is done. They named it XQuery.

Uh, does it have natural join?

> I'm just a practitioner dabbling in theory (and, worse yet, maybe just
> a s/w dev manager dabbling in practice) but if I were told I had to
> enumerate the graph functions I use, I would look up the xpath
> functions on w3.org and start with those.

I have no reason to do this. I need some tiny glimmer of a reason to suspect there's something interesting there before I look. Nothing so far.

> > But one can also *gain* from restrictions, as well. This point
> > is often missed. It's why constraints are valuable, and it's
> > why a minimal formalism is valuable.
>
> I agree that such are valuable. I do have a big problem with the way
> we handle constraints, however, as I've mentioned in the past. The
> minimal formalism is good for theory and for a maintainable
> implementation under the covers, but not necessarily the best api.

Sure sure; we've had that converastion to death. I believe we entirely agree on the analysis of the problem (for once,:-) and I think we mostly
agree on the characteristics of the solution.

Marshall

PS. Good grief we are both long-winded, eh? Received on Mon Jul 25 2005 - 17:01:11 CEST

Original text of this message