Re: Three Kinds of Logical Trees

From: Marshall Spight <marshall.spight_at_gmail.com>
Date: 25 Jul 2005 22:06:50 -0700
Message-ID: <1122354410.541585.314990_at_g14g2000cwa.googlegroups.com>


dawn wrote:
> Marshall Spight wrote:
> > > > Okay, you're headed down a dead end road here. Watch out!
> > >
> > > I don't think so, but I'll bring my mace just in case.
> >
> > What, are you hawkgirl now? :-)
>
> We look alike and both carry mace, but otherwise no.

Ha ha! Hawkgirl is my second favorite member of the JLA.

> > You're conflating lexical and semantic issues. This is that dead
> > end I warned you about, and I fear you've driven into it at 60 mph.
>
> I'll step on the gas then.

I admire your spirit!

> > > A number is a string with some additional functions. So, once I realize
> > > this string is a number, I can apply numeric functions. But for any
> > > data of type String or Document, I can treat it as a String.
> >
> > It's certainly possible to build a system like this, but I wouldn't
> > want to use it. For one thing, it throws static typing out the window.
>
> It changes it, but definitely does not toss it out the window. It
> becomes more flexible.

I don't believe it's possible to have a static type system where you can update variables in such a way that they become of a more general type. Once you do that, the type of the variable has to change at runtime, and if that happens, you do not have a *static* type system by definition.

I can see how to make your idea work with a dynamically typed language, but not with a statically typed one.

> The representation of a number, like the representation of a word, are
> Strings and that is what we are working with in our software
> applications. An Integer is-a String.

I do not agree that what our software works on is simply the string representation of our data. (It is in TCL, and some other system, but not generally.)

> If my "1" is-a
> Integer, I can add 2 to it to get another representation -- "3"

You sure can, but only in a dynamically typed language with implicit coercions. These are certainly workable, but I don't consider them a good choice for data management.

> > No. Lexically, all my source code is a string; semantically, it
> > is many different types. A Java source file is one big string,
> > but there are int and classes and so forth in there that, when
> > you execute the program, are not strings at all. You could make
> > a system where they were strings at runtime as well, but such
> > a system would lose many valuable properties that Java has,
> > such as the ability to do static analysis, both by the human
> > and by the computer, for both correctness and performance reasons.
> > This price is too high for me; static analysis is one of the
> > most powerful tools available.
>
> You don't lose that as completely as you are suggesting.

So you say, but you don't explain how to get around the problem with my below "prepend the tilde" example.

> > So, how do two strings sort: lexicographically or numerically?
>
> if you are treating values as strings, then lex... and if they are both
> of the subtype number, then the sorting function there overrides the
> string sort.

And does this determination happen at runtime or at compile time? If the answer is "at runtime" then you've precluded static typing. Which is not necessarily disastrous, but it's not a choice I'd make.

> You cannot sort a set of Colors and Integers together unless you bump
> up in the type hierarchy until you are seeing them both as Strings or
> something with the same ordering.

But this should be automatic, right? I mean, that's the definition of subtyping: being able to use a more specific type in place of a more general supertype. In this case, since strings can be sorted, and you've stated that everything is a subtype of string, then everything can be sorted. So you necessarily can sort colors and ints together. The question is, what sort function is used?

Or are you doing away with substitution?

> > So I can have a variable
> > of type int, with an int value in it (which is also a string) and
> > invoke a method to prepend a ~ character to the string, right?
>
> Yes and that would not violate that this was a string.

But it *would* violate that this was an int, and so the *variable* would either have to change its type, or not have one in the first place (dynamic typing) or else your system will allow variables to contain values of a different type than the variable is declared as. Or you just don't allow update operators.

Or you just don't have variables. But then it's hard to manage updatable data.

On a related note, I really don't think that if you sat down and, without thinking about representation, wrote down all the operators that apply to string and all the opertors that apply to int, I don't think you'd see much overlap. You certainly don't in most popular languages. Now, it's certainly possible to treat strings as if they were integers via a partial mapping. It's also possible to have a (partial or total) mapping from integers into strings. This doesn't mean they are the same thing, though, any more than a mapping from the even numbers to the odd numbers means that even and odd numbers are the same. You can *represent* every even number with an odd number, you know.

> Since I want
> that source code all persisted with any databases it updates, each
> database would have all the data and functions it needs to address
> inconsistencies.

This seems like it would really increase the coupling, which I don't think you'd want to do. Wouldn't it be better to have the system such that the applications were *less* coupled to the dbms rather than *more* coupled?

> > I'm imagining you and some mathematician about a millenium ago, sitting
> > in an ivory tower. The guy comes to you and says, "I'm thinking about
> > this idea for a new number, which I call 'zee-row.' It represents the
> > absence of a number. You could use it as the result of some operations
> > that are currently considered illegal today, like subtracting X from
> > X."
> > And you'd say, "But that's not how the average person intuitively
> > perceives subtraction. Let's not pursue that approach; let's do
> > something more user-friendly."
>
> humorous, but not accurate nor to the point IMO. I wrote a paragraph
> on the flaws in this analogy, but it was boring even if true, so I'll
> spare you.

Well, I didn't particularly intend it to be humorous, although perhaps I am hilarious just out of habit. Ha ha, I'm funny!

My point was that I don't think it's a good idea to use "how people think about things today" as a hard design constraint, because it precludes any possiblility of coming up with *a better way to think about things*, which is where the *real* progress is made.

> > And once you type the integer in, you can just forget about it?
> > No; the human is also in charge of the integer as it moves around
> > the computer, across function calls, across the network, into the
> > database, etc. And to manage this process effectively, he needs
> > a strong suite of tools,
>
> Yes, she does.

[Let me just state for the record that my singular indefinite "he" is not gender specific. Rather it is a consequence of the lack of a gender inspecific pronoun in the English language, coupled with a wish to avoid the difficulties of speaking of indefinite people in the plural. Void where prohibited. Driver carries no change.]

> > chief among them a type system, static
> > analysis, and a way to structure and constrain data.
>
> I don't disagree, just have a more flexible way of doing that IMO.

I don't think you can do any static analysis with your approach. You can still do structure and constraints, though.

> In order for me to agree with it, I have to add to the start "Within
> the software, ...". The computer (behind the scenes software) might
> want to persist integers differently in memory or on disk than if they
> were strings of numeric characters. It can do that behind the scenes.

It cannot do these things without a static type system...

> Otherwise it simply needs to know what functions to apply and how to
> apply them for all subtypes of strings.

... But this part does not require static typing.

> Then outside of the computer, the s/w developer needs to know semantics
> in order to develop the software properly, defining and applying
> functions appropriate to the types of variables, for example.

This also does not require static typing. (Although I and others claim this is easier for the developer to do when static typing is available-- but that claim is merely anecdotal.)

> Basically, I'm taking the schema and constraints out of the dbms tool
> and putting it with all of the rest of the code, so it is handled just
> like other data models used in the code, such as the UI data model.
> The software applications should be able to execute a function on a
> model of some data that gets the output to a browser and another that
> gets the output to a database for storage on disk. It should be able to
> execute a function on a model of data that pulls in values from a
> browser or from a web service, xml document, or database.

Have you worked much with multi-application databases? Because this seems hard to do in that situation. The UI for a particular program is not shared among different applications; the schema and constraints are.

> > Okay, I thought I said it pretty well the first time, but I'll try
> > again: in thinking about trees, I'm trying *just* to think about
> > trees.
>
> Trees with nodes that could have random values, completely independent
> of anything else? Then how do you get SQL into this picture. You are
> right, I'm confused.

It's not that they are random, nor that they are independent of anything
else. It's simply that I'm trying to *isolate* those properties specific to trees.

Java does not have a way to declare that a reference type must not be null. (Other languages, including Nice, and SQL, do.) Let's say I was looking at tree handling in Java. I can handle dynamic trees nicely in Java. But HA! you point out: Java doesn't have a way to specify that a reference type within that tree must be non-null. So Java doesn't handle trees that well. I say, yes, that *is* a flaw with Java, but it's not a flaw with how Java handles *trees*, it's a flaw with how Java handles reference types. It's not a tree issue at all; it's a reference type issue. If you fix this issue, as has been done in Nice, it doesn't mean you can handle trees any better; it means you can handle reference types better, whether they occur in trees, or in lists, or in objects, or by themselves.

Likewise, SQL's limitations regarding multivalue attributes, or ordered data, are real, but you run into them all over the place; they don't have anything to do with trees *per se.* If you added a generic list type to SQL, it wouldn't make it any better or any worse at handling static heterogeneous

> You don't have to try to get me to understand, but I really am confused
> about the trees you are looking at and if you give my brain (I swear it
> used to be a whole lot better) another chance, I will try again. You
> are saying that all trees that have a certain form are easy for SQL.
> But without something on those nodes, and only a general shape for the
> tree, I'm just not getting it.

I hope the above helps. Again, it's not that the nodes don't have anything in them, it's just that, if I'm thinking specifically about trees, I don't want to consider *at the same time* issues that SQL has whether my data is tree-structured or not.

> > > Did I miss
> > > a memo that everyone else got that said not to worry about recursion
> > > eating memory?
> >
> > Well, you still have to worry if your language's implementation doesn't
> > have TCO.
>
> OK, so I won't do a major shift right now then.

If you're thinking about *language design*, you should probably incorporate this information right away. I believe that it is a better design choice for a language to include recursion along with a "marketing technique" for conveniently expressing iterative algorithms that is implemented in terms of recursion, for reeling in the C++/Java people. I base this on the fact that recursion is strictly more powerful than iteration. Any time I see two language features, and I have to have both of them, and one is a superset of the other, I figure it makes sense to put the larger one in, and define the smaller one in terms of the larger. Roughly speaking.

> > As for cool examples, check out quicksort in Haskell:
> > http://www.haskell.org/aboutHaskell.html
> >
> > Blew my mind the first time I saw it.
> Will do.

I can't recommend looking at lots of different languages enough. If all you've ever encountered is the Algol-family, like I had when I started this whole thing, you've encountered only a very narrow slice of what's possible.

> > > > The question in the large is: what are all the different logical
> > > > data structures, how might we query them, and how might we update
> > > > them and constrain them? I think mankind will be working on this
> > > > for some time.
> > >
> > > I think it is done. They named it XQuery.
> >
> > Uh, does it have natural join?
>
> I'm guessing you know the answer, eh?

I'm guessing it's "no." Having used join a lot and been wildly impressed
by its expressive power, I now consider it a must-have language feature.

I have to say, I don't think the process is done yet.

> > PS. Good grief we are both long-winded, eh?
>
> Yup, let's just hope no one else is attempting to follow this one.

On the one hand, one imagines that there are a thousand lurkers for every poster. On the other hand, this thread feels like just you + me + crickets. I think I hear them chirping now.

Marshall Received on Tue Jul 26 2005 - 07:06:50 CEST

Original text of this message