Re: Three Kinds of Logical Trees

From: dawn <dawnwolthuis_at_gmail.com>
Date: 25 Jul 2005 13:22:15 -0700
Message-ID: <1122322935.892964.310930_at_g44g2000cwa.googlegroups.com>


Marshall Spight wrote:
> dawn wrote:
> > Marshall Spight wrote:

<snip>

> > > > But at the logical level, we
> > > > only care about a value such as 4 if we have some context, semantics.
> > > > We model propositions and retrieve the same. I used to say that I just
> > > > need "String" and "Binary" as my data types, with other types
> > > > inheriting from those. When I type in a 13 as a number it is a string
> > > > of two integers, the same as if I type it in as a string.
> > >
> > > Okay, you're headed down a dead end road here. Watch out!
> >
> > I don't think so, but I'll bring my mace just in case.
>
> What, are you hawkgirl now? :-)

We look alike and both carry mace, but otherwise no.

> > > When you type '1' and then '3', you have typed two characters.
> > > You don't have a number yet.
> >
> > I do have a number, but I am only treating it as a string (supertype).
>
> You're conflating lexical and semantic issues. This is that dead
> end I warned you about, and I fear you've driven into it at 60 mph.

I'll step on the gas then.
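A few lines of Python make the lexical/semantic distinction being argued here concrete. Python is an editor's choice for illustration and nothing in the thread depends on it. The two keystrokes '1' and '3' produce a two-character string, and a single integer value exists only after parsing:

```python
typed = "1" + "3"      # two keystrokes: a string of two characters
print(len(typed))      # prints 2: two characters, not one number

n = int(typed)         # parsing turns the lexeme into an integer value
print(n + 1)           # prints 14: arithmetic is defined on the value

print(typed + "1")     # prints 131: '+' on strings concatenates instead
```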

Let's take petQty=1 and hairColor=brown
The "1" is no more a number than "brown" is a color. They are morphemes. They are character/string/keyboard representations related to oneness and brownness.

So, my Type hierarchy is different from others in that I recognize up top that what I'm working with are representations -- that's the type of stuff I've got for the computer to work with.

Next level, adding in semantics for more precision, I can further refine the types of 1 and brown by designating the 1 as a string that represents an Integer and the brown as a string that represents a Color type, both of which can be descendants of the String/Document/Words/Sentences/Language type. They are not strings that represent songs or videos in the computer, after all; they are the content of documents. Adding in the semantics does not take the string "brown" and turn it into brown; it simply recognizes that, beyond the fact that I have a string, it is a string that represents a Color. So not only can I extract a character from it, I can also find shoes to match. Then if I further identify that this is not just any color but a HairColor, I can apply more functions, such as determining what would need to be added to the hair color to turn it strawberry blond.
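One way to sketch the hierarchy described above is to subclass Python's `str`, so that every refined type keeps all the string functions and adds its own. The class names (`IntegerRep`, `ColorRep`, `HairColorRep`) and their methods are hypothetical, and `matches`/`is_blond` are toy stand-ins for "find shoes to match" and the hair-color functions:

```python
class IntegerRep(str):
    """Hypothetical: a string that represents an Integer."""
    def value(self) -> int:
        return int(self)

class ColorRep(str):
    """Hypothetical: a string that represents a Color."""
    def matches(self, other: "ColorRep") -> bool:
        # Toy stand-in for 'find shoes to match': case-insensitive equality.
        return self.lower() == other.lower()

class HairColorRep(ColorRep):
    """Hypothetical: a Color representation refined further to HairColor."""
    def is_blond(self) -> bool:
        # Toy stand-in for a HairColor-specific function.
        return "blond" in self.lower()

pet_qty = IntegerRep("1")
hair_color = HairColorRep("brown")

print(hair_color[0])                          # String function still applies: b
print(hair_color.matches(ColorRep("Brown")))  # Color function: True
print(hair_color.is_blond())                  # HairColor function: False
print(pet_qty.value() + 1)                    # Integer function: 2
```

Note that the objects stay strings throughout; the subclasses only add functions, which is the point being made about representations.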

I'm guessing you think I'm switching levels here between the character string "1" and what it represents, but I am always talking about the character string and functions on that string. I am using semantics for the design and interpretation of the software but the computer never has to comprehend the meaning.

>
> > A number is a string with some additional functions. So, once I realize
> > this string is a number, I can apply numeric functions. But for any
> > data of type String or Document, I can treat it as a String.
>
> It's certainly possible to build a system like this, but I wouldn't
> want to use it. For one thing, it throws static typing out the window.

It changes it, but definitely does not toss it out the window. It becomes more flexible.

>
> > > The
> > > fact that some programming languages will implicitly convert
> > > a string to a number in some contexts and add the resulting
> > > numbers is simply a distraction; it does not mean that strings
> > > and numbers are the same thing.
> >
> > The representation of a number, like the representation of a word, are
> > Strings and that is what we are working with in our software
> > applications. An Integer is-a String.
>
> Again the pushing together of lexical and semantic issues.
> The software that I write, in statically typed languages,
> does not consider integer to be a subtype of string. Your
> source code isn't your program, any more than the word "water"
> is wet.

I think I am more consistent in not pretending that the word "brown" really is a Color nor that "1" really is a number.

>
> > I come from the data side of the house, but respect the fact that if
> > you start marking up propositions in a consistent way, you can have a
> > representation of structured data that moves the document in the
> > direction of a database.
>
> These all-string representations are intended for reading, not for
> processing. Human reading is an important application of code and
> data, but it's certainly not the only one.

I'm thinking of the human-computer API for the data model as the software API for the data as well. Software works with data models all the time, and I want that to be easier and more consistent, independent of whether the data are to be stored on disk or not. When it comes to computations and processing, the functions for the more specific types can be applied as appropriate. If my "1" is-an Integer, I can add 2 to it to get another representation: "3".
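The "add 2 to the representation '1' and get the representation '3'" idea can be sketched as a string subtype whose arithmetic takes a representation in and hands a representation back. `IntegerString` and `plus` are hypothetical names, and rejecting unparseable text in the constructor is an assumption about how such a type would be guarded:

```python
class IntegerString(str):
    """Hypothetical: a string known to represent an integer."""

    def __new__(cls, text: str):
        int(text)  # raise ValueError for text that int() cannot parse
        return super().__new__(cls, text)

    def plus(self, n: int) -> "IntegerString":
        # Compute on the represented value, then return a new representation.
        return IntegerString(str(int(self) + n))

one = IntegerString("1")
three = one.plus(2)
print(three)            # 3: still a string
print(three.zfill(3))   # inherited string function: 003
```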

<snip>
> Except Berners-Lee got so many things wrong, and incompatibly wrong
> with so many right things. In essence, he did one tiny valuable
> thing, which is put a GUI on a slightly updated FTP.

And it spread like wildfire.

>
> > > XML is all about strings.
> >
> > Now we are getting somewhere. Some of those strings are numbers, some
> > are dates, right?
>
> No. Lexically, all my source code is a string; semantically, it
> is many different types. A Java source file is one big string,
> but there are int and classes and so forth in there that, when
> you execute the program, are not strings at all. You could make
> a system where they were strings at runtime as well, but such
> a system would lose many valuable properties that Java has,
> such as the ability to do static analysis, both by the human
> and by the computer, for both correctness and performance reasons.
> This price is too high for me; static analysis is one of the
> most powerful tools available.

You don't lose that as completely as you are suggesting.

> > > datatype is inferior to having
> > > a variety of different types. The perl/tcl/html approach, where
> > > everything is a string and you have a bajillion kinds of implicit
> > > conversions might be fine if your goal is fast-and-loose,
> >
> > I've never been called that! But maybe we can get bigger bang for the
> > buck s/w development with what you termed fast-and-loose.
>
> Sure; for prototyping and other small-scale applications. But not
> for data management applications in which the cost of corruption
> is high.

I'm definitely aiming for highly scalable apps and quality data. Make it painful (even if not technically hard) to alter a data name or type when requirements change and you will get bad data and work-arounds. I'll cut the rest of this PA, 'cause this thread is getting long.

> So, how do two strings sort: lexicographically or numerically?

If you are treating values as strings, then lexicographically; if they are both of the subtype number, then that subtype's sort function overrides the string sort.

> I guess it depends on whether they are also numbers, right? When you
> sort a list of strings, some of which are numbers, which compare
> function do you use? Or does it vary depending on which strings you're
> looking at?

You cannot sort a set of Colors and Integers together unless you bump up in the type hierarchy until you are seeing them both as Strings or something with the same ordering.
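Here is a small Python sketch of both sorting answers: a numeric subtype that overrides the string ordering, and "bumping up" to String when the list mixes kinds. `IntString` is a hypothetical name. Python's `sorted` compares with `<` only, so overriding `__lt__` is enough, and `str(x)` on a `str` subclass yields a plain `str`, so `key=str` restores plain string ordering:

```python
class IntString(str):
    """Hypothetical string subtype whose ordering overrides the string sort."""
    def __lt__(self, other):
        return int(self) < int(other)

plain = ["10", "9", "2"]
print(sorted(plain))           # lexicographic: ['10', '2', '9']

nums = [IntString(s) for s in plain]
print(sorted(nums))            # subtype's numeric order: ['2', '9', '10']

# "Bumping up" to String: compare everything by its plain-string form.
mixed = [IntString("10"), "brown", IntString("9")]
print(sorted(mixed, key=str))  # ['10', '9', 'brown']
```

Sorting `mixed` without `key=str` would fail: comparing `IntString("10")` with `"brown"` calls `int("brown")` and raises `ValueError`, which is the "cannot sort Colors and Integers together" case.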

> Since int <: string, (<: means "is a subtype of") then I presume that
> int has all the string methods available?

Yes.

> So I can have a variable
> of type int, with an int value in it (which is also a string) and
> invoke a method to prepend a ~ character to the string, right?

Yes, and that would not violate its being a string. I do realize this reintroduces some of the problems that the DBMS was built to eliminate. Different tools are then needed to do something similar to what the DBMS does now. The biggest problems I see with this are in cases where a DBMS is maintained directly through the DBMS's API by applications from different top-level owners, where there is no ability to have tools that inspect source code. Since I want that source code persisted with any databases it updates, each database would have all the data and functions it needs to address inconsistencies.

This is not unlike what mountain man was interested in doing, but he was taking everything out of other s/w apps and putting it in the dbms as code, while I'm taking it out of the dbms as code and giving it back to the dbms as data (some of which is code). And, granted, I have a concept but the devil is in the details. Until perfection is reached, I would have a different set of risks and flexibility than with a current sql-dbms.

<snip>
> but you're trying to do it by dumbing things down. I don't think
> that's going to work. I think what's needed is to smarten things up.

I smartened them up. The software now knows that I really can put a tilde in front of a string even though it used to be a number, and that I just stopped it from being viewed as a number. It was smart enough to accommodate this change to data values without my having to do anything other than code the application differently and address any warnings my tools give me.
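The tilde exchange can be reproduced with a `str` subclass in Python. `IntString` and `as_int` are hypothetical names. The inherited string operation is always available, but its result is a plain `str`, so the value silently stops being viewed as an integer. Whether that counts as a feature (dawn's view) or a hole in the type system (Marshall's worry) is exactly the disagreement:

```python
class IntString(str):
    """Hypothetical: a string that is also viewed as an integer."""
    def as_int(self) -> int:
        return int(self)

qty = IntString("13")
print(qty.as_int() + 1)       # 14: still viewed as a number

tagged = "~" + qty            # an ordinary string operation on the same value
print(type(tagged).__name__)  # str: the result is only a String now
print(tagged.isdigit())       # False: it no longer represents an integer
```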

> Not complicate them, mind you; make them smart and simple.

Precisely.

<snip>
> I'm imagining you and some mathematician about a millennium ago, sitting
> in an ivory tower. The guy comes to you and says, "I'm thinking about
> this idea for a new number, which I call 'zee-row.' It represents the
> absence of a number. You could use it as the result of some operations
> that are currently considered illegal today, like subtracting X from
> X."
> And you'd say, "But that's not how the average person intuitively
> perceives subtraction. Let's not pursue that approach; let's do
> something more user-friendly."

Humorous, but neither accurate nor to the point, IMO. I wrote a paragraph on the flaws in this analogy, but it was boring even if true, so I'll spare you.
>
> > > Integer is most certainly not a document.
> >
> > in the interface between me and the computer, I only pass integers as
> > strings, in document formats, typically with some metadata about the
> > integer visible somewhere. But if you prefer, Integer is-a String.
>
> And once you type the integer in, you can just forget about it?
> No; the human is also in charge of the integer as it moves around
> the computer, across function calls, across the network, into the
> database, etc. And to manage this process effectively, he needs
> a strong suite of tools,

Yes, she does.

> chief among them a type system, static
> analysis, and a way to structure and constrain data.

I don't disagree, just have a more flexible way of doing that IMO.

<snip>
>
> > > > It is only when you look at the interface between the software
> > > > and the machine that you would want to think of a number as not being a
> > > > type of string.
>
> Looking at this paragraph again, this is exactly what I disagree with.

In order for me to agree with it, I have to add to the start "Within the software, ...". The computer (behind the scenes software) might want to persist integers differently in memory or on disk than if they were strings of numeric characters. It can do that behind the scenes. Otherwise it simply needs to know what functions to apply and how to apply them for all subtypes of strings.
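As an illustration of "persisting integers differently behind the scenes", the same value can be stored as numeric characters or as a fixed-width binary integer. This sketch uses Python's standard `struct` module; the 32-bit little-endian layout (`"<i"`) is an arbitrary choice for the example:

```python
import struct

value = 1000000
as_text = str(value).encode("ascii")  # string of numeric characters
as_int32 = struct.pack("<i", value)   # fixed-width 32-bit little-endian int

print(len(as_text), len(as_int32))    # 7 4
print(struct.unpack("<i", as_int32)[0] == value)  # True: lossless round trip
print(int(as_text.decode("ascii")) == value)      # True
```

Either form can be handed back to the application as the same representation; only the storage layer needs to know which one was used.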

Then outside of the computer, the s/w developer needs to know semantics in order to develop the software properly, defining and applying functions appropriate to the types of variables, for example.

Basically, I'm taking the schema and constraints out of the DBMS tool and putting them with all of the rest of the code, so they are handled just like other data models used in the code, such as the UI data model. The software applications should be able to execute one function on a model of some data that sends the output to a browser and another that sends the output to a database for storage on disk. They should also be able to execute a function on a model of data that pulls in values from a browser, a web service, an XML document, or a database.
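A minimal sketch of this arrangement, assuming a model defined in application code and two output functions over it: one serializes for a browser (JSON), the other writes to a database (an in-memory SQLite table here). `PERSON_MODEL`, `validate`, `to_browser`, and `to_database` are hypothetical names, and the type check is the simplest possible stand-in for application-level constraints:

```python
import json
import sqlite3

# Hypothetical in-code data model: field names and their declared types.
PERSON_MODEL = {"name": str, "pet_qty": int, "hair_color": str}

def validate(record: dict) -> dict:
    """Apply the model's constraints in application code, not in the DBMS."""
    for field, ftype in PERSON_MODEL.items():
        if not isinstance(record.get(field), ftype):
            raise TypeError(f"{field} must be {ftype.__name__}")
    return record

def to_browser(record: dict) -> str:
    """One output function: serialize the model for a browser as JSON."""
    return json.dumps(validate(record))

def to_database(conn: sqlite3.Connection, record: dict) -> None:
    """Another output function: persist the same model to a database."""
    validate(record)
    # Column names come from the model itself, never from user input.
    cols = ", ".join(PERSON_MODEL)
    marks = ", ".join("?" for _ in PERSON_MODEL)
    conn.execute(f"INSERT INTO person ({cols}) VALUES ({marks})",
                 [record[f] for f in PERSON_MODEL])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, pet_qty INTEGER, hair_color TEXT)")
rec = {"name": "Pat", "pet_qty": 1, "hair_color": "brown"}
print(to_browser(rec))
to_database(conn, rec)
print(conn.execute("SELECT * FROM person").fetchall())  # [('Pat', 1, 'brown')]
```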

<snip>

> > Unless I am misunderstanding, a counter-example to that statement would
> > be a tree where instead of a list type, we have a name for a
> > two-attribute value in our tree. So, in our relation, we have
> > attribute A and attribute B and in our tree, we have names for A, B and
> > also the name C for A and B together (a COBOL FD for a VSAM file just
> > popped into my brain as clear as day, yikes). This tree with C having
> > children of A and B doesn't look like a SQL-happy tree.
>
> Okay, I thought I said it pretty well the first time, but I'll try
> again: in thinking about trees, I'm trying *just* to think about
> trees.

Trees with nodes that could have random values, completely independent of anything else? Then how do you get SQL into this picture? You are right, I'm confused.
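The C-groups-A-and-B example can be sketched as a nested Python dict, a rough analogue of a COBOL group item with two elementary items under it. Flattening it to a relation's row keeps A and B but has nowhere to put the name C. `flatten` is a hypothetical helper and assumes leaf names are unique across groups:

```python
# A tree in which the name C groups the two leaf attributes A and B.
record_tree = {"C": {"A": "x1", "B": "y1"}}

def flatten(tree: dict) -> dict:
    """Collapse group nodes so only leaf attributes remain (a relation's row)."""
    row = {}
    for name, child in tree.items():
        if isinstance(child, dict):
            row.update(flatten(child))  # descend into the group node
        else:
            row[name] = child           # leaf attribute becomes a column
    return row

print(flatten(record_tree))  # {'A': 'x1', 'B': 'y1'}: the name C is lost
```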

> Anything that's also a problem when you have exactly one
> level will of course also be a problem when you have a multi-level
> tree. Solve the one-level case and you've solved the multi-level
> case, assuming you handle multi-level data. So I don't consider
> SQL's null problems to be a tree issue, even though those problems
> *also* show up when you're thinking about trees.
>
You don't have to try to get me to understand, but I really am confused about the trees you are looking at and if you give my brain (I swear it used to be a whole lot better) another chance, I will try again. You are saying that all trees that have a certain form are easy for SQL. But without something on those nodes, and only a general shape for the tree, I'm just not getting it.
>
> I also said:
>
> > > SQL has a hard time with
> > > list data or multivalue data, whether it's in a tree or not, so
> > > again I consider it an orthogonal issue.
>
> which puts it fairly well, I think.

Then what does SQL have to do with your tree? What does your tree look like and how can SQL work with it?
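To make the "SQL has a hard time with list or multivalue data" point concrete, here is the usual workaround sketched with Python's `sqlite3`: each value of a multivalued attribute becomes its own row, and a position column is needed if the data is a list (ordered, duplicates allowed) rather than a set. The table and column names are hypothetical:

```python
import sqlite3

# One record with a multivalued attribute (a list of pet names).
owner = {"owner_id": 1, "pets": ["Rex", "Tib", "Rex"]}

conn = sqlite3.connect(":memory:")
# No list-valued column type, so each value gets its own row.
# 'pos' preserves list order and allows duplicates; without it we have a set.
conn.execute("CREATE TABLE owner_pet (owner_id INTEGER, pos INTEGER, pet TEXT, "
             "PRIMARY KEY (owner_id, pos))")
conn.executemany(
    "INSERT INTO owner_pet VALUES (?, ?, ?)",
    [(owner["owner_id"], i, pet) for i, pet in enumerate(owner["pets"])],
)

# Reassembling the list requires ORDER BY on the position column.
rows = conn.execute(
    "SELECT pet FROM owner_pet WHERE owner_id = ? ORDER BY pos", (1,)
).fetchall()
print([pet for (pet,) in rows])  # ['Rex', 'Tib', 'Rex']
```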

>
> > > > > The fact that so many programmers favor iteration over recursion
> > > > > is something I consider odd, given that iteration is strictly
> > > > > less powerful than recursion.
> > > >
> > > > Yes, but if you can iterate instead of using recursion, then you don't
> > > > set yourself up for a stack overflow, for example.
> > >
> > > Tail call optimization can make most or all of this issue go away.
> > > (This raises a question for me, which is, is it the case that TCO
> > > can convert any iterative construct into a recursive one that uses
> > > only constant stack space? I think the answer might be yes, because
> > > I think I can see how to write a tail-recursive 'while', and I think
> > > all iterative constructs are just syntax on while.)
> >
> > Over my head on that one -- TCO to me is only total-cost-of-ownership.
>
> My fault; I went from the term ("tail call optimization") to the
> abbreviation ("TCO") too abruptly.

No, I caught that you were using TCO for tail call optimization -- I just hadn't heard it before.
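Marshall's "tail-recursive while" can be sketched in Python: the recursive call is the last thing the function does, so a language that guarantees proper tail calls (Scheme, as mentioned) would run it in constant stack space. CPython does not eliminate tail calls, so this version still grows the stack and is only for illustration. `while_loop` is a hypothetical name:

```python
def while_loop(cond, body, state):
    """A 'while' loop written as a tail-recursive function.

    The recursive call is in tail position. CPython does NOT optimize
    tail calls, so here each iteration still adds a stack frame.
    """
    if not cond(state):
        return state
    return while_loop(cond, body, body(state))

# Equivalent of: i = 0; total = 0; while i < 10: total += i; i += 1
i, total = while_loop(lambda s: s[0] < 10,
                      lambda s: (s[0] + 1, s[1] + s[0]),
                      (0, 0))
print(total)  # 45
```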

> > Are you saying that if I write a recursive method in Java then the
> > compiler has or might have a feature that mitigates this or that the
> > run-time environment would have this feature?
>
> The Java compiler probably won't, but it might. The Scheme compiler
> is required to. Since Java is chock-full of iterative constructs,
> it's not much of an issue; no one uses recursion much.
>
> > Have I unnecessarily
> > dragged along a concern that I could have dropped long ago?
>
> Uh, yes.

OK, I can still be taught new tricks.

> > Did I miss
> > a memo that everyone else got that said not to worry about recursion
> > eating memory?
>
> Well, you still have to worry if your language's implementation doesn't
> have TCO.

OK, so I won't do a major shift right now then.

> But it's not a *fundamental* problem. You also have to worry
> if your recursive method isn't tail-recursive, but I'm proposing that
> a recursive translation of an iterative algorithm can be necessarily
> tail recursive.

This is outside anything I know about.

> I'll still have to check on that.

While optimizations are taking place, perhaps it could rewrite the code to show recursion instead of iteration so I don't have to change? :-)
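For a language implementation without TCO, a common workaround is a trampoline: the code is written as tail-recursive steps, and a small driver loop runs them iteratively, so the stack never grows. This is a sketch of that technique in Python; `trampoline` and `count_down_step` are hypothetical names, and the ("call", ...)/("done", ...) protocol is an arbitrary choice:

```python
def trampoline(step, state):
    """Run a tail-recursive computation without growing the stack.

    `step` returns ("done", result) or ("call", next_state); this loop
    does the job a TCO-capable compiler would do.
    """
    tag, value = step(state)
    while tag == "call":
        tag, value = step(value)
    return value

def count_down_step(state):
    # Tail-recursive in spirit: "recur with n - 1 and acc + n".
    n, acc = state
    if n == 0:
        return ("done", acc)
    return ("call", (n - 1, acc + n))

# Direct recursion 100_000 deep would exceed CPython's default recursion
# limit of 1000; the trampoline runs in constant stack space.
print(trampoline(count_down_step, (100_000, 0)))  # 5000050000
```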

> > > If programmers put as much effort into recursion as they put into
> > > iteration, I assert it would provide increased maintainability.
> >
> > I'll keep that in mind. If you have any "instead of doing this common
> > iteration, try it as this recursion" examples, pass them along, even if
> > OT.
>
> Of course, my background is about 98% iteration. I work for a living
> which means I've had to use C++ or Java for most of the time. (Before
> that it was C and Fortran. :-)
>
> As for cool examples, check out quicksort in Haskell:
> http://www.haskell.org/aboutHaskell.html
>
> Blew my mind the first time I saw it.
Will do.
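For anyone who does not read Haskell, here is a Python rendering of the short recursive quicksort commonly shown in introductions to the language. I am assuming this is the version the link above refers to. Like that original, it builds new lists instead of partitioning in place, so it shows the shape of the recursion rather than competing on performance:

```python
def qsort(xs: list) -> list:
    """Recursive quicksort in the style of the well-known Haskell two-liner."""
    if not xs:
        return []
    pivot, rest = xs[0], xs[1:]
    return (qsort([x for x in rest if x < pivot])
            + [pivot]
            + qsort([x for x in rest if x >= pivot]))

print(qsort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```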
>
> > > The question in the large is: what are all the different logical
> > > data structures, how might we query them, and how might we update
> > > them and constrain them? I think mankind will be working on this
> > > for some time.
> >
> > I think it is done. They named it XQuery.
>
> Uh, does it have natural join?

I'm guessing you know the answer, eh?
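For reference, the natural join Marshall asks about combines tuples from two relations whenever they agree on every attribute name the relations share; the shared attributes appear once in the result. A minimal Python sketch over lists of dicts, assuming every tuple in a relation has the same attribute names (`natural_join` and the sample data are hypothetical):

```python
def natural_join(r: list, s: list) -> list:
    """Natural join of two relations given as lists of dicts.

    Assumes all tuples in a relation share one heading. With no common
    attributes, every pair matches, giving the Cartesian product.
    """
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**t, **u}
        for t in r
        for u in s
        if all(t[a] == u[a] for a in common)
    ]

pets = [{"owner": "ann", "pet": "Rex"}, {"owner": "bob", "pet": "Tib"}]
owners = [{"owner": "ann", "city": "Ames"}]
print(natural_join(pets, owners))
# [{'owner': 'ann', 'pet': 'Rex', 'city': 'Ames'}]
```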
>
> > I'm just a practitioner dabbling in theory (and, worse yet, maybe just
> > a s/w dev manager dabbling in practice) but if I were told I had to
> > enumerate the graph functions I use, I would look up the xpath
> > functions on w3.org and start with those.
>
> I have no reason to do this. I need some tiny glimmer of a reason
> to suspect there's something interesting there before I look. Nothing
> so far.

fair enough.

> > > But one can also *gain* from restrictions, as well. This point
> > > is often missed. It's why constraints are valuable, and it's
> > > why a minimal formalism is valuable.
> >
> > I agree that such are valuable. I do have a big problem with the way
> > we handle constraints, however, as I've mentioned in the past. The
> > minimal formalism is good for theory and for a maintainable
> > implementation under the covers, but not necessarily the best api.
>
> Sure sure; we've had that conversation to death. I believe we entirely
> agree on the analysis of the problem (for once, :-) and I think we
> mostly agree on the characteristics of the solution.
>
> Marshall
>
> PS. Good grief we are both long-winded, eh?

Yup, let's just hope no one else is attempting to follow this one. smiles
--dawn
