Re: Object-relational impedence

From: Marshall <>
Date: Wed, 12 Mar 2008 09:15:27 -0700 (PDT)
Message-ID: <>

On Mar 12, 12:54 am, S Perryman <> wrote:
> Marshall wrote:
> >
> > It's also
> > part of why the relational model is so interesting. And it's
> > why OOPLs don't hold my interest so much any more;
> > their axioms are complicated and often weak.
> Which "axioms" in particular cause you pain ??

The ones they leave out. The ones that would let me draw stronger conclusions about my code.

Have you read, say, the Java language spec? It's very long and detailed. It's also a prose spec. So it's ambiguous in places, and there aren't any proofs done against it. At the same time, the guarantees you can make with the Java type system are useful, but they don't go nearly far enough.

A goal of mine is to be able to take the "typeful programming" idea to its logical conclusion.

> >>I take a slightly different view in that I obviously have the need
> >>for various collection types, and support for a Relational "calculus"
> >>to use on those collections. I also need support for collections of
> >>ADTs in particular.
> > The collections question is quite interesting: Lists, maps, bags,
> > sets, tables, trees. It is apparent after only a modest amount
> > of study that maps, sets, and tables are thoroughly and beautifully
> > handled with relations.
> Collections should be deliberately vague.

That is the current consensus in much of the PLT world. Consider STL. All kinda collections; all kinda algorithms. Certainly better than nothing! But still a long way from as good as can be done.

The "vague collections" idea is not a good one IMHO. Because you can get more or less everything you want with relations. (Surprise! I like relations! :-)

> Insert and remove. Empty. Contains an item, how many occurrences of
> an item etc.
> A set is a specific form of collection (contains 0 or 1 instance of
> any given element, insertion of a contained element has no effect etc) .

What I consider ideal would be to have an axiomatic algebra that can succinctly express the complete set of such operations, and furthermore exactly specify the algebraic properties of such operations. This enables not only a lot of automatic optimization, but it also enables a lot more automated reasoning. (Actually optimization is just a special case of automated reasoning.)

> > It took me a lot longer but I've reached
> > the conclusion that lists fall in to the same category.
> Sequences are conceptually a partial mapping of (Index,Element) tuples.
> Certain mappings being undefined (hence partial) .

Exactly. A partial mapping. Aka a relation.

> > Trees no longer seem like a single category to me: there
> > are what I call "statically structured" trees, for example
> > Customers/ Invoices/InvoiceLineItems, and "dynamically
> > structured" trees, such as a parse tree.
> A tree is a specific form of graph.
> Graphs are conceptually two sets with an invariant relationship
> between the sets. Trees merely define additional invariants on
> those sets.


I still think my static/dynamic structure distinction is an important one.

> > Statically structured
> > trees are are a particular strong point for SQL and also a
> > strong point for the RM. Dynamically structured trees *can be*
> > handled with the RM but I don't think it does as good a job
> > as is done in, say, FP languages with union types and
> > structural recursion. I conjecture that it may be possible
> > to develop best practices and/or tiny extensions to the
> > RM such that it does as good a job, but currently I'm
> > leaning in the direction of thinking this will not be possible.
> > I'm not completely thrilled with structural recursion, but
> > I have yet to see anything better.
> Is your issue with performance for things like breadth/depth
> traversal etc ??

Your question caused me to consider my feelings for structural recursion, and I didn't really come up with much to say against it. I guess the worst I can say is, I don't see how to get a handle on how to transform them.

You know what 1NF is, the form in which the system requires that attributes of relations have types that are not themselves relations? This turns out to be sufficient for (virtually) everything. However I don't think it's the best design choice. The problem that results when you relax 1NF, though, is a psychological one: everyone wants to start nesting everything deeply. And that's a disaster.

> >>If possible I would like the prog lang user to be able to construct
> >>specific collections such as sets etc, and for the prog lang env to be
> >>within reasonable performance of something that is designed for the
> >>support of some one specific aspect (as Lisp is for lists etc) .
> >>I think that FP is the paradigm that could possibly do this.
> > An important aspect of the performance requirement is physical
> > independence, something that is largely ignored in PL theory,
> > sadly.
> The separation of specification (interface) from implementation is
> something key to ADT theory. And for those prog langs (CLU, Modula-2
> etc) that support one impl of each ADT per program unit, you have
> the independence.
> However, the coming of OO (unintentionally) threw a spanner in the
> works. It became possible to have multiple *different* impls of one ADT
> used *concurrently* in the same program unit.

I don't see that there's anything wrong with that necessarily. This is a version of physical independence, and that's a good thing. The problem comes more with the fact that objects are stateful; their status as first class variables is what is problematic. ("State is hell."
--Ken Arnold.)

And stateful objects aren't even necessarily a problem per se. See for example Peter Van Roy's demonstration of objects as mechanisms for increasing modularity in ways that are unavailable to FP or LP languages. Or consider that closures are isomorphic to objects in languages with (mutable local variables + lexical scope + first class functions.) Some have even made the case that threads are isomorphic to objects.

No, the problem comes up with the psychological factor again. Once you admit stateful objects, people use them freaking everywhere, and having state sprinkled like little grains of sand all through the gears of your program, THAT's a problem.

(Although I believe that message passing (not the Smalltalk sense of the word) may be able to reproduce the modularity advantage Van Roy describes, without the disadvantages of local state.)

> A rough analogy (that AFAIK is not supported in commercial systems)
> would be an SQL database that for some given table operates a B-tree
> for certain records in the table, hashing for others etc. And expects
> the performance for ops on the entire table to be the same whatever
> predicates are applied to the table.

Well, "expectation" is an attribute of people, not of programs. People can be educated.

Performance tuning necessarily involves human input, because performance requirements are not automatically inferable by software. We have operation A and operation B, and it may be that the requirements are that A has to be as fast as possible and if that's at the expense of B, then so be it. Or vice versa. Or the requirements may say to balance them. How can the system know which way to optimize for? It needs to be told the requirements. HOWEVER, once it knows them, obviously the most desirable case is that it does the tuning itself. So what we really want is a system that lets us specify our performance requirements declaratively.

And at this point, Bob can pipe up and point out that the best candidate for a system that will be able to support the sort of features I'm describing is a relational system. (Oops, I guess he doesn't have to now.)

> > If we look at the mathematical universe, we usually encounter
> > an extensional viewpoint. And if something is expressed
> > intensionally, or algorithmically, mathematicians are free to
> > immediately think of its extension, because they do not have
> > to limit themselves to what is computable.
> > On the other hand, if we look at the computable universe, we
> > see more often the intensional viewpoint. And we are not
> > free to immediately shift into extensional mode, because
> > of the possibly-infinite, almost-certainly-prohibitive cost of
> > doing so.
> > The RM gives the best handle on the computably extensional
> > viewpoint I have encountered. FP gives the best handle
> > on the intensional viewpoint.
> I would say that FP has a strong extensional viewpoint too, solely
> because it too is based on mathematical concepts (functions) .

Sure. I think it's just as fair to say that the RM has a strong intensional
viewpoint, too:

  SELECT CustomerId, InvoiceId from Invoices WHERE TotalCost > 100.00

See the lambda in there? It's camouflaged, but it's still there. Hiding
behind a rock; clever thing.

Marshall Received on Wed Mar 12 2008 - 17:15:27 CET

Original text of this message