Re: Object-relational impedance

From: David BL <davidbl_at_iinet.net.au>
Date: Sat, 15 Mar 2008 05:58:23 -0700 (PDT)
Message-ID: <5e42c704-373e-4019-a46f-5517f40b64d2_at_s8g2000prg.googlegroups.com>


On Mar 15, 6:12 pm, "Dmitry A. Kazakov" <mail..._at_dmitry-kazakov.de> wrote:
> On Fri, 14 Mar 2008 18:59:49 -0700 (PDT), David BL wrote:
> > On Mar 15, 12:16 am, "Dmitry A. Kazakov" <mail..._at_dmitry-kazakov.de>
> > wrote:
> >> On Fri, 14 Mar 2008 06:33:45 -0700 (PDT), frebe wrote:
>
> >>> That's why the OO camp has such problems with making a good ORM. If
> >>> SQL would have been low-level, compared to the network model, the task
> >>> would have been much easier.
>
> >> Not necessarily. Certain architectures are difficult to translate into;
> >> consider vector processors, for example. It is related to the presumption
> >> of computational equivalence. A difficulty or impossibility to translate
> >> can come from the weakness of a given language. SQL is pretty weak.
>
> >> Clearly when SQL is used as an intermediate language for an ORM, then to
> >> have it lower level and more imperative than it is would be an advantage.
>
> >> But I agree that ORM is wasting time. In my view other architectures are
> >> needed (like WAN-wide persistent objects). In short, the DBMS is to be
> >> scrapped as a concept.
>
> > I expect you like the idea of distributed OO,
>
> Well, distributed OO is a different thing to me. It is when an object is
> distributed over a set of nodes, a kind of wave, rather than a particle...

That sounds like fragmented objects.

    http://en.wikipedia.org/wiki/Fragmented_object

> > orthogonal persistence,
> > location transparency and so on.
>
> Issues which might be too big to swallow in one gulp.
>
> > However the literature is hardly
> > compelling. There is the problem of
>
> > - finding a consistent cut (ie that respects the
> > happened-before relation)
>
> > - the contradiction between transactions and orthogonal
> > persistence
>
> > - the contradiction between rolling back a transaction
> > and orthogonal persistence
>
> Complementarity, rather than mere contradiction.

The following argument appears in "Concurrency, the fly in the ointment" by Blackburn and Zigman:

The transactional model implicitly requires a dichotomy of two worlds - an internal one for the persistent data, and an external non-persistent world that issues transactions over the first. This follows from the impossibility of an ACID transaction being invoked from within an (atomic) transaction - i.e. a transaction cannot be the basis for its own nested invocation.

By the definition of atomicity, the durability of any nested (i.e. child) transaction is subject to the atomicity of the parent transaction. This conflicts with the independent durability required of a child ACID transaction.
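
A minimal sketch of that conflict, using SQLite savepoints from Python (the table, column and values are made up for illustration): a "child transaction" implemented as a savepoint is never independently durable, because its fate is decided by the enclosing transaction.

    # Sketch: a released ("committed") savepoint is still undone by the
    # parent's rollback, so the child has no durability of its own.
    import sqlite3

    con = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
    con.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    con.execute("INSERT INTO account VALUES (1, 100)")

    con.execute("BEGIN")                    # parent transaction
    con.execute("SAVEPOINT child")          # "nested transaction"
    con.execute("UPDATE account SET balance = 0 WHERE id = 1")
    con.execute("RELEASE SAVEPOINT child")  # the child "commits"...
    con.execute("ROLLBACK")                 # ...but the parent rolls it back

    print(con.execute("SELECT balance FROM account").fetchone())  # (100,)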

> > - the impossibility of reliable distributed transactions
>
> There is no such thing as unconditionally reliable computing anyway.

Sure, but it is customary to assume infallibility within a process and fallibility between processes.

> > - the fact that synchronous messages over the wire can easily
> > be a million times slower than in-process calls
>
> Huh, don't buy multi-core processors, don't use any memory except registers
> etc. This is not an argument, so long as no concrete time constraint is put
> down. Below you mentioned HTTP as an example. It is a milliard times slower,
> who cares? "Bright" minds use XML as a transport level, and for that matter,
> interpreted SQL...

You missed the point. Fine-grained interchanges of messages are useful within a process but are something to be avoided between processes. The penalty is so high that distributed computing systems must account for it in the high-level design.

Between processes it is better to stream data asynchronously, in the way OpenGL drawing commands are pipelined between client and server without a round-trip delay for each drawing command.
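
To put rough numbers on the penalty (the figures below are assumptions for illustration, not measurements):

    # Sketch: N fine-grained calls pay the round-trip latency N times when
    # issued synchronously, but only once when pipelined asynchronously.
    N = 100_000            # drawing commands (illustrative)
    local_call = 100e-9    # ~100 ns per in-process call (assumed)
    rtt = 100e-6           # ~100 us round trip on a LAN (assumed)

    in_process = N * local_call
    sync_rpc   = N * rtt               # wait for each reply
    pipelined  = N * local_call + rtt  # stream commands, one final round trip

    print(f"in-process: {in_process:.3f} s")   # 0.010 s
    print(f"sync RPC  : {sync_rpc:.3f} s")     # 10.000 s
    print(f"pipelined : {pipelined:.3f} s")    # 0.010 s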

> > - the fallibility of distributed synchronous messages which
> > contradicts location transparency
>
> That depends on what is transparent to what. I don't see why
> synchronization should play any role here. I assume you meant something
> like routing to moving targets, then that would apply to both.

By definition, the call that sends an asynchronous message (a "post") can return without knowing whether the message was received, whereas a synchronous message must block. That raises the question of what to do when the network fails. This impacts design by contract (in conflict with location transparency).
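
A small sketch of how that failure mode leaks into the caller's contract (the host, port and timeout are made-up values):

    # Sketch: a synchronous call over the network acquires failure modes that
    # a genuinely local call never has, so the caller's contract must change.
    import socket

    def remote_call(payload, addr=("server.example", 9000), timeout=2.0):
        with socket.create_connection(addr, timeout=timeout) as s:
            s.sendall(payload)
            return s.recv(4096)   # may raise timeout or connection errors

    try:
        reply = remote_call(b"get_balance 42")
    except OSError:
        # The caller cannot tell whether the request was processed or lost;
        # no such case exists for an in-process call, so "transparency"
        # cannot hide it without weakening the contract.
        reply = None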

> > - the enormously difficult problem of distributed locking
> > * how to avoid concurrency bottlenecks
> > * when to release locks in the presence of network or machine
> > failures
> > * distributed deadlock detection.
> > * rolling back a distributed transaction.
>
> This is a sort of mixing lower and higher level synchronization
> abstractions. If you use transactions then locking is an implementation
> detail. Anyway all these problems are ones of concurrent computing in
> general, they are not specific to distributed computing and even less than
> that to OO. You can always consider concurrent remote tasks running local.

The point is that concurrency interacts very badly with orthogonal persistence and location transparency - so badly that it casts serious doubt on whether orthogonal persistence and location transparency are useful concepts in the first place.

> > - how to schema evolve a distributed OO system assuming
> > orthogonal persistence and location transparency.
>
> > - how to manage security when a process exposes many of its
> > objects for direct communication with objects in another
> > process.
>
> On per object basis.

In reality security can only be controlled at the boundary between processes, and that conflicts with location transparency. Allowing direct communication between objects opens up security holes everywhere. By contrast, the data-centric approach allows the interprocess message protocol to be simple and implemented entirely within the DBMS layers.

> > Persistent, distributed state machines raise more questions than
> > answers. Persistent distributed encoded values provide a much better
> > basis for building a system.
>
> I am not sure what you mean here. Referential vs. by-value semantics of
> objects, or user-defined vs. inferred behavior of? Clearly you cannot get
> rid of values as well as of references (values of identity).

I'm saying persistent data should be nothing more than persistent encoded values, instead of snapshots (i.e. consistent cuts) of multithreaded or distributed state machines. The former is much simpler than the latter.
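
As a minimal sketch of what "persistent encoded value" means here (the record and file name are illustrative):

    # Sketch: persisting an encoded value is just "encode, then write durably".
    # There is no thread, no in-flight message and no happened-before relation
    # to capture, which is what a consistent cut of a state machine requires.
    import json, os, tempfile

    def persist_value(value, path="employees.json"):
        data = json.dumps(value).encode("utf-8")    # the encoded value
        fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())                    # force to durable storage
        os.replace(tmp, path)                       # atomic rename

    persist_value([{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}])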

> Are you arguing for inference? Do you believe that distributed inference
> would help in any way? No, it will have all the problems you listed plus
> uncountable new others. When the behaviour is defined by the
> programmer/user, that moves the burden from the system to him. This makes
> things a lot easier. You don't need to infer that l = 2 Pi r in the
> steering wheel microcontroller, you can take it for granted, if the
> programmer says so.
>
> So long as we remain more intelligent than our programs, inference will
> always play a subordinate role. Once/if computers surpass us, we will no
> longer program them. Inference is clearly a dead end, a mental laziness.
>
> > SOA suggests that a large system should be decomposed by behaviour (ie
> > "services") which is basically an OO way of thinking.
>
> Well, IMO SOA is a "hype way of thinking," a marketing slogan with no
> substance...

> > It is a flawed
> > approach to the extent that it is promoted as the main way to build
> > enterprise systems.
>
> But it sells good...
>
> > The only proven scalable approach is to remain
> > data-centric at ever increasing scales.
>
> RDBMS sells good as well... (:-))
>
> > The easiest way for distributed applications to communicate is
> > indirectly via shared data rather than by direct communication. This
> > is implicit with a data-centric approach.
>
> Ooch, shared data is the worst possible way. Note how hardware
> architectures have been moving away from shared memory. Sooner or later it
> should hit software design.

Really? Are you suggesting there is a trend away from SMP?

Here is an example of the benefits of indirect communication between applications with a shared data model. We have the following applications:

1.   The company timesheet entry system
2.   The company payroll system
3.   The company email system

Consider that the list of employees is managed by a DBMS and is accessible to each of these applications. Whenever any of these applications changes the shared data, all the other applications will reflect the changes.
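
A minimal sketch of this shared-data style, using SQLite as a stand-in for the company DBMS (the table and values are made up for the example):

    # Sketch: the applications coordinate only through the shared employee
    # relation; none of them sends a message to any other.
    import sqlite3

    db = sqlite3.connect("company.db")   # the shared database
    db.execute("CREATE TABLE IF NOT EXISTS employee"
               " (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

    # The payroll system adds an employee...
    with db:
        db.execute("INSERT OR REPLACE INTO employee VALUES (?, ?, ?)",
                   (42, "Alice", "alice@example.com"))

    # ...and the timesheet and email systems see the change with a plain
    # query, without any application-to-application message protocol.
    for row in db.execute("SELECT id, name, email FROM employee"):
        print(row)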

An alternative is for the applications to avoid shared data, with special message protocols developed to allow them to talk to each other. Do you agree that's not a very good solution?

A third approach is to develop some kind of message-oriented service that centralises the information about the employees. However, then all the problems of distributed OO arise (such as the terrible performance of synchronous messages, causality and consistent cuts, etc).

> BTW, data sharing is much in the OO way. Many understand OO as equivalent to
> referential semantics. It is a wrong perception, but taking it for
> simplicity of argument, functional (value semantics) fits massively
> parallel architectures much better.
>
> But again, it is a wrong perception. OO adds user-defined semantics to values.
> Identity is user-defined as well. You can share or exchange, it is
> orthogonal to OO.

OO has little to do with the shared data of an enterprise.

> > The WWW is data-centric. It is not at all surprising that Http on
> > port 80 is *much* more common than RPC, CORBA, DCOM, RMI and SOAP put
> > together. Http concerns messages used to access data instead of
> > messages used to elicit behaviour.
>
> That does not surprise me. The WWW evolves from the bottom up. Mammals have
> gills when they pass through the embryonic stage of development.
Received on Sat Mar 15 2008 - 13:58:23 CET
