Re: Mixing OO and DB

From: rpost <rpost_at_pcwin518.campus.tue.nl>
Date: Sat, 08 Mar 2008 13:53:25 +0100
Message-ID: <79aef$47d28c45$839b4533$7035_at_news1.tudelft.nl>


David Cressey wrote:

>Before I move on, I have to give an opinion based on my own data-centric
>world view. If you don't understand the data, then you don't know what
>you're talking about. In short, I completely fail to grasp how one can
>understand a system in terms of "behavior" without understanding the data
>that the behavior affects. This is something that it's going to take me
>years of lurking in comp.objects to grasp.

I don't think so. Can we understand the differences between sets, multisets, ordered lists, queues and stacks just by "understanding the data"? Well, it depends on what you mean by "data", but my guess is that you'll agree that the differences between them lie not so much in their data, in what information they store, but in how we can access and update this information: the laws that govern their interaction with the rest of the world. This is what the OO world calls "behaviour". Behaviour is what an object looks like from the outside. Data structure is what it looks like from the inside, its implementation. The behaviour of sets, or multisets, or lists, can be implemented with many different concrete data structures, and conversely, the same concrete data structure can be used in implementing many different abstract data types.
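To make the last point concrete, here is a minimal sketch in Python (my choice of language; the class and function names are made up for illustration). Both classes store their items in the same concrete structure, a `collections.deque`, and differ only in which end `pop` removes from, that is, only in behaviour.

```python
from collections import deque


class Stack:
    """Last-in, first-out behaviour on top of a deque."""

    def __init__(self):
        self._items = deque()

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.pop()      # removes the most recently added item


class Queue:
    """First-in, first-out behaviour on top of the same kind of deque."""

    def __init__(self):
        self._items = deque()

    def push(self, x):
        self._items.append(x)

    def pop(self):
        return self._items.popleft()  # removes the oldest item


def drain(container, values):
    """Push all values, then pop them all and return the order they came out in."""
    for v in values:
        container.push(v)
    return [container.pop() for _ in values]


print(drain(Stack(), [1, 2, 3]))  # [3, 2, 1]
print(drain(Queue(), [1, 2, 3]))  # [1, 2, 3]
```

After the pushes the stored data is identical in both objects; only the observable behaviour tells a stack from a queue.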

Can we understand the behaviour of sets without having a concrete data structure in mind? Yes, definitely, we can write down set operations and the laws that govern them, e.g.

operations:

  empty: Set<T>
  isempty: Set<T> -> Boolean
  in: T x Set<T> -> Boolean
  singleton: T -> Set<T>
  union: Set<T> x Set<T> -> Set<T>
  intersection: Set<T> x Set<T> -> Set<T>

laws:

  isempty(empty) = true

  for all e in T: not in(e, empty)
  for all e in T: in(e, singleton(e))
  for all e, f in T: e != f => not in(e, singleton(f))

  for all e in T, X,Y in Set<T>:
    in(e, X) or in(e, Y) <=> in(e, union(X,Y))

  for all e in T, X,Y in Set<T>:
    in(e, X) and in(e, Y) <=> in(e, intersection(X,Y))

  (etc.)

It's not easy, and I'm no expert in it, but it can be done, and I don't think you'll call this "understanding the data". What is more, I'll claim that all of the "understanding data" that you claim to be capable of is essentially of this nature: even with a concrete data structure in mind to aid understanding and the implementation, the data structure doesn't really mean anything without specifying the operations that can be performed on it; and that meaning essentially consists of the laws that govern the behaviour of those operations as observable from the outside, and is thereby essentially independent of that concrete data structure.
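As a rough illustration, the following Python sketch checks laws of this kind against two unrelated concrete representations. Everything in it is an assumption made for the example: the names `ListSet`, `HashSet` and `check_laws` are hypothetical, `member` stands in for `in` (a reserved word in Python), and testing over a small finite universe only samples the laws, it does not prove them.

```python
class ListSet:
    """Sets as tuples; duplicates may occur inside but are invisible outside."""
    empty = ()
    isempty = staticmethod(lambda s: len(s) == 0)
    member = staticmethod(lambda e, s: e in s)
    singleton = staticmethod(lambda e: (e,))
    union = staticmethod(lambda x, y: x + y)
    intersection = staticmethod(lambda x, y: tuple(e for e in x if e in y))


class HashSet:
    """Sets as Python frozensets."""
    empty = frozenset()
    isempty = staticmethod(lambda s: len(s) == 0)
    member = staticmethod(lambda e, s: e in s)
    singleton = staticmethod(lambda e: frozenset([e]))
    union = staticmethod(lambda x, y: x | y)
    intersection = staticmethod(lambda x, y: x & y)


def check_laws(impl, elements):
    """Check the set laws for one implementation, using only its operations."""
    # Sample sets: empty, all singletons, and all pairwise unions of those.
    sets = [impl.empty] + [impl.singleton(e) for e in elements]
    sets += [impl.union(x, y) for x in sets for y in sets]

    assert impl.isempty(impl.empty)
    for e in elements:
        assert not impl.member(e, impl.empty)
        assert impl.member(e, impl.singleton(e))
        for f in elements:
            if e != f:
                # A singleton holds nothing but its own element.
                assert not impl.member(e, impl.singleton(f))
        for x in sets:
            for y in sets:
                in_x, in_y = impl.member(e, x), impl.member(e, y)
                assert (in_x or in_y) == impl.member(e, impl.union(x, y))
                assert (in_x and in_y) == impl.member(e, impl.intersection(x, y))
    return True


print(check_laws(ListSet, [1, 2, 3]), check_laws(HashSet, [1, 2, 3]))  # True True
```

Note that `check_laws` never looks inside a set value; it cannot tell the tuple from the frozenset, which is the sense in which the laws are independent of the concrete data structure.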

>And I suspect that, based on the
>experience of people like Marshall Spight, that I'm going to conclude, at
>the end of the day, that behavior is not the holy grail of computing.

No, but it is absolutely crucial. We can't specify any data at all without thinking of how to interpret it; and interpretation means to think of operations and how they are supposed to behave.

In an RDBMS the focus is on data structures with "relational" behaviour, where the operations and their behaviour are fixed in the query language; this is a good fit for much of the data we need to work with in practice, but not for everything.

[...]

>In my original perspective, the single thing that ties together all the
>applications and all the databases that collaborate by sharing data is just
>one thing: data.

You might just as well say: behaviour. The way in which your systems represent and access their data is ultimately subordinate to how the systems are supposed to *behave*, their functionality.

>If you understand the data, and you understand the
>(observable) behavior of each of the applications and each of the databases,
>you can understand the system. Otherwise, you can't understand the system.

The opposite is even more true: no data can be understood without understanding what operations are used to obtain and apply the data. Here, I have some data for you:

 1,129,960,000  March 8, 2008
   303,569,100  March 5, 2008
   231,627,000
   186,315,468  March 1, 2008
   162,652,500  February 29, 2008
   158,665,000
   148,093,000
   141,933,955  March 1, 2008
   127,790,000  December 1, 2007
   106,535,000

Completely useless, unless you understand which interactions with the real world these figures correspond to.

So I think your suggestion that understanding is somehow tied to data, not to behaviour, is flat-out wrong. We do need to understand the operations on our data before we can understand the data. The reason relational database designers want to store data and not operations or laws of behaviour has little to do with where the information is; it is purely due to the fact that in most RDBMS applications we can "factor out" the behaviour into the fixed set of common, parametrizable operations provided by the relational query language. (In reality, of course, this rarely suffices, because the query language isn't powerful enough, and we have to kludge around with transactions and stored procedures to actually get the job done.)
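A toy sketch of what "factoring out" means here, in Python. It is not how any real RDBMS works; relations are modelled as lists of dicts, and the `employees` and `departments` tables are invented for the example. The point is only that the three operations are written once and know nothing about the application; the application supplies nothing but data and parameters.

```python
def select(rows, predicate):
    """Keep the rows for which the predicate holds."""
    return [r for r in rows if predicate(r)]


def project(rows, columns):
    """Keep only the named columns, dropping duplicate result rows."""
    seen, out = set(), []
    for r in rows:
        key = tuple(r[c] for c in columns)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(columns, key)))
    return out


def join(left, right):
    """Natural join: combine rows that agree on all shared column names."""
    out = []
    for l in left:
        for r in right:
            shared = l.keys() & r.keys()
            if all(l[c] == r[c] for c in shared):
                out.append({**l, **r})
    return out


# Hypothetical application data.
employees = [{"name": "ann", "dept": "db"}, {"name": "bob", "dept": "oo"}]
departments = [{"dept": "db", "floor": 1}, {"dept": "oo", "floor": 2}]

# "Who works on floor 2?" expressed purely by parametrizing the fixed operations.
result = project(
    select(join(employees, departments), lambda r: r["floor"] == 2),
    ["name"],
)
print(result)  # [{'name': 'bob'}]
```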

-- 
Reinier Post
Received on Sat Mar 08 2008 - 13:53:25 CET
