Objects and Relations

From: David BL <davidbl_at_iinet.net.au>
Date: 29 Jan 2007 11:42:33 -0800
Message-ID: <1170099752.947515.283690_at_l53g2000cwa.googlegroups.com>



Many of the wars between the OO and RM camps get sidetracked into side issues, often involving unsubstantiated performance or scalability claims, or discussions about whether physical independence is good or bad.

AFAIK these discussions ignore something simple and fundamental, which I will describe in this post.

I have previously discussed this on comp.object (with mixed results), and I'm interested in feedback from the relational camp.

Consider a web server. This can be seen at a number of different "levels of abstraction":

  1. Elementary particles
  2. Atoms
  3. Molecules
  4. Electronic components
  5. Digital circuits
  6. Computer
  7. Executing process
  8. Abstract computational machine
  9. Web server

All of these "levels of abstraction" are equally valid. I don't want to get into boring discussions about what's physical versus logical, model versus what is modeled, or real versus abstract. In fact I prefer the Platonic viewpoint, which treats all conceptions on an equal basis. Therefore in this exposition, please take a liberal meaning of the words "real" and "exists". For example, numbers exist. So do nebulous things like "Microsoft".

With this Platonic view I want to define "object" in the context of OO programming, rather than settle for some nebulous (or meaningless) definition. On the contrary, I want to be quite specific! Therefore...

Definition: An *object* is associated with a run time instance of a data structure that resides at a particular address in memory within a particular process running on a particular computer. An object comprises state, behavior and identity. All three are tied to the instance in computer memory.
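To make the definition concrete, here is a minimal Python sketch (Python is chosen only for illustration, and the class name Counter is hypothetical). The state is an attribute, the behavior is a method, and the identity is the instance itself: `==` compares state, while `is` compares identity. In CPython, `id()` happens to be the object's memory address, which matches the "particular address in memory" in the definition.

```python
class Counter:
    """Hypothetical example object with state, behavior and identity."""

    def __init__(self, count=0):
        self.count = count          # state

    def increment(self):            # behavior
        self.count += 1

    def __eq__(self, other):
        # Equality compares state only, never identity.
        return isinstance(other, Counter) and self.count == other.count

    __hash__ = None  # mutable object, so deliberately unhashable


a = Counter()
b = Counter()
print(a == b)   # True: same state
print(a is b)   # False: two distinct instances in memory
a.increment()
print(a == b)   # False: state diverged, identities unchanged
```

Identity survives any change of state, and equal state never merges two identities.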

I know that some OOers will not like the definition because they distinguish between "instance" and "object". The former exists in computer memory and the latter exists in some abstract computational machine. However that distinction doesn't really affect my exposition apart from some changes in terminology.

Also, some OOers may not like the definition because it precludes persistent objects (i.e. objects that reside on disk). Again it is possible to change the wording as required to accommodate these alternatives.

We see that an object's identity is associated with the instance in memory. So we should be able to stop the process in the debugger, look at the object in memory, and ask whether it genuinely is what it claims to be, or whether it is confused and pretends to be something else. The claim is that objects must be the *real deal*.

I want to be very clear on what I mean by that, so I will use a number of examples to illustrate.

  1. A string object : Yes, the instance really does encapsulate the state and behavior of a string, regarded as an Abstract Data Type (ADT).
  2. A FIFO queue : Yes, like a string object.
  3. A GUI button : Yes! The button object is associated with a button that we can really see on the computer monitor. When we call SetText() on the button we indeed find that the text in the button on the screen changes. The button is real in the sense that the user can click on it. It even appears to depress and fires pressed events at button listeners.
  4. A car in a simulation that claims to be based on the one that Jack Brabham drove in 1967 : Yes! When SetAccel() is called on the object the corresponding car on the screen accelerates accordingly.
  5. An object that claims to be the file "c:/config.sys" : Technically this is a lie. Really the object should only claim to be a *proxy* for the file. Note that streaming bytes to the proxy indeed streams bytes into the underlying file.
  6. A car in an OODB that claims to be the one that Jack Brabham drove in 1967 : No, that is a lie.
  7. An employee object : No, that is a lie. The object is no such thing. I'm assuming of course that the object doesn't model a human in the manner of a simulation.
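The proxy distinction in item 5 can be sketched in a few lines of Python. FileProxy is a hypothetical class, and the file is created in a temporary directory rather than at c:/config.sys. Each proxy is a genuine object in the computing space; the file is not, and two proxies for the same file are simply two objects.

```python
import os
import tempfile


class FileProxy:
    """Hypothetical proxy: it stands in for a file, it is not the file."""

    def __init__(self, path):
        self.path = path            # names the external file, does not own it

    def write(self, data: bytes) -> int:
        # Streaming bytes to the proxy really streams them into the file.
        with open(self.path, "ab") as f:
            return f.write(data)


path = os.path.join(tempfile.mkdtemp(), "config.sys")
p1 = FileProxy(path)
p2 = FileProxy(path)        # a second, independent proxy for the same file
p1.write(b"FILES=40\n")
p2.write(b"BUFFERS=20\n")
with open(path, "rb") as f:
    print(f.read())         # b'FILES=40\nBUFFERS=20\n'
print(p1 is p2)             # False: two proxies, one file
```

Because the object only claims to be a proxy, having two of them is not a contradiction.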

It would appear that a class called Employee involves a semantic lie, at odds with the fundamental meaning of state, behavior and identity for an object. The identity crisis leads to a number of problems.

Let an *external entity* refer to an entity outside of the computing space of the abstract computational machine. If an object models (pretends to be) an external entity such as an actual human employee, then we get problems with object identity tests, because in general we should support multiple, independent models of the same human. Imposing a constraint of at most one model per external entity seems both ad hoc and limiting.

Another problem is revealed if the object has a Clone() method. Assuming this copies all the state, we would have two objects with different identity yet model the same external entity.
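Both problems can be shown in a short Python sketch, assuming a hypothetical EmployeeModel class whose clone() copies all state via `copy.copy`. Three objects end up with three identities while all claiming to be the same human, so an identity test cannot answer "is this the same employee?".

```python
import copy


class EmployeeModel:
    """Hypothetical class: a record *about* a person, not the person."""

    def __init__(self, name, salary):
        self.name = name
        self.salary = salary

    def clone(self):
        # Assumed to copy all state, like the Clone() discussed in the text.
        return copy.copy(self)


hr_view = EmployeeModel("Alice", 50000)       # one model of the person
payroll_view = EmployeeModel("Alice", 50000)  # an independent second model
twin = hr_view.clone()                        # a copy with a new identity

# Three distinct identities, all claiming to be the same human.
print(len({id(o) for o in (hr_view, payroll_view, twin)}))  # 3
print(hr_view is twin)                                      # False
```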

We can look at this from another angle. If we ignore the difference between hardware and software, then together these comprise a real device that encapsulates state, behavior and identity. OO can be seen as an approach for decomposing a device into smaller devices that in turn encapsulate state, behavior and identity. This is exactly the perspective of the abstract computational machine. The devices are entities like circular buffers or scroll bars that are part of the computing space (as distinct from external entities, which are outside the computing space).
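A circular buffer is a good example of such a smaller device: its state, behavior and identity all live inside the computing space, so nothing about it is a lie. Here is a minimal Python sketch, assuming overwrite-oldest semantics when the buffer is full (one of several common policies); the class name is hypothetical.

```python
class CircularBuffer:
    """Hypothetical fixed-capacity FIFO; when full, the oldest item is overwritten."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0   # index of the oldest element
        self._size = 0

    def push(self, item):
        cap = len(self._slots)
        tail = (self._head + self._size) % cap
        self._slots[tail] = item
        if self._size < cap:
            self._size += 1
        else:
            # Buffer was full: the write replaced the oldest slot.
            self._head = (self._head + 1) % cap

    def pop(self):
        if self._size == 0:
            raise IndexError("pop from empty buffer")
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        self._size -= 1
        return item

    def __len__(self):
        return self._size


buf = CircularBuffer(3)
for x in range(5):
    buf.push(x)          # 0 and 1 get overwritten
drained = [buf.pop() for _ in range(len(buf))]
print(drained)           # [2, 3, 4]
```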

This can be contrasted with tuples in relations, which only represent information about external entities. Tuples have no object identity semantics, which makes them a completely different (and incompatible) beast. This flies in the face of the O/R mapping idea.
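A relation can be sketched in Python as a set of value tuples; the EmployeeRow type is hypothetical. Python still gives every tuple object an `id()`, but a set-based relation never consults it: duplicate elimination and membership are decided purely by values, so two separately constructed tuples stating the same fact are interchangeable. The row only records information about a person; it does not pretend to be one.

```python
from typing import NamedTuple


class EmployeeRow(NamedTuple):
    """Hypothetical tuple type: a statement of fact, identified only by its values."""
    emp_no: int
    name: str
    dept: str


# A relation body modelled as a set: duplicate facts collapse to one.
employees = {
    EmployeeRow(1, "Alice", "Sales"),
    EmployeeRow(2, "Bob", "Ops"),
    EmployeeRow(1, "Alice", "Sales"),   # the same fact stated twice
}
print(len(employees))                              # 2

# A freshly constructed tuple with equal values is "the same" tuple.
print(EmployeeRow(2, "Bob", "Ops") in employees)   # True

# "Updating" a fact means replacing one tuple value with another.
employees.remove(EmployeeRow(2, "Bob", "Ops"))
employees.add(EmployeeRow(2, "Bob", "Sales"))
```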

It is interesting to look through the Design Patterns book by the GoF to see how often their designs are reasonable, i.e. whether the objects avoid the identity crisis. There are many examples.

Virtually all are well behaved because they map properly to entities in the computing space of the abstract computational machine. For example Glyph, SpellChecker, GuiFactory, Window, Command, WidgetFactory, RTFReader, Iterator, Stream, Compiler, Parser, Image, SaveDialog, PrintButton, List, ListIterator.

A suspicious example on page 83 has a base class called Room. However this is actually ok because the authors state that it's for a maze game so a Room object may indeed be entitled to be a room in the context of the game.

A more serious example appears on page 170 to demonstrate the Composite design pattern. In this case the base class Equipment appears to break the rules, particularly with methods like Power(), NetPrice() and DiscountPrice(). How can an object in the computing space have a price? Nowhere is it mentioned that this is for a simulation or game.

Another example appears on page 265 where a list is parameterised with a pointer to an Employee.

I find it interesting that amongst so many examples I only found two that break the rules. I'm not surprised that a class like Employee crept into their excellent book given the popularity of this faulty example amongst so many other OO authors.

An Employee example is a poor choice for an OO textbook because simulations or games involving employee objects are rare. By contrast, the need to store information about actual human employees is very common. Therefore readers of these examples are led to believe that OO offers a reasonable alternative to the RM approach for storing information about external entities.

Examples like the Employee class feed the confusion over the nature of object identity and help to sustain a whole industry of inappropriate use of OO. For example Model Driven Architecture (MDA) assumes OO should be used for modeling purposes, when it really is about building devices.

So is my "criterion" valid science? Yes, because it makes real, testable predictions.

  1. OO is good for strings, deques, front ends, simulations and games.
  2. RM is good for storing information about Employees, Students, University courses, Inventory systems, Invoices.

These predictions are borne out in practice.

Received on Mon Jan 29 2007 - 20:42:33 CET
