Re: OO versus RDB

From: Christian Brunschen <cb_at_festis.df.lth.se>
Date: Thu, 29 Jun 2006 10:24:22 +0000 (UTC)
Message-ID: <e809om$8nt$1_at_news.lth.se>

In article <04Cog.3422$pu3.83147_at_ursa-nb00s0.nbnet.nb.ca>, Bob Badour <bbadour_at_pei.sympatico.ca> wrote:
>Christian Brunschen wrote:
>
>> In article <eZxog.3346$pu3.80765_at_ursa-nb00s0.nbnet.nb.ca>,
>> Bob Badour <bbadour_at_pei.sympatico.ca> wrote:
>>
>>>Christian Brunschen wrote:
>>>
>>>>In article <1151501485.755962.108350_at_p79g2000cwp.googlegroups.com>,
>>>>erk <eric.kaun_at_gmail.com> wrote:
>>>>
>>>>>topmind wrote:
>>>>>
>>>>>>[...] (Unlike OO, where encapsulation
>>>>>>encourages each object/class to reinvent its own
>>>>>>add/change/delete/cross-reference/search rules and interfaces so that
>>>>>>they are all different for each project or shop.
>>>>>
>>>>>Agreed. I wouldn't have a problem with inconsistencies if the languages
>>>>>just offered some powerful basic operations. You can't even write
>>>>>something in Java like this, which would be completely type-safe:
>>>>>
>>>>>Set<LineItem> items =
>>>>>theOrder.lineItems.where(item.status==Status.SHIPPED);
>>>>
>>>>Perhaps somewhat interestingly, in a dynamic OO language such as
>>>>Smalltalk, Objective-C or Ruby, you can use the technique of Higher-Order
>>>>Messaging (HOM), as described in the 2005 OOPLSA paper
>>>><www.metaobject.com/papers/Higher_Order_Messaging_OOPSLA_2005.pdf>, to do
>>>>something quite similar; in Objective-C syntax something like
>>>>
>>>> NSSet *lineItems =
>>>> [[[[order lineItems] selectWhere] status] equals:STATUS_SHIPPED];
>>>
>>>You have given an example involving only restriction.
>>
>> If I interpret things correctly, this is what is called 'selection' in
>> some places (suh as <http://en.wikipedia.org/wiki/Generalized_selection>
>> and <http://www.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch4.html>)?
>
>Suggesting a synonym for restrict does not answer my question.

This particular paragrah of my response was not intended to answer your question, but instead to ensure that I had understood the question correctly, in particular because a quick google search for 'relational algebra operations' shows in the top five result only one, the fifth, that actualy refers to this operation as 'restrict'; the others call it 'select'. I just wanted to avoid any possible misunderstandings about the terminology in use, also because I am coming mainly from the direction of being a pragmatic software developer rather than a deep theorist on any particular subject. Please also keep this in mind if my use of terminology is imprecise or inaccurate - that will probably be because I have lots of things to learn.

>The relational algebra also has union, join, project, extend, quantification
>etc.

(not all of which are considered 'fundamental' operations in the relational algebra, according to some of those top google hits.)

>>>Does the method work for project, extend and join as well?
>>
>> Caveat: I am an OO developer, and use relational databases a lot, but I
>> there is a lot of precise terminology with which I am not conversant, and
>> I *will* make mistakes. but they will be just that, mistakes.
>
>[lengthy example using objects, dictionaries, higher-order-methods snipped]

I hope that the example successfully demonstrated that the technique (I prefer using a different word than 'method', which is in the context of object-oriented programming usually used to describe what C++ calls 'member functions') can indeed work for projection, extension and join as well. A comment from you indicating whether you consider my demonstration successful or not on that point would be appreciated.

>> In the case of a join it could be argued whether the
>> result should be a set of NSDictionaries containing just the attributes
>> extracted from each joined pair of objects (i.e., trying to emulate as
>> closely as possible the real relational model), or something like a set of
>> pair of objects, where each such pair would simply identify the objects
>> fromthe original A and B sets that were thus joined together (since we
>> are, after all, working with objects rather than tuples). Of course, we
>> could simply offer both operations and let the programmer decide which is
>> appropriate to use.
>
>With all due respect, you have just broken one of the most fundamental
>properties of the relational algebra/calculus: closure. (Actually, you
>have proposed two alternates that both break closure.) I cannot stress
>enough the importance of closure and nesting. Without that, your
>proposal falls flat.

Before beginning to address the substance of your comment, let me add another 'terminology aside': I am presuming that by 'closure' you are specifically referring to the property of relational algebra that any operation on relation(s) returns another relation, which can then be used immediately as one of the operands for another relational-algebra operation, and so on and so forth; i.e., it enables arbitrary nesting of relational operations (and that it is that 'nesting' you are referring to, as well). If these assumptions of mine are incorrect, obviously my arguments below (which are based in part on these assumptions) will fall.

Given my above understanding of 'closure' and 'nesting', I must say that I disagree that my suggestions 'break closure' - in the context where my suggestion is set, namely in the context of an object-oriented environment, which is *not* the same as the relational model. Where the relational model works with relations, which are fundamentally sets of tuples, the object-oriented environment I've sketched at works with sets of _objects_. Each operation thus fulfils closure if it, too, returns a set of objects. Consider now that a 'pairs of objects' would be straightforwardly represented in an object-oriented context precisely as an object itself, something like:

  _at_interface Pair : NSObject {
    id left;
    id right;
  }
  _at_end
  _at_implementation Pair

left { return left; }
(void) setLeft:l { left = l; }
right { return right; }
(void) setRight:r { right = r; }
valueForKey:(NSString *)key { if ([key hasPrefix:_at_"left_"]) { return [[self left] valueForKey:[key substringFromIndex:5]]; } else if ([key hasPrefix:_at_"right_"]) { return [[self right] valueForKey:[key substringFromIndex:6]]; } else { return [super valueForKey:key]; } } // this is the 'catch-all' method ta is invoked for messages that // have no implementation in this class
(void)forwardInvocation:(NSInvocation *)invocation { NSString *selectorName = NSStringFromSelector([invocation selector]); if ([selectorName hasPrefix:_at_"left_"] || [selectorName hasPrefix:_at_"right_"]) { return [self valueForKey:selectorName]; } } _at_end

Now, the join operation returning 'pairs of objects' a) operates on two sets of objects, and b) returns a set of objects, which can in turn be used as input to other operations (such as projection to extract a subset of the properties available in the 'left' and 'right' objects of the pair). Under the above informal definition of 'closure', this fulfils it, and certainly it allows for nesting of these operations. For example, using the fact that the 'Pair' class overrides its 'valueForKey:' method to permit access to the properties of its 'left' and 'right' objects directly (by prefixing the key with 'left_' and 'right_' respectively), we could do something like

// these are the keys we want to extract from the join result NSArray *interestingKeys =
[NSArray arrayWithObjects:_at_"left_x", "right_y", nil];

// these are the keys we want to use in the extracted result NSArray *renamedKeys =
[NSArray arrayWithObjects:_at_"a_x", @"b_y", nil];

// perform the join, extract the interesting keys and rename, all in one go NSSet *extracted =
[[[[A equijoinWith:B] foo] bar] project:interestingKeys

                                  andRename:renamedKeys];

The 'extracted' set would then contain NSDictionaries, each containing as their 'a_x' property the value of the 'x' property of the object from the 'A' set and as their 'b_y' property the value of the 'y' property of the object from the 'B' set tat were joined together. You could of course also select only those pairs whose 'left_x' property matches a certain value,

NSSet *matching =
[[[[[[A equijoinWith:B] foo] bar] selectWhere] left_x] equals:47];

... again demonstrating that a result in the shape of a set of pairs of objects certainly permits nesting, and to my understanding (which may indeed be limited) also satisfies closure.

But just so that this has been re-stated, this is not 'closure' _exactly_ as in the relational model, but closure in the context of an object-oriented environment where we are working with sets of objects.

One thing that very much differentiates 'my' sketched-at system from the relational model and relational algebra, is that in my system there is no difference between what a set can contain vs what the properties on the set's contents are: everything is objects (or, well, references to objects). And these objects have properties, which can in turn be ojects, and which can be accessed uniformly and traversed even through different depths. But more importantly from an object-orientation viewpoint, these objects don't just hold data, they also associate wth it certain behaviour. This is a reason why an object-riented developer might prefer to get a set of pairs of objects as the result from a join rather than a set of dictionaries just holding the values of the respective properties of the objects: extracting the properties and putting them in a dictionary loses information (and can always, as demonstrated above, be done explicitly if desired).

By the way, I try to use 'properties' rather than 'attributes' deliberately, because I tend to use 'properties' to refer both to simple-valued _attributes_ (i.e., things that have values that are strings, numbers, booleans, and so on) and to object-values _relationships_ (ie, references to one or more related more complex objects). In this context, to-one relationships would usually be represented as a direct reference to the related object, and to-many relationships as a reference to a set of related objects.

Now, using either key-value coding or higher-order messaging, accessing attributes or relationships is all done uniformly. This means that both types of properties, attributes and relationships, can be used for any purpose, including for join conditions, selection criteria and so on.

For instance, if you have a set of objects, and want to select from it only the sbset that are all related to a common other object (say, from a set of employees all those that work in the same department), you could actually perform the join not using the 'departmentId' to compare on, but actually the department object itself:

  Department *someDepartment;
  NSSet *employees;
  NSSet *departmentEmployees =
    [[[employees selectWhere] department] equals:someDepartment];

Of course, this presumes that we haven't already included an 'employees' relationship on the Department object, in which case we could simply have used that. But there may be good reasons to exclude such a relationship and only calculate it through an explicit join when necessary, in which case the above would make an excellent candidate for being a method on the Department objects:

_at_implementation Department

(NSSet *)matchingEmployees:(NSSet *)candidates { return [[[candidates selectWhere] department] equals:self]; }

The fact that to-many relationships are modeled as sets of objects is also quite useful, because it means that we can first traverse a graph of objects to find a fairly small candidate set of objects, and then perform a join or other operation only an that small set of objects rather than the set of all existing objets of that type. Essentially, it allows us to combine the best bits of using an object graph, with a lot of what makes the relational model so usefu.

Note, however, that I am not advocating getting rid of the relational model, or that this somehow 'replaces' or 'subsumes' it. I am fully aware that this is perhaps best described as a 'bastard child' if object-orientation and the relational model; hence my deliberate choice of talking explicitly about 'sets of objects' as being distinct from (though possibly in some senses similar to, because it is heavily inspired by) relations. But it would be a way to get _some_ of the power demonstrated by the relational model into an object-riented environment.

>> I am uncertain what the 'extend' operation is that you refer to; I didn't
>> recall it from my database classes at university, and googling for
>> 'relational algebra extend' doesn't seem to give anything that seems
>> immediately enlightening. Would you care to elucidate?
>
>Extend derives new attributes from existing attributes. Thus given a
>relation including a date_of_birth attribute, one might derive a new
>relation that extends the original relation with an age attribute.

This is a very straightforward thing to do in object-orientation in general. If you consider the 'Point' class I sketched at in my previous post, it could be implemented to have two 'intrinsic' properties, 'x' and 'y', , which would be its cartesian coordinates (aka 'abscissa' and 'ordinate' if you are so inclined); but could also expose the derived properties 'r' and 'theta', its polar coordinates, like so:

  _at_interface Point : NSObject {
    double x;
    double y
  }
  _at_end
  _at_implementation Point
    // the usual 'x' and 'y' accessors omitted for brevity

(double) r { return hypot([self x], [self y]); }
(double) theta { return atan2([self x], [self y]); } _at_end

These would be available for use with higher-order messaging, through key-value coding, and indeed in any other way you want (as they are simply methods that can be invoked through sending the appropriate messages to the Point object(s) in question).

Best wishes,

// Christian Brunschen Received on Thu Jun 29 2006 - 12:24:22 CEST

This message: [ Message body ]
Next message: Bruno Desthuilliers: "Re: Unpredictable programming"
Previous message: Bruno Desthuilliers: "Re: What databases have taught me"
Maybe in reply to: mAsterdam: "Re: OO versus RDB"
In reply to Bob Badour: "Re: OO versus RDB"
Next in thread: J M Davitt: "Re: OO versus RDB"
Reply: J M Davitt: "Re: OO versus RDB"
Reply: Bob Badour: "Re: OO versus RDB"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message