Re: OO versus RDB

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Thu, 29 Jun 2006 16:33:41 GMT
Message-ID: <FhTog.3746$pu3.89586_at_ursa-nb00s0.nbnet.nb.ca>


Christian Brunschen wrote:
> In article <04Cog.3422$pu3.83147_at_ursa-nb00s0.nbnet.nb.ca>,
> Bob Badour <bbadour_at_pei.sympatico.ca> wrote:
>

>>Christian Brunschen wrote:
>>
>>>In article <eZxog.3346$pu3.80765_at_ursa-nb00s0.nbnet.nb.ca>,
>>>Bob Badour  <bbadour_at_pei.sympatico.ca> wrote:
>>>
>>>>Christian Brunschen wrote:
>>>>
>>>>>In article <1151501485.755962.108350_at_p79g2000cwp.googlegroups.com>,
>>>>>erk <eric.kaun_at_gmail.com> wrote:
>>>>>
>>>>>>topmind wrote:
>>>>>>
>>>>>>>[...] (Unlike OO, where encapsulation
>>>>>>>encourages each object/class to reinvent its own
>>>>>>>add/change/delete/cross-reference/search rules and interfaces so that
>>>>>>>they are all different for each project or shop.
>>>>>>
>>>>>>Agreed. I wouldn't have a problem with inconsistencies if the languages
>>>>>>just offered some powerful basic operations. You can't even write
>>>>>>something in Java like this, which would be completely type-safe:
>>>>>>
>>>>>>Set<LineItem> items =
>>>>>>theOrder.lineItems.where(item.status==Status.SHIPPED);
>>>>>
>>>>>Perhaps somewhat interestingly, in a dynamic OO language such as 
>>>>>Smalltalk, Objective-C or Ruby, you can use the technique of Higher-Order 
>>>>>Messaging (HOM), as described in the 2005 OOPLSA paper 
>>>>><www.metaobject.com/papers/Higher_Order_Messaging_OOPSLA_2005.pdf>, to do 
>>>>>something quite similar; in Objective-C syntax something like
>>>>>
>>>>> NSSet *lineItems =
>>>>>   [[[[order lineItems] selectWhere] status] equals:STATUS_SHIPPED];
>>>>
>>>>You have given an example involving only restriction. 
>>>
>>>If I interpret things correctly, this is what is called 'selection' in 
>>>some places (suh as <http://en.wikipedia.org/wiki/Generalized_selection> 
>>>and <http://www.cse.ohio-state.edu/~gurari/course/cse670/cse670Ch4.html>)?
>>
>>Suggesting a synonym for restrict does not answer my question. 

>
> This particular paragrah of my response was not intended to answer your
> question, but instead to ensure that I had understood the question
> correctly, in particular because a quick google search for 'relational
> algebra operations' shows in the top five result only one, the fifth, that
> actualy refers to this operation as 'restrict'; the others call it
> 'select'. I just wanted to avoid any possible misunderstandings about the
> terminology in use,

Fair enough.

  also because I am coming mainly from the direction of
> being a pragmatic software developer rather than a deep theorist on any
> particular subject.

With all due respect, software development is applied mathematics. One cannot be pragmatic about it while not knowing the underlying theory.

Similarly, pragmatic electrical engineers know Ohm's Law, Maxwell's Equations and Stoke's Theorem. I cannot imagine an electrical engineer claiming both pragmatism and ignorance of the theory of his field.

  Please also keep this in mind if my use of terminology
> is imprecise or inaccurate - that will probably be because I have lots of
> things to learn.

Fair enough.

>>The relational algebra also has union, join, project, extend, quantification 
>>etc.

>
> (not all of which are considered 'fundamental' operations in the
> relational algebra, according to some of those top google hits.)

That's rather like saying 'and', 'or' and 'not' are not fundamental to boolean logic because one can express them all using only 'nand'. Nevertheless, one would be hard-pressed to create a useful algebra that failed to deliver the above list of operations in one way or another: e.g. one cannot express them using only restrict.

BTW, which of the above operations did you find were not considered 'fundamental'? And do you have a reference?

>>>>Does the method work for project, extend and join as well?
>>>
>>>Caveat: I am an OO developer, and use relational databases a lot, but I 
>>>there is a lot of precise terminology with which I am not conversant, and 
>>>I *will* make mistakes. but they will be just that, mistakes.
>>
>>[lengthy example using objects, dictionaries, higher-order-methods snipped]

>
> I hope that the example successfully demonstrated that the technique (I
> prefer using a different word than 'method', which is in the context of
> object-oriented programming usually used to describe what C++ calls
> 'member functions') can indeed work for projection, extension and join as
> well. A comment from you indicating whether you consider my demonstration
> successful or not on that point would be appreciated.

I found the example informative. I snipped it to shorten the reply and to focus attention on the part where it failed below.

>>>In the case of a join it could be argued whether the
>>>result should be a set of NSDictionaries containing just the attributes 
>>>extracted from each joined pair of objects (i.e., trying to emulate as 
>>>closely as possible the real relational model), or something like a set of 
>>>pair of objects, where each such pair would simply identify the objects 
>>>fromthe original A and B sets that were thus joined together (since we 
>>>are, after all, working with objects rather than tuples). Of course, we 
>>>could simply offer both operations and let the programmer decide which is 
>>>appropriate to use.
>>
>>With all due respect, you have just broken one of the most fundamental 
>>properties of the relational algebra/calculus: closure. (Actually, you 
>>have proposed two alternates that both break closure.) I cannot stress 
>>enough the importance of closure and nesting. Without that, your 
>>proposal falls flat.

>
> Before beginning to address the substance of your comment, let me add
> another 'terminology aside': I am presuming that by 'closure' you are
> specifically referring to the property of relational algebra that any
> operation on relation(s) returns another relation, which can then be used
> immediately as one of the operands for another relational-algebra
> operation, and so on and so forth; i.e., it enables arbitrary nesting of
> relational operations (and that it is that 'nesting' you are referring to,
> as well). If these assumptions of mine are incorrect, obviously my
> arguments below (which are based in part on these assumptions) will fall.
>
> Given my above understanding of 'closure' and 'nesting', I must say that I
> disagree that my suggestions 'break closure' - in the context where my
> suggestion is set, namely in the context of an object-oriented
> environment, which is *not* the same as the relational model.

But you gave it in the context of a cross-posted discussion comparing object-orientation with the relational model. Object-orientation falls flat for the reason I gave.

  Where the
> relational model works with relations, which are fundamentally sets of
> tuples, the object-oriented environment I've sketched at works with sets
> of _objects_. Each operation thus fulfils closure if it, too, returns a
> set of objects. Consider now that a 'pairs of objects' would be
> straightforwardly represented in an object-oriented context precisely as
> an object itself, something like:
>
> _at_interface Pair : NSObject {
> id left;
> id right;
> }
> _at_end
> _at_implementation Pair
> - left { return left; }
> - (void) setLeft:l { left = l; }
> - right { return right; }
> - (void) setRight:r { right = r; }
> - valueForKey:(NSString *)key {
> if ([key hasPrefix:_at_"left_"]) {
> return [[self left] valueForKey:[key substringFromIndex:5]];
> } else if ([key hasPrefix:_at_"right_"]) {
> return [[self right] valueForKey:[key substringFromIndex:6]];
> } else {
> return [super valueForKey:key];
> }
> }
> // this is the 'catch-all' method ta is invoked for messages that
> // have no implementation in this class
> - (void)forwardInvocation:(NSInvocation *)invocation {
> NSString *selectorName = NSStringFromSelector([invocation selector]);
> if ([selectorName hasPrefix:_at_"left_"] ||
> [selectorName hasPrefix:_at_"right_"]) {
> return [self valueForKey:selectorName];
> }
> }
> _at_end
>
> Now, the join operation returning 'pairs of objects' a) operates on two
> sets of objects, and b) returns a set of objects, which can in turn be
> used as input to other operations (such as projection to extract a subset
> of the properties available in the 'left' and 'right' objects of the
> pair). Under the above informal definition of 'closure', this fulfils it,
> and certainly it allows for nesting of these operations.

But the real question is: Does it do so usefully while preserving the properties of the relational operations? The answer to that is no.

For example, join is associative. (A join B) join C = A join (B join C)

What you propose will create two different trees as the final result and will thus break the associativity. For example, the left hand side above will yield B by asking for the right of the left, while the right hand side will not even define a right of the left. The proposal strikes me as doing a lot of work only to break something.

I suggest a language that supports relations and the relational algebra or calculus would be much, much better.

  For example,
> using the fact that the 'Pair' class overrides its 'valueForKey:' method
> to permit access to the properties of its 'left' and 'right' objects
> directly (by prefixing the key with 'left_' and 'right_' respectively), we
> could do something like
>
> // these are the keys we want to extract from the join result
> NSArray *interestingKeys =
> [NSArray arrayWithObjects:_at_"left_x", "right_y", nil];
>
> // these are the keys we want to use in the extracted result
> NSArray *renamedKeys =
> [NSArray arrayWithObjects:_at_"a_x", @"b_y", nil];
>
> // perform the join, extract the interesting keys and rename, all in one go
> NSSet *extracted =
> [[[[A equijoinWith:B] foo] bar] project:interestingKeys
> andRename:renamedKeys];
>
> The 'extracted' set would then contain NSDictionaries, each containing as
> their 'a_x' property the value of the 'x' property of the object from the
> 'A' set and as their 'b_y' property the value of the 'y' property of the
> object from the 'B' set tat were joined together.

Does that not strike you as a rather lengthy kludge for:

(A join B){x,y} rename (x as a, y as b)

? Does it not strike you as a kludge that tries to put closure back together again?

  You could of course also
> select only those pairs whose 'left_x' property matches a certain value,
>
> NSSet *matching =
> [[[[[[A equijoinWith:B] foo] bar] selectWhere] left_x] equals:47];
>
> ... again demonstrating that a result in the shape of a set of pairs of
> objects certainly permits nesting, and to my understanding (which may
> indeed be limited) also satisfies closure.

But it doesn't exhibit the properties of join like associativity.

> But just so that this has been re-stated, this is not 'closure' _exactly_
> as in the relational model, but closure in the context of an
> object-oriented environment where we are working with sets of objects.

Fair enough, but it only shifts the problem ever so slightly before again falling flat.

> One thing that very much differentiates 'my' sketched-at system from the
> relational model and relational algebra, is that in my system there is no
> difference between what a set can contain vs what the properties on the
> set's contents are: everything is objects (or, well, references to
> objects).

With all due respect, I can parse no meaningful statement from the above. Can you define what you mean by 'objects' and 'properties' ?

  And these objects have properties, which can in turn be ojects,
> and which can be accessed uniformly and traversed even through different
> depths.

Let's try to add a little precision. In the relational model, relations are sets of tuples. The tuples are sets of named, typed values.

Types define operations on values whose results are also typed values. One can nest operations to any depth according to the types of their arguments and results.

In what way is the relational model less uniform? What benefit does traversing by location provide that is not provided by nesting operations? How does a property differ from a unary operation?

  But more importantly from an object-orientation viewpoint, these
> objects don't just hold data, they also associate wth it certain
> behaviour.

A type associates operations with values without confusing them with variables.

  This is a reason why an object-riented developer might prefer
> to get a set of pairs of objects as the result from a join rather than a
> set of dictionaries just holding the values of the respective properties
> of the objects: extracting the properties and putting them in a dictionary
> loses information (and can always, as demonstrated above, be done
> explicitly if desired).

There are reasons why more informed developers would prefer to use relations instead: notably relations and relational operations do not lose anything they are not instructed to lose--perhaps the most important thing being symmetry.

> By the way, I try to use 'properties' rather than 'attributes'
> deliberately, because I tend to use 'properties' to refer both to
> simple-valued _attributes_ (i.e., things that have values that are
> strings, numbers, booleans, and so on) and to object-values
> _relationships_ (ie, references to one or more related more complex
> objects). In this context, to-one relationships would usually be
> represented as a direct reference to the related object, and to-many
> relationships as a reference to a set of related objects.

That's nice. Why do you think references are necessary in the first place? They increase structural complexity while breaking symmetry and closure, and yet they offer no particular benefit for semantic expression.

> Now, using either key-value coding or higher-order messaging, accessing
> attributes or relationships is all done uniformly.

You and I apparently use different definitions for 'uniformly'.

  This means that both
> types of properties, attributes and relationships, can be used for any
> purpose, including for join conditions, selection criteria and so on.

If you say so. If your definition of join hadn't broken important properties of the operation, the above might address the problems of having introduced an artificial distinction in the first place; however, I have not verified that. If one uses only values, one doesn't need to do anything else to use values whereever one can use a value. (I see no reason why join needs any conditions, btw.)

> For instance, if you have a set of objects, and want to select from it
> only the sbset that are all related to a common other object (say, from a

[yet another example of restrict snipped]

> The fact that to-many relationships are modeled as sets of objects is also
> quite useful, because it means that we can first traverse a graph of
> objects to find a fairly small candidate set of objects, and then perform
> a join or other operation only an that small set of objects rather than
> the set of all existing objets of that type.

If one uses the relational algebra or calculus, one doesn't need to traverse anything at all. Let the compiler handle efficiency. Compilers do that sort of thing very well. 'Restrict before join' and 'pushing restrict through join' are very obvious optimizations that are almost always the first optimizations provided.

  Essentially, it allows us to
> combine the best bits of using an object graph, with a lot of what makes
> the relational model so usefu.

I have yet to meet any good bits of using a graph for expression, and your assertion regarding what makes the RM so useful is false as evidenced by your join breaking symmetry, closure and associativity.

> Note, however, that I am not advocating getting rid of the relational
> model, or that this somehow 'replaces' or 'subsumes' it. I am fully aware
> that this is perhaps best described as a 'bastard child' if
> object-orientation and the relational model; hence my deliberate choice of
> talking explicitly about 'sets of objects' as being distinct from (though
> possibly in some senses similar to, because it is heavily inspired by)
> relations. But it would be a way to get _some_ of the power demonstrated
> by the relational model into an object-riented environment.

Some, yes. Very little, though, I am afraid.

>>>I am uncertain what the 'extend' operation is that you refer to; I didn't 
>>>recall it from my database classes at university, and googling for 
>>>'relational algebra extend' doesn't seem to give anything that seems 
>>>immediately enlightening. Would you care to elucidate?
>>
>>Extend derives new attributes from existing attributes. Thus given a 
>>relation including a date_of_birth attribute, one might derive a new 
>>relation that extends the original relation with an age attribute.

>
> This is a very straightforward thing to do in object-orientation in
> general. If you consider the 'Point' class I sketched at in my previous
> post, it could be implemented to have two 'intrinsic' properties, 'x' and
> 'y', , which would be its cartesian coordinates (aka 'abscissa' and
> 'ordinate' if you are so inclined); but could also expose the derived
> properties 'r' and 'theta', its polar coordinates, like so:
>
> _at_interface Point : NSObject {
> double x;
> double y
> }
> _at_end
> _at_implementation Point
> // the usual 'x' and 'y' accessors omitted for brevity
> - (double) r { return hypot([self x], [self y]); }
> - (double) theta { return atan2([self x], [self y]); }
> _at_end

Extending a type and extending a relation are very different things. You have only demonstrated a mapping between possible representations. Suppose I have a relation R with two points P1 and P2, I might write:

extend R add Distance(P1,P2) as D12

in the above, R, P1, P2 could all be expressions and the entire expression could be nested with other relational expressions.

If no Distance operation existed, I, as a user, could just as easily write the above as:

extend R add sqr(

	(the_x(P1)-the_x(P2))*(the_x(P1)-the_x(P2))
	+(the_y(P1)-the_y(P2))*(the_y(P1)-the_y(P2))
) as D12

without waiting for someone with authority to extend the type system to create what I need. The corollary is not every user has to have authority to extend the type system.

> These would be available for use with higher-order messaging, through
> key-value coding, and indeed in any other way you want (as they are simply
> methods that can be invoked through sending the appropriate messages to
> the Point object(s) in question).

But, someone must first create them as methods. Received on Thu Jun 29 2006 - 18:33:41 CEST

Original text of this message