Re: Clean Object Class Design -- What is it?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 16 Jul 2001 17:05:09 -0400
Message-ID: <FNI47.59$e85.23425095_at_radon.golden.net>


>> >> As it is in the relational world. Why should any data type expose the
>> >> internal representation of the data to the user? Doesn't this violate
>> >> physical independence?
>> >
>> >Ummm....
>> >
>> >UPDATE Foo SET Bar = "Baz"
>> >
>> >That looks like direct data access to me.
>>
>> First, you have to define what Bar is. If it is simply a character string,
>> then someone made a conscious decision not to encapsulate. What is your
>> point? People can do such foolish things in any OO language if they choose.
>> No language forces one to encapsulate no matter how much it supports
>> encapsulation.
>
>Not at all. We can assume, for simplicity's sake, that "Baz" represents the
>sequence of three ASCII characters (or 4, if your preferred language includes a
>terminating null), but that makes no requirements on Bar, other than that it
>can set its value from such a sequence of characters. Or is that too
>object-oriented for you?

Again, what is your point? If Bar is not a character string then the assignment is not direct data access -- it is encapsulated access.

>> >And yes, you can wrap some view
>> >around your tables, but you are still updating columns of rows of tables
>> >*directly*.
>>
>> Yes. And the combination of row identifier, column name and table name
>> identifies an object instance. Your point? Is it illegal to assign to object
>> variables in the OO world?
>
>At the very least, it is extremely bad form. Rather, objects are updated
>through methods that may perform transformations and pre/post-condition
>checking, etc.

How does assignment not do those things? I can fully specify all of these things for the assignment operator in C++, for instance. Is assignment not just another method?

In any case, nothing prevents one from calling any other mutator on an object variable in a relational database.
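To make the C++ point concrete, here is a minimal sketch using a hypothetical `Angle` class (not from any library). Its assignment operator checks a precondition and transforms the input, exactly as a named mutator would.

```cpp
#include <cmath>
#include <stdexcept>

// Hypothetical value class: assignment is just another mutator, so it can
// check preconditions and transform its input like any named method.
class Angle {
public:
    explicit Angle(double value = 0.0) { *this = value; }

    // Precondition: the input must be finite (otherwise throw and keep the
    // old state). Postcondition: the stored angle lies in [0, 360).
    Angle& operator=(double value) {
        if (!std::isfinite(value)) {
            throw std::invalid_argument("angle must be finite");
        }
        double d = std::fmod(value, 360.0);  // same sign as value, |d| < 360
        if (d < 0.0) d += 360.0;
        if (d >= 360.0) d = 0.0;             // guard against rounding up to 360
        degrees_ = d;
        return *this;
    }

    double degrees() const { return degrees_; }

private:
    double degrees_ = 0.0;
};
```

Assigning `-90.0` stores 270, and assigning a NaN throws while leaving the previous value untouched.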

>> >Let me try again. Suppose you have a Point. A view exposes this Point in
>> >Cartesian coordinates, but the underlying representation is in polar
>> >coordinates. In OO, I can constrain that X and Y must be set together, but
>> >not independently.
>>
>> I disagree with the constraint. Regardless of the underlying representation,
>> I should be able to change one of the Cartesian coordinates with predictable
>> results -- this is simply the well-defined mathematical operation of
>> translation along one axis or dimension.
>>
>> The value is the same regardless of the representation.
>
>I've been through this before in other forums. Translation along an
>axis is simply a specialized case of movement where movement along the other
>axis is 0. Why complicate the interface (one of your favorite points) when a
>single, generalized "move" concept will apply?

Since the object must define x and y properties as well as angle and magnitude properties regardless of representation, why complicate the interface by adding a redundant move method?

>However, I was not describing
>movement (which is a case of applying a delta value to the existing value). I
>was describing setting the position of a point.

I would assume that one can assign any point value to a point variable unless the variable has some additional explicit constraint. I would likewise assume one can assign new values to any of the properties regardless of representation.
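A minimal C++ sketch of that assumption, using a hypothetical `Point` class whose stored (physical) form is polar while its logical interface offers Cartesian and polar properties alike. Setting X alone is translation along the x axis; assigning a whole new value replaces X and Y together.

```cpp
#include <cmath>

// Hypothetical Point: stored as magnitude/angle, but every logical property
// (x, y, magnitude, angle) is available regardless of that choice.
class Point {
public:
    static Point fromCartesian(double x, double y) {
        return Point(std::hypot(x, y), std::atan2(y, x));
    }
    static Point fromPolar(double magnitude, double angleRadians) {
        return Point(magnitude, angleRadians);
    }

    double x() const { return magnitude_ * std::cos(angle_); }
    double y() const { return magnitude_ * std::sin(angle_); }
    double magnitude() const { return magnitude_; }
    double angle() const { return angle_; }

    // Changing one Cartesian coordinate is translation along that axis:
    // the other coordinate is preserved, whatever the stored form.
    void setX(double newX) { *this = fromCartesian(newX, y()); }
    void setY(double newY) { *this = fromCartesian(x(), newY); }

private:
    Point(double magnitude, double angle) : magnitude_(magnitude), angle_(angle) {}
    double magnitude_;
    double angle_;
};
```

Usage: `p.setX(5.0)` moves a point horizontally, while `p = Point::fromCartesian(5.0, 6.0)` is the whole-value assignment that sets both coordinates at once.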

>> >aPoint->setXY(5,6);
>> >
>> >Updating through a relational view is trouble:
>> >
>> >Update Point SET X = 5, Y = 6
>>
>> You are assuming that the table is the object, which is incorrect. The
>> column is the object:
>>
>> Update Circles Set Center.X = 5 Where Center.Angle = Degrees(45)
>>
>> One could rewrite the where clause to "Where Center.X = Center.Y", but I
>> wanted to demonstrate that the logical interface should be independent of
>> representation.
>
>I don't understand what your point is. You discounted my example without
>offering a counter-example.

If you cannot see the counter example when it is right in front of your face, I do not know what to say. You included the counter example in the excerpt, after all.

>What you provide above is a classic sledgehammer
>approach to database update.

Huh?

>In what kind of application would I want to
>blindly set the radius of all circles to 5 simply because they happen to have
>an angle of 45 degrees (never mind the irrationality of a circle having a
>center angle :-) ?

A graphic design application where the artist wants that visual image. Of course, in my example I did not change the radius; I just moved all of the circles that were on the diagonal x=y line horizontally until they were on the vertical x=5 line.

If the center is a point, it has a magnitude and angle (or phase) just as it has Cartesian coordinates. At the logical level, it always has all four of these properties irrespective of the underlying representation. I find it irrational to assume the value does not have all of these properties.

Hmmm... I think that making the size of my circles depend on their perpendicular distance to the x=y diagonal would make an interesting visual effect. I want the radius to get exponentially smaller the further from the diagonal....

Update Circles
Set Radius = 5 * EXP(-ABS(
       Double(Point(Center.Magnitude, Center.Angle - Degrees(45)).Y)
    ))
;

Yeah, that's the look I want! ;-)
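The geometry behind that statement can be checked outside any DBMS. This C++ sketch (the function names are illustrative assumptions) mirrors the expression: rotating the center by -45 degrees maps the x=y diagonal onto the x axis, so the rotated Y is the signed perpendicular distance (y - x)/sqrt(2) from the diagonal.

```cpp
#include <cmath>

// Rotating a point by -45 degrees maps the diagonal x = y onto the x axis,
// so the rotated y coordinate is the signed perpendicular distance from
// the point to that diagonal, i.e. (y - x) / sqrt(2).
double distanceFromDiagonal(double x, double y) {
    const double pi = std::acos(-1.0);
    const double magnitude = std::hypot(x, y);
    const double angle = std::atan2(y, x) - pi / 4.0;  // Center.Angle - Degrees(45)
    return magnitude * std::sin(angle);                // .Y of the rotated point
}

// Radius rule from the UPDATE: 5 on the diagonal, shrinking
// exponentially with distance from it.
double radiusFor(double x, double y) {
    return 5.0 * std::exp(-std::fabs(distanceFromDiagonal(x, y)));
}
```

A circle centered on the diagonal keeps radius 5; one centered at (0, sqrt(2)), at distance 1, gets 5 * e^-1.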

>> >Can a trigger on X know whether Y is also being updated? Can it prevent X
>> >being updated if Y is not also updated?
>>
>> The initial assumptions are false, which renders these questions moot. (And
>> I would not want to use a trigger for a constraint, in any case.)
>
>You seem to work hard at not understanding other people's points. If not a
>trigger, then how?

By declaring the constraint once to apply at all times and not just when specific events occur.

>It has been a few years since I've had to use an RDBMS, but
>triggers were the only mechanism I knew about to enforce complex constraints
>such as I've postulated.

I doubt that you have ever used an RDBMS. When SQL database vendors choose to push their work onto users by implementing triggers and stored procedures instead of declarative integrity, I see no reason to blame the principles they ignored in doing so.

>Or because you can't accept my toy example of constraining multiple attributes
>to be updated together, you refuse to think any further?

If you cannot see how my counter example responds to the issues you raised, you are the one who refuses to think any further.

Since Point is an object class, one can constrain it any way one chooses. You built a straw man when you broke it apart into two independent columns of a view.

Syntax to update the values together:

Update Circles Set Center = Point(5,6) Where ShapeID = 1.

Or, if you prefer a different syntax in your application programming language:

ACircle.Center = Point(5,6)

Or:

APoint = Point(5,6)

Or even:

aPoint->setXY(5,6);

>> >While it may be possible for a programmer to bypass methods of a class that
>> >enforce constraints in an ODBMS, you deserve what you get. It is a fact that
>> >in the ODBMS with which I am familiar, one programs "closer to the metal"
>> >than with RDBMS. This can be good and bad. With power comes risk.
>>
>> I can see many ways in which it is bad. In fact, this "closer to the metal"
>> requirement prevents the DBMS from performing some of its most basic
>> functions. I fail to see how this can ever be good.
>
>Then we won't spend any time on the bad.
>
>The power of the ODBMS is that it allows programming in a data representation
>"natural" to the programming paradigm. You will argue that this comes at a loss
>of physical independence.

Not if one uses a relational database with proper support for domains as the ODBMS.

>The power of the ODBMS is that it allows performance an order of magnitude
>faster than the RDBMS.

Since one can physically store the data identically in an RDBMS, one can achieve identical performance. No benchmark is required to comprehend this point.

If a particular vendor provides little or no physical independence, blame the vendor's implementation -- not the data model. If a particular vendor provides little or no support for domains, blame the vendor's implementation -- not the data model.

My criticisms of the non-relational ODBMSs are directed at fundamental flaws in their logical data models, at false assumptions in their very conception and at widely held misconceptions. No product will ever overcome such flaws.

No benchmark will ever address the issue of how much effort the DBMS requires of users to adapt the database to new performance requirements. No benchmark will ever address the cost of hiring teams of experts with the intimate knowledge of the DBMS's internals that is required to achieve the performance vendors achieve in benchmark tests.

>The power of the ODBMS is that I don't have to write 20% more code handling
>translation from relational result tables to a form that programmers can use.

This is the same power a relational ODBMS with proper support for domains provides. The question becomes: Why use a non-relational ODBMS?

>> >> Again, your initial assumption is false which renders the remainder of the
>> >> argument irrelevant.
>> >
>> >Again, your reading is flawed. The only one who needs to know where an
>> >object is clustered is the programmer who writes the inserter.
>>
>> Or the user who queries it or the user who must know what inserter to call
>> or ...
>
>Why? Why does a query care about where the data is clustered (in ANY
>database)?

The user must know what inserter to call. The non-relational DBMS requires extensions to the language, i.e., the equivalent of additional operators at the logical level, to encapsulate the clustering. A relational DBMS separates the issue of clustering completely from the logical operators used.

>What do you mean the user who needs to know which inserter to call? Whatever
>inserter he calls enforces whatever constraints have been established on that
>object.

What if the user performs an assignment to the list that replaces its entire contents with all of the previous contents and an additional item? The user has not used the insert method but has specified a logically equivalent operation. By tying physical storage to the logical interface, one needlessly forces the user to learn about physical issues.

>> >I don't think
>> >this is different between relational or object. And I don't believe any
>> >"user" of any database routinely (if ever) inserts data using raw SQL
>> >commands.
>>
>> I do it all the time. Most of the people I work with do it all the time. Am
>> I not a user of the database? Are not my colleagues?
>
>Depends. I don't consider the folks writing application code for the clerks and
>managers as "users".

How do the programmers not use the DBMS? For that matter, DBAs are users, and the DBMS protects the integrity of the data against their data entry mistakes too.

>I consider "users" as non-IT types who are using a system
>to accomplish some business-oriented goal.

I am the programmer of certain applications, and other people are the users of those applications. I did not program the DBMS; I am one of the users of the DBMS.

>> >The relational database requires someone to write SQL to accomplish
>> >clustering.
>>
>> First, you are assuming an SQL database, which is a false assumption.
>
>Fine. What would you prefer?

A relational database.

>> Second, someone tells the database to cluster the data at the physical level
>> without changing any aspect of the logical interface. Tomorrow, someone
>> might tell the database to uncluster the data. Nobody has to change the way
>> they insert the data.
>
>OK, where's the difference here? If the action of inserting is abstracted
>behind an insertor interface (good OO practice), I can change the effects
>(clustering or not) of this action with no impact to the "users" very easily.

...only if the user calls the insertor interface. What if the user assigns a whole new value to the collection?

>> >> Do you honestly contend that requiring users to learn and use unique methods
>> >> for every operation will aid comprehension for business analysts, managers,
>> >> shop supervisors and clerks?
>> >
>> >See above. Since the shop clerk is using a data entry screen, what
>> >difference does it make what happens underneath?
>>
>> See above. Not every user uses a data entry screen, and for those that do,
>> some user has to create the data entry screen.
>
>Your point? Your user either has to know which table(s) or view(s) to insert
>and the names of the columns OR he constructs an object according to the public
>interface. Same learning curve to me.

Once I teach a user about a single database object type, relation, and its operators, I can point the user at the system catalog to learn which other views to manipulate -- using the same object type and operators. If the DBMS exposes multiple object variable types, each with a unique interface, the learning curve gets a little steeper.

Does the user of a non-relational ODBMS use a set or a collection or a hash or a simple object method to query the system catalog?

>> >If you delete a person, the employee will also be deleted automatically.
>>
>> Really? You are forced to use cascaded deletes? Sounds dangerous to me.
>
>Why? It seems that you have two choices to enforce referential integrity:
>either delete the employee along with the person (cascaded delete), or allow
>the employee to have a dangling reference to a non-existent person -- oops,
>that's not referential integrity. What would you propose?

You could also inform the user that they must make sure no employees reference the person prior to allowing the delete. The user then has options and decisions to make. The user could delete the employees. The user could refer the employees to a different person. The user could realize the delete was an error and leave everything alone.

Cascaded deletes have their place, but other options are often more appropriate.
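The difference between the two referential actions can be sketched in C++ with a hypothetical in-memory `Store` (the names are assumptions; a real DBMS would declare the action on the foreign key). A restricted delete refuses and leaves the decision with the user, while a cascaded delete removes the referencing employees as well.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Two referential actions for deleting a person still referenced by employees.
enum class OnDelete { Restrict, Cascade };

struct Store {
    std::map<int, std::string> persons;  // person id -> name
    std::map<int, int> employees;        // employee id -> person id

    void deletePerson(int personId, OnDelete action) {
        bool referenced = false;
        for (const auto& e : employees) {
            if (e.second == personId) { referenced = true; break; }
        }
        if (referenced && action == OnDelete::Restrict) {
            // Refuse: the user may delete or re-point the employees first,
            // or decide the delete was a mistake.
            throw std::logic_error("person is still referenced by an employee");
        }
        // Cascade (or nothing referenced): drop referencing employees, then the person.
        for (auto it = employees.begin(); it != employees.end();) {
            if (it->second == personId) it = employees.erase(it);
            else ++it;
        }
        persons.erase(personId);
    }
};
```

Under `Restrict`, re-pointing the employee to another person and retrying succeeds without losing the employee.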

>(And the cascaded delete is a design decision of the database designer, not a
>requirement of all object databases.)

That wasn't clear in your example. If this is the case, the designer still has just as many decisions to make and just as difficult a job in the end.

>> >As mentioned elsewhere, the ODBMS market is immature. The standards body is
>> >impotent if not irrelevant. Some vendors use DDL. Some use language header
>> >files. There is no common method among or between products.
>> >
>> >So I don't expect anything. I just deal with what I have.
>>
>> It must be easy to argue from a position where anything goes. Would you
>> agree that most of my criticisms would apply to those products that support
>> no application independent language? Would you agree that all of my
>> criticisms would apply to any product that does not allow database design
>> independent of application design?
>
>I "argue" from the position of reality.

As do I.

>Can you define what you mean by an "application independent language"? If by
>this you mean a language different than the application programming language,
>then I'm not sure I'd agree with you. It is always a maintenance problem to
>keep different representations of the same thing in sync. If I can remove one
>of those maintenance issues, I remove a major source of errors.

Yes, that is what I mean: a DDL or DML different from the application programming language.

Would you agree that not having a language different than the application programming language limits the use of the DBMS to a single application programming language? Would you agree that this limits the use of the DBMS to highly skilled programmers who comprehend the language? Would you agree that this severely limits logical independence? Would you agree that this encourages physical dependence even if it does not require it?

Your remark about different representations confirms in my mind that you intend no physical independence at all in this situation.

>Finally, I don't know that I've ever seen a database designed independent of
>application design -- no, that's not true. The ones I have seen are unwieldy
>and extremely difficult to use. So do I have to choose between a design that is
>useful for one application versus a design that is useful for none?

It does not surprise me that you have encountered incompetent data modellers in your past experience. The industry discourages people from ever really acquiring any data modelling skill. Carl's posts are prime examples of that in action. How many times have you heard (or said) "That's just theory. It's not practical."?

As long as you (and most practitioners, for that matter) cling to misconceptions and fallacies, I suppose those might be your only two choices. If you learn how to recognize and demand quality from your data modellers, new options open up.

>> For those ODBMSs that are not relational databases and that do use DDL, what
>> do you see as the distinguishing characteristics that make them superior to
>> relational databases?
>
>See above. Performance.

Since relational databases allow any physical layout of the data, they can achieve equivalent performance characteristics to any other logical data model.

I interpret your statement regarding performance to endorse a direct mapping of the logical interface to physical structures without any kind of physical independence. I have already addressed the severe problems this causes.

To answer your previous question: No. No DBMS that lacks any kind of physical independence is ever a superior choice to an RDBMS for any application.

If all you want to do is store the state of your application for future continuation, I would not suggest using a DBMS at all. I might suggest you dump core to a file.

>No impedance mis-match with application programming
>language.

Since relational ODBMSs address this issue, this isn't really an advantage of non-relational ODBMSs.

>> >> If I can assume these things, what are the advantages you think a
>> >> (non-relational) OODB has over a relational database? Apparently, you
>> >> disagree with the other vendor.
>> >
>> >For me, the advantages over relational databases are greater expressive
>> >power and improved performance. I've heard you reject these claims but I
>> >haven't seen any proof to sway me.
>>
>> Since relational databases allow user defined object classes (domains), what
>> gives the greater expressive power? In most cases, isn't this just
>> linguistic complexity that pollutes the logical design with physical issues?
>
>I don't know.

You claim a greater expressive power, but you do not know how or why?!?

>> Given that performance is determined by the physical design of the data and
>> given that relational databases allow any physical design independent of the
>> logical design, what improves the performance?
>
>Implementation specific.

Exactly! And the physical implementation is independent of the logical data model. How does that make one logical data model superior to another with respect to performance?

Relational proponents have long ago demonstrated the huge deficiencies of all the other known logical data models. The onus is on those proposing a new data model to demonstrate how those criticisms do not apply to the new data model. The onus is on those proposing a new data model to demonstrate that the new model is as good as the relational model.

So far, you have refused to even describe the new model you espouse let alone demonstrate its worth.

>In one case, the "relations" are pre-computed and
>stored as part of the persistent object.

The relational model allows this at the physical storage level.

>Thus, traversing the relations is
>direct access. And if (as is often the case), the related data is stored near
>the original data, the related object is likely already in cache, offering
>memory access performance.

The relational model also allows this physical clustering.

>This is always better than the best (general
>purpose) optimizer can do joining tables.

I have to challenge the truth of the above statement. Since the relational data model allows the exact same physical layout, the optimizer will support the same access paths.

>> >> In a relational database, the user learns a small number of statements for
>> >> changing the state of the database. In SQL, those statements would include
>> >> INSERT, UPDATE, DELETE. Casual users would not have authority for CREATE,
>> >> ALTER or DROP and would have no need to learn them.
>> >
>> >YMMV. In my experience, "casual" users have no business updating data. Any
>> >modification that happens routinely is hidden behind a user interface. So
>> >then what?
>>
>> Some user must write the user interface. I often consider that user casual
>> too.
>
>Someone writing a user interface to modify the database is not "casual".

Compared to a DBA, that person is a casual user. That person does not extend the type system of the DBMS. That person does not need to see the entire database schema. That person does not determine or change the database schema. That person does not determine or change the physical design of the database.

The DBMS presents an appropriate logical view of the data to the application and the programmer just submits data to that logical view.

While not a casual user of the user-interface, the programmer (or the programmer's application) is a casual user of the database.

>Of
>course, in your model, that casual user must translate his application
>programming concepts into SQL statements to accomplish these modifications.

Only if you assume SQL as the DBMS language. I don't. I would prefer a relational database language that is closer to the algebra. I would prefer a relational database language that prohibits duplicate rows, that provides adequate support for domains, that does not allow NULL for missing information, that properly supports updatable views etc.

Even for SQL databases, the programmer does not necessarily have to do this translation. It's all a question of the middle-ware used, and proper support for domains simplifies the task of writing the middle-ware.

>In
>my model, the user uses the exact same paradigm he uses to modify any other
>object (persistent or not).

Do you mean paradigm or programming language? If you mean paradigm, which definition do you mean? "Example", "Declension", or "Assumptions, concepts, values and practices"?

If the last, doesn't the programming language already provide all of those?

Since the application programming language is an independent issue, all you really require is good middle-ware and proper support for domains in the DBMS.

>> >Because the "casual" user is forced to use the public interface of the
>> >object, he is not able to violate the integrity constraints that were
>> >programmed in.
>>
>> This is just restating that the user must know about the integrity
>> constraints beforehand, which is impossible if the integrity constraints
>> change after the fact.
>
>No, it is stating (clearly, I thought) that the user does not need to be
>concerned about such constraints because they are enforced by the methods he
>invokes.

Unless the user has the capability to write new methods, in which case the user can easily forget to enforce some of the integrity constraints previously enforced in existing methods. Declaring the constraints in a relational DBMS causes the DBMS to enforce the constraints on all users at all times -- including DBAs and users who have authority to write stored procedures etc.

Even if the user does not know about the integrity constraint, the DBMS enforces it. The user does not need to learn about any custom methods for the DBMS to enforce integrity. The user can learn a few simple commands provided universally in the DML, and the DBMS still enforces integrity.
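A minimal C++ sketch of the idea, using a hypothetical `RelationVar` template (not any DBMS's API): constraints are declared once and checked on the single path every change must take, so an insert, a wholesale assignment, or a mutator written later cannot bypass them.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

template <typename Row>
class RelationVar {
public:
    using Constraint = std::function<bool(const std::vector<Row>&)>;

    // Declare a named constraint once; it applies to every later change.
    void declare(std::string name, Constraint check) {
        constraints_.emplace_back(std::move(name), std::move(check));
    }

    // The only write path: the candidate value must satisfy every declared
    // constraint before it replaces the current value.
    void assign(std::vector<Row> candidate) {
        for (const auto& c : constraints_) {
            if (!c.second(candidate)) {
                throw std::runtime_error("constraint violated: " + c.first);
            }
        }
        rows_ = std::move(candidate);
    }

    // Convenience mutators are defined in terms of assign(), so they get
    // the same enforcement without restating any constraint.
    void insert(const Row& row) {
        std::vector<Row> next = rows_;
        next.push_back(row);
        assign(std::move(next));
    }

    const std::vector<Row>& rows() const { return rows_; }

private:
    std::vector<std::pair<std::string, Constraint>> constraints_;
    std::vector<Row> rows_;
};
```

The user who only knows `insert` and the user who assigns a whole new value hit the same declared check.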

>> >> The following question is particularly important:
>> >>
>> >> Is a circle an ellipse?
>> >
>> >No it's not. It's tired. Go to any OO reference for a full treatment.
>>
>> But it is an ellipse in the real world. I can use a circle anywhere I can
>> use an ellipse. Even if I cannot model it with inheritance, why should I
>> give up polymorphism?
>
>No you can't. An ellipse can vary its semi-major axis independently of its
>semi-minor. A circle is constrained to have a single radius. You can construct
>an ellipse where semi-major = semi-minor from a circle, but they are not the
>same thing.

This means that no programmer should ever choose inheritance when a future requirement might constrain any subtype.

The mutators of a circle are different, but the operators on circle are a proper superset of the operators on ellipse.

Does this mean that I have to re-write all of the common operators? Does this mean that I have to rewrite every algorithm that operates on ellipse values if I happen to want to use the algorithm on my circles?

In the real world, I cannot think of any more immutable relationship than a circle is an ellipse. How can any circle cease to be an ellipse?

I still see no logical requirement for inheritance. What design criteria drive the decision to choose inheritance to model specialization/generalization?
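One way to keep the common operators without inheritance is to treat circles and ellipses as values. In this C++ sketch (the type and function names are illustrative assumptions), every `Circle` value converts to the equal-axis `Ellipse` value, so operators written once for ellipses apply to circles unchanged, while a `Circle` variable can only ever hold a circle.

```cpp
#include <cmath>

// Value types: operators are written once against Ellipse values.
struct Ellipse {
    double semiMajor;
    double semiMinor;
};

struct Circle {
    double radius;
    // A circle value is the ellipse value with equal semi-axes.
    operator Ellipse() const { return Ellipse{radius, radius}; }
};

double area(const Ellipse& e) {
    return std::acos(-1.0) * e.semiMajor * e.semiMinor;
}

// Assumes semiMajor >= semiMinor > 0.
double eccentricity(const Ellipse& e) {
    return std::sqrt(1.0 - (e.semiMinor * e.semiMinor) / (e.semiMajor * e.semiMajor));
}
```

Calling `area(Circle{2.0})` needs no rewritten circle-specific operator; the conversion supplies the ellipse value.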

>You have not poked any holes in the model, just argued generalities from a
>single case (usually dangerous).

And you have not identified any common properties or methods of vehicles that make your proposed inheritance model sensible.

>> >> A multi-valued dependency is a multi-valued dependency. A join dependency is
>> >> a join dependency. Each relates to multi-way relationships. Do OODBMSs not
>> >> support multi-way relationships?
>> >
>> >Sure, they directly model N-ary relationships.
>>
>> Does the OO world have a term other than join dependency to describe the
>> same situation? Or are you snobbishly rejecting relational terms even when
>> they describe situations for which the OO world has no term?
>
>Ummm, aggregation?

That's close. It is related but not quite the same. Aggregation is more akin to join than it is to join dependency. An aggregation assumes that the model is already decomposed. Join dependency describes a situation that allows for decomposition into an aggregation. (Incidentally, every relation with degree greater than 1 is an aggregate of sorts.)
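For readers who want the distinction in concrete terms, here is a C++ sketch (the data and names are illustrative assumptions). A ternary relation R(A, B, C) satisfies the join dependency *{AB, BC, CA} exactly when joining its three binary projections reproduces R; when it does, R can be decomposed into those projections without losing information.

```cpp
#include <set>
#include <tuple>
#include <utility>

using Triple = std::tuple<char, char, char>;  // (A, B, C)
using Pair = std::pair<char, char>;

// True when R equals the join of its projections AB, BC and CA.
bool satisfiesJoinDependency(const std::set<Triple>& r) {
    std::set<Pair> ab, bc, ca;
    for (const auto& t : r) {
        ab.insert(Pair(std::get<0>(t), std::get<1>(t)));
        bc.insert(Pair(std::get<1>(t), std::get<2>(t)));
        ca.insert(Pair(std::get<2>(t), std::get<0>(t)));
    }
    std::set<Triple> joined;
    for (const auto& x : ab) {
        for (const auto& y : bc) {
            if (x.second != y.first) continue;                       // join on B
            if (ca.count(Pair(y.second, x.first)) == 0) continue;    // join on C and A
            joined.insert(Triple(x.first, x.second, y.second));
        }
    }
    return joined == r;  // the join always contains R; equality means lossless
}
```

Deleting the single row ('1','a','x') from the four-row example in the test breaks the dependency: while R must satisfy it, such a delete cannot be made in isolation, which is the kind of update anomaly that decomposition into the projections avoids.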

The OO term is close enough to allow for discussion, though. How many OO programmers consider update anomalies when considering aggregation? What objective criteria do OO programmers use for choosing aggregation in their designs?

>I just don't know of any commercial products that provide such a feature. I've
>never encountered a "domain" in my RDBMS experience.

I doubt you have ever used an RDBMS. How many did you use that prohibit duplicate rows? That require the DBMS to present all data as values in relations? (Hint: NULL is not a value.)

>> >> >The type of collection is an implementation decision that *may* be driven by
>> >> >the problem space. However, I don't see how the type of collection could
>> >> >make one model unsuitable for another application.
>> >>
>> >> If you absolutely have to have a hash for a performance critical, business
>> >> critical application, you don't see how that might affect another
>> >> application that prefers to index into an ordered array?
>> >
>> >I'm confused by your circular reasoning. On the one hand you argue for
>> >physical independence, then you construct straw-man applications that depend
>> >on a particular physical implementation.
>>
>> The relational model, through physical independence, will allow any physical
>> implementation without changing the logical model.
>>
>> The application that wants to think of a relation as an ordered array, can.
>> It doesn't have to break when the DBA stores the data in a hash. It will
>> just take longer to run.
>>
>> The non-relational object database exposes these purely physical
>> considerations in the logical interface causing existing code to break
>> completely when the physical requirements change.
>
>An array allows direct access by ordinal position. If the DBA wants to change
>the physical implementation for some unspecified reason that makes performance
>worse (???), that's alright as long as access by ordinal position is still
>supported. Operator overloading in OO languages such as C++ can achieve the
>same abstraction without breaking existing code.

The application is broken until the programmer overloads the operator.

>If access by ordinal position
>is not supported, you haven't just changed physical implementation but the
>logical model.

True. Since the relational DBMS allows explicit ordering and the middleware must present the result as a series of records to languages that do not support set-level representations, access by ordinal position is supported.
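A minimal C++ sketch of ordinal access defined by an explicit ordering rather than by storage layout, assuming a hypothetical hash-based physical structure. The application's request for the n-th row in key order keeps working; it just costs a sort rather than a direct index.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical physical structure: a hash table from key to payload.
using Rows = std::unordered_map<int, std::string>;

// The logical ordering is computed on request, independent of storage.
std::vector<int> keysInOrder(const Rows& rows) {
    std::vector<int> keys;
    keys.reserve(rows.size());
    for (const auto& kv : rows) keys.push_back(kv.first);
    std::sort(keys.begin(), keys.end());
    return keys;
}

// Ordinal access: the n-th row in ascending key order (throws if out of range).
const std::string& nthByKey(const Rows& rows, std::size_t n) {
    return rows.at(keysInOrder(rows).at(n));
}
```

Swapping `Rows` for an ordered structure would change only the cost of `keysInOrder`, not the code that calls `nthByKey`.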

>> >Ummm.... what's a CURSOR? Let's assume that the result of your query is
>> >analogous.
>>
>> A cursor is an application programming construct which allows a procedural
>> programming language to navigate through a relation as if it were an array
>> or stream of records. It is an artifact of the application programming
>> language and not an artifact of the relational model.
>
>OK, but a database that cannot interact with application programming languages
>is useless.

My earlier points related to the DBMS's ability to optimize queries and integrity constraints and how additional collection types such as bag, set, array and hash complicate that task. Since relational databases do support cursors to application programming languages, your comment serves no real purpose.

If all the DBMS supports for querying is simple restriction represented by a cursor, it doesn't really support querying in the same sense that a relational DBMS does. I would say that it supports filtering rather than querying. It doesn't really support any kind of optimization, and without that it cannot really support much for integrity enforcement.

As a result, it forces the work and responsibility of integrity enforcement onto users. Am I really expected to believe that this makes things easier or more natural for users?

>> If you are saying that the result of a query is not the same as any of the
>> collection types and is a cursor instead, then you are saying that the OODB
>> does not have closure. The result of a query cannot be the input to another
>> query. One cannot use a query to specify a view or an integrity constraint
>> or a subquery etc.
>
>Frankly, I don't know. I know that some ODBMSs support query languages (OQL,
>SQL3, whatever), but I tend not to use them in my applications. I don't know of
>any ODBMS that implements a view, but that is a table-oriented concept.

Since your only exposure to views is likely through SQL databases, I don't find it surprising that you think the concept of view is table-related. However, I must suggest that the concept transcends the logical data model.

An HR user will have a different conceptual image of employee and its various relationships than a payroll user or a manager will have. The DBMS should not confuse the HR user with details irrelevant to the HR user's needs. I think these statements apply equally to any kind of database.
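Closure is what lets a view be defined by a query at all. In this C++ sketch (the `Employee` type, `restrictBy` and `hrView` are illustrative assumptions, not any product's API), restriction takes a relation and returns a relation of the same type, so a view is just a named expression and its result can feed further queries.

```cpp
#include <functional>
#include <set>
#include <string>
#include <tuple>

struct Employee {
    std::string name;
    std::string dept;
    int salary;
    // Ordering lets std::set hold the rows; a set also rules out duplicates.
    bool operator<(const Employee& o) const {
        return std::tie(name, dept, salary) < std::tie(o.name, o.dept, o.salary);
    }
};

using Relation = std::set<Employee>;

// Restriction: relation in, relation of the same type out (closure).
Relation restrictBy(const Relation& r, const std::function<bool(const Employee&)>& pred) {
    Relation out;
    for (const Employee& e : r) {
        if (pred(e)) out.insert(e);
    }
    return out;
}

// A "view" is only a named query over the base relation.
Relation hrView(const Relation& base) {
    return restrictBy(base, [](const Employee& e) { return e.dept == "HR"; });
}
```

Because `hrView` returns a `Relation`, the HR user can query it exactly as they would query a base relation.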

Non-relational object databases force the task of providing the equivalent of views onto application programmers working outside of the DBMS.
