Re: Clean Object Class Design -- What is it?

From: Jim Melton <jim.melton_at_lmco.com>
Date: Sat, 21 Jul 2001 23:33:40 GMT
Message-ID: <9i2os5$2jh3_at_cui1.lmms.lmco.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:HqQ07.4100$md2.51811478_at_radon.golden.net...
> >> One would want the physical store independent of the logical specification
> >> in any case. This is a good thing.
> >
> >Let's start with a naive question. Why?
>
> One wants the user to see as simple a view of the data as possible. One does
> not want to force users to understand inessential data to complete their
> tasks. One prefers to say: Such and such data is here. One prefers not to
> say: Such and such data is here and is duplicated here or can be derived
> there.

So a user here would be someone who wants to query the database?

I think a fundamental disconnect between relational practitioners and OO programmers has to do with the whole notion of access to data. In an OO world, *direct* access to data is evil. That is true precisely because of the integrity arguments you frequently cite. It is also *precisely* to hide implementation details from "users". So rather than state "such and such data is here", I would state, "this object supports a method that returns this data".

A relational database is all about access to data, and the accompanying machinery to ensure integrity is maintained. However, if access is already constrained through methods, the machinery to ensure integrity is also maintained.
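As a rough illustration of "this object supports a method that returns this data", here is a minimal Python sketch. The Invoice class, its method names, and the validation rule are assumptions made up for the example; they do not come from any particular ODBMS.

```python
class Invoice:
    """Hypothetical example class; all names here are illustrative only."""

    def __init__(self, number):
        self._number = number
        self._lines = []  # internal state, not meant for direct access

    def add_line(self, description, quantity, unit_price):
        # The update method is the one place where the rule is enforced.
        if quantity <= 0 or unit_price < 0:
            raise ValueError("quantity must be positive and price non-negative")
        self._lines.append((description, quantity, unit_price))

    def total(self):
        # Callers ask for the data; whether it is stored or derived is hidden.
        return sum(quantity * unit_price for _, quantity, unit_price in self._lines)


invoice = Invoice("INV-1")
invoice.add_line("widget", 3, 5)
invoice.add_line("gadget", 1, 10)
print(invoice.total())  # prints 25
```

A caller never learns whether the total is stored or recomputed, and an invalid line is rejected before it can reach the internal state.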

> For example, if one decides that one needs to cluster invoices with purchase
> orders for performance reasons, one does not want to force invoice users to
> know this.

Here a user is a database programmer? Such a user probably *does* need to be aware of performance constraints on the database. Again, the OO approach would be to abstract the insertion into a factory method that performs the clustering in the same way a relational DB would.
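A factory method of that kind might look roughly like the following Python sketch. It assumes that a "cluster" can be modelled as a list of records keyed by purchase order; InvoiceStore and its methods are hypothetical names, not a real product's API.

```python
from collections import defaultdict


class InvoiceStore:
    """Hypothetical store; a cluster is modelled as a list keyed by purchase order."""

    def __init__(self):
        # purchase order id -> records that are stored together
        self._clusters = defaultdict(list)

    def create_purchase_order(self, po_id):
        self._clusters[po_id].append(("purchase_order", po_id))

    def create_invoice(self, invoice_id, po_id):
        # Factory method: the caller names the purchase order, and the
        # placement decision (same cluster as that order) is made here.
        if po_id not in self._clusters:
            raise KeyError(f"unknown purchase order {po_id!r}")
        record = ("invoice", invoice_id)
        self._clusters[po_id].append(record)
        return record

    def invoices_for(self, po_id):
        # .get avoids creating an empty cluster for an unknown purchase order.
        return [rid for kind, rid in self._clusters.get(po_id, []) if kind == "invoice"]


store = InvoiceStore()
store.create_purchase_order("PO-7")
store.create_invoice("INV-1", "PO-7")
store.create_invoice("INV-2", "PO-7")
print(store.invoices_for("PO-7"))  # prints ['INV-1', 'INV-2']
```

Code that creates or reads invoices never mentions clusters, so the placement policy can change inside create_invoice without touching any caller.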

> >> Unless you normalize the data, how do you even know whether you are
> >> enforcing all of the constraints?
> >
> >One of the basic principles of object-oriented programming is that a class
> >is responsible for ensuring that its internal state is coherent.
>
> What about its external integrity?

What is external integrity? Guaranteeing that the "PERSON_ID" field of an employee record refers to an existing person? That is part of the internal state of an object. You insist on referring to ODBMS as network models; at least give them the benefit of a network. The relations are pre-computed and stored. If such a relation is invalid, it is a dangling pointer and the internal state of the object is invalid.
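To make the PERSON_ID case concrete, here is a small Python sketch in which the reference from an employee record to a person is checked by the only methods that can change it. PersonnelStore and its methods are invented for illustration, and the sketch does not claim to show how Objectivity or any other ODBMS implements this.

```python
class PersonnelStore:
    """Hypothetical store in which employee records refer to persons by id."""

    def __init__(self):
        self._persons = {}    # person_id -> name
        self._employees = {}  # employee_id -> person_id

    def add_person(self, person_id, name):
        self._persons[person_id] = name

    def add_employee(self, employee_id, person_id):
        # Refuse to create a reference to a person that does not exist.
        if person_id not in self._persons:
            raise LookupError(f"no person with id {person_id!r}")
        self._employees[employee_id] = person_id

    def remove_person(self, person_id):
        # Refuse to leave a dangling reference behind.
        if person_id in self._employees.values():
            raise ValueError("person is still referenced by an employee record")
        del self._persons[person_id]

    def is_consistent(self):
        # True when every employee record refers to an existing person.
        return all(pid in self._persons for pid in self._employees.values())


store = PersonnelStore()
store.add_person(1, "Pat")
store.add_employee(100, 1)
print(store.is_consistent())  # prints True
```

Because every update goes through these methods, no sequence of calls can leave an employee record pointing at a missing person.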

> >This
> >statement does not imply any mechanism of achieving that constraint,
> >particularly in the context of persistent data. However, the answer to the
> >above question is that enforcing the constraints is a given in an
> >object-oriented approach.
>
> Actually, I would rephrase that: It is a given that users must enforce
> integrity, because no central or consistent means exists.

And a user here would be?

What do you mean by a central or consistent means? The ODBMS I am most familiar with (Objectivity) absolutely does enforce referential integrity and it is inherent to the DDL.

> >While it may take extra thought and care in the
> >relational world, it is a normal part of doing business in the OO one.
>
> Normal for whom? For every user who writes data accesses?

Data access (read) does not need to "enforce" referential integrity. Data update is the purview of the class designer, not a casual user.

> I would say that integrity is a normal, integral part of doing business in
> the relational world and not at all a normal part of doing business in the
> OO world.

And I would disagree with you.

> >> >These data constraints, however, are
> >> >encapsulated in the code that is used to access the data.
> >>
> >> This is not good. Every application that gets written must then enforce
> >> exactly the same constraints. This is better handled by the database.
> >
> >No one said the code couldn't be part of the database.
>
> What prevents a user from writing an additional method that performs a
> similar task to an existing method and that violates the integrity
> constraint previously enforce in the existing method?

And a user here would be?

Are you suggesting that your sales analysts are writing C++ code to access an object database just so they can by-pass built-in integrity constraints and prove your point?

A programmer who does not program well deserves what he gets. That's no different in any model.

> In the relational world, someone must make a conscious decision to remove a
> constraint before anyone can circumvent it because the database will enforce
> that constraint for all applications and all users.
>
> It does not matter whether the code is in the database -- it is still
> application code.

No, I don't think so. You draw a sharp distinction between your SQL statements that establish keys and indices and constraints, the stored procedures that trigger on certain actions to enforce referential integrity, and the class methods that an ODBMS programmer would write. Such a distinction does not exist. Integrity doesn't "just happen". Someone has to program it in.

> >Now you are talking
> >about specific implementations. Is this a theory discussion or a practice
> >one?
>
> First, I was talking about a general principle that has implications for
> practice.
>
> Second, the discussion is both. What is the use of one without the other?
> People ignore theory at their peril. If one ignores a fundamental principle,
> one can expect adverse effects in practice.
>
> Do you mean that we are not allowed to even mention the adverse practical
> implications when discussing general principles?

No, I mean that the location of object code is implementation dependent. I believe that some of the Smalltalk databases allow code to execute on the server. Another database is not client-server at all, so all code resides locally. I'd like you to avoid sweeping generalities and be specific.

> >If the latter, then perhaps you should specify which database(s) you
> >are criticizing.
>
> Generally, all of them. If I want to single out a specific product for
> praise, I will.

So you have sufficient experience with every object database to paint them all with the same broad brush? Amazing!

> >> The question is: If you do not know what normalization is, or even when to
> >> look for the more esoteric potential update anomalies, how do you know
> >> whether your class design will avoid them?
> >
> >I'm not sure if this is one of the vocabulary snobberies I mentioned at the
> >outset or not.
>
> I was responding to a comment someone made that they did not know anything
> about normalization, but that their class designs are always normalized.
> Apparently, doing things that would lead to less than normalized schemas are
> "stupid" if I recall correctly.
>
> The premise espoused seemed somewhat contradictory. Do you not agree? Is it
> snobbery to question a contradiction?

Perhaps not. However (and I left the original statement in on purpose), what you wrote was not questioning the contradiction, merely the poster's vocabulary.

> >It is my opinion (FWIW) that the major difference between a
> >class diagram and an ER diagram is the capability of abstracting complexity
> >through inheritance.
>
> Hmmm. In another thread, we established that the choice of inheritance vs.
> role often has profound logical implications. Switching from inheritance to
> role could prove expensive.
>
> Is there really a good reason to make that distinction in the first place?
> What does the added risk buy us?

Inheritance is a powerful tool, used properly. It only describes the "is a" relationships among objects. An automobile "is a" vehicle. It is not a role of vehicle. Automobile is fully substitutable wherever a vehicle may be used. This is basic OO.

Inheritance should only be used to describe immutable relationships. Employment is not immutable. So while one analysis might conclude that an employee "is a" person, a better analysis would decide that employee is just a role a person plays.

If I have systems that understand "vehicle", I can extend my model to include "airplane" and "submarine" as subtypes of vehicle and the existing systems function properly using the new data types. This is powerful and useful.
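The difference between an immutable "is a" relationship and a role can be sketched in a few lines of Python. Every class and function name below is an assumption chosen for the example; only the vehicle and employee cases themselves come from the discussion.

```python
class Vehicle:
    def __init__(self, name):
        self.name = name

    def medium(self):
        raise NotImplementedError("each subtype says what it travels through")

    def describe(self):
        return f"{self.name} travels by {self.medium()}"


class Automobile(Vehicle):
    def medium(self):
        return "road"


class Airplane(Vehicle):
    def medium(self):
        return "air"


class Submarine(Vehicle):
    def medium(self):
        return "water"


def describe_fleet(vehicles):
    # Written against Vehicle only; it never names a subtype.
    return [vehicle.describe() for vehicle in vehicles]


class Person:
    """Employment is modelled as a role that can be gained, lost, or held twice."""

    def __init__(self, name):
        self.name = name
        self._employers = set()

    def take_job(self, employer):
        self._employers.add(employer)

    def leave_job(self, employer):
        self._employers.discard(employer)

    def is_employee(self):
        return bool(self._employers)


fleet = [Automobile("sedan"), Airplane("jet"), Submarine("sub")]
print(describe_fleet(fleet))

pat = Person("Pat")
pat.take_job("Acme")
pat.take_job("Globex")  # employed by two companies at once
pat.leave_job("Acme")
pat.leave_job("Globex")
print(pat.is_employee())  # prints False
```

describe_fleet keeps working when a new subtype is added, while Pat remains the same Person object before, during, and after employment.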

> >I believe that any well thought object model will
> >already be normalized, because that is the nature of object modelling.
>
> How many OO programmers do you know who would even recognize when to look
> for multi-valued dependencies or join dependencies?

Since you've used inherently relational terms, I'll humor you by saying precious few. However, the cardinality of relationships is a key aspect of object modeling that all OO programmers must understand. As for join dependencies, I don't even think the concept is transferable.

> >It is possible to do a thing and not know someone else's name for
> >the thing.
>
> True, to a point. In fact, my whole point in starting this thread was to
> highlight the fact that normalization is as necessary in OODB designs as it
> is in relational designs. Do you think it is responsible for a vendor to
> instruct people to ignore the thing no matter what its name?

Nope. I can't necessarily defend the original poster. However, you've gone far afield from that original thought and damned all object databases and those who use them.

> >> Well, I would argue that a class model does not imply a proper schema.
> >> Certainly, one can automaticaly generate a schema from a class model, but I
> >> do not think the result would be very good. Doing so will tend to bias the
> >> database toward a single application at the cost of all other potential uses
> >> of the data.
> >
> >You've made this claim several times. I don't get it. A good model is what I
> >call "physics-based". That is, it centers on the most stable aspects of the
> >problem space. NOT the solution space. Just as the laws of physics don't
> >change often, the most stable part of your problem domain is the perspective
> >to use in modelling.
>
> Let me ask a naive question: What aspect of the problem space causes one to
> choose among a hash, a bag, a set, a collection or an array?

Let's see, a bag is an unordered collection, a set is a collection with no duplicates, collection is a generic term, and an array is an ordered collection. Don't those concepts ever occur in your problem set?

I did intentionally leave out hash, because it specifies a method of computing the index. Let me ask a similarly naive question: How many ways can you order an index in a relational database? Is a binary search the best you can do?

The type of collection is an implementation decision that *may* be driven by the problem space. However, I don't see how the type of collection could make one model unsuitable for another application. So what if a collection is implemented as a hash map? I can still query it. The "user" (a term you seem to use with little precision) doesn't need to concern himself.
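The claim that a query need not care how the collection is implemented can be shown with a short Python sketch. It assumes that the shared interface is plain iteration, and the part numbers and the function name are invented for the example.

```python
def part_numbers_with_prefix(collection, prefix):
    # Relies only on iteration, which lists, sets, and dicts all support.
    # The result is sorted so that unordered collections give a stable answer.
    return sorted(item for item in collection if item.startswith(prefix))


as_array = ["A1", "B2", "A3"]                 # ordered collection
as_set = {"A1", "B2", "A3"}                   # no duplicates, no order
as_hash = {"A1": 9.5, "B2": 3.0, "A3": 1.25}  # hash map keyed by part number

results = [
    part_numbers_with_prefix(collection, "A")
    for collection in (as_array, as_set, as_hash)
]
print(results)  # prints [['A1', 'A3'], ['A1', 'A3'], ['A1', 'A3']]
```

The same query returns the same answer from all three; what differs is the cost of lookups and inserts, which is the implementation decision described above.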

> >Sure, I've seen a lot of "single-use" object models.
>
> When one chooses an array over a hash or vice versa, how is one modelling
> for all possible uses and not biasing the database toward a single
> application?

How is one biasing toward a single application?

> >I've also seen
> >"single-use" relational models.
>
> I prefer to keep this as a discussion of the data models themselves and not
> focus on individual skill. To this end, I prefer to assume at least a good
> and competent designer for each model.

Then we are finished.

> >That's just a question of quality of
> >modelling, not inherent superiority of one technique over the other.
>
> What does this prove? Are you saying that a single incompetent doctor
> renders all of medicine useless or inferior to incantation?

Are you saying that a single competent witch doctor exists? If so, doesn't that validate his technique?

> >Can you explain why (apart from slipshod, ignorant modellers) an
> >object-oriented approach is necessarily biased to one application?
>
> See above regarding the choice among hash, set, bag, collection and array.
> Each has a different logical interface. A relational database can achieve
> comparable performance characteristics to each with a single logical
> interface, the relation.

No, each has a common logical interface. Each has different performance properties which are well known and documented in computer science. There are obvious reasons to choose one implementation over another. And yes, the model often does dictate this.

Does the relation have different performance models?

> >> In another thread, we explored what was claimed to be an "obvious" class
> >> design where "manager" was initially modelled as a sub-type of "employee"
> >> and "employee" was initially modelled as a sub-type of "person". We
> >> concluded that, in many applications, it is more appropriate to model both
> >> "employee" and "manager" as roles of "person".
> >
> >And the "obvious" class design is wrong. Inheritance is not a solution for
> >all the world's ills. There are those in the OO world who argue that
> >inheritance is usually wrong. I'm a practioner, not a theoritist, but I'd
> >imagine that there are some rules that could be codified for evaluating
> >whether inheritance is appropriate or not. In this case, a little educated
> >analysis is enough (what about the person who is employed by two
> >companies?).
>
> I understand that Date and Darwen reject the idea of inheritence in the
> database. I believe the grounds are that inheritence is not required for
> polymorphism and that it violates physical independence.

Interesting. I'd like to know how to achieve polymorphism without inheritance.

> Personally, I do not see the need to choose between inheritence and role in
> the first place. I see the choice as a risk with no compensating benefit.

You must deal with uninteresting data.

> >> The distinction between sub-type/super-type and object/role, however, is
> >> entirely arbitrary. In an enterprise-wide system, you will have fundamental
> >> conflicts among those applications that need the relationships modelled
> >> through inheritence and those applications that need the relationships
> >> modelled as roles.
> >
> >Please define an application that needs relationships modelled through
> >inheritance?
>
> I can't think of one, which is why I question the benefit of the distinction
> between inheritence and role in the first place. Since you seem to believe
> that inheritance provides some benefit, perhaps you could define such an
> application?

Unfortunately, I am not at liberty to do so.

> >Inheritance is not about modelling relationships; it is for
> >abstracting complexity.
>
> How does it do that? I can see how choosing inheritence introduces risk. I
> can see how having both inheritence and role is more complex than having a
> single value-based method of identifying all relationships. Does having to
> deal with the distinction not introduce complexity?

Well, since you like to throw Date and Darwen around, perhaps I could point you at Booch and Rumbaugh?

> >You seem intent on a jihad against object databases. I don't know why.
>
> "Jihad" implies a religious, fanatical motivation, which I find insulting.
> Why should I be any more for or against an object database than I am for or
> against a network model database? Other than the name, what exactly is the
> difference?

Your posts have certainly achieved a fanatical tone.

Since I've never encountered the term "network model database" prior to your posts, perhaps you'd like to elucidate? Then I might be able to tell you the difference.

Received on Sun Jul 22 2001 - 01:33:40 CEST
