Re: Clean Object Class Design -- What is it?

From: Bob Badour <bbadour_at_golden.net>
Date: Tue, 17 Jul 2001 15:09:04 -0400
Message-ID: <Ua057.81$YH1.29301777_at_radon.golden.net>


>Nope. Direct assignment (even if there is a defined conversion among data
>types) is still direct assignment. It is not encapsulation.

We will just have to disagree on this. I suggest you look up the assignment operator in C++, though. In the end, I don't know how you are going to change any property of any object without assigning something somewhere.

>But the attribute you are assigning is an element of some other object that is
>responsible for changing its state coherently. Exposing direct assignment
>prevents the object from being able to enforce those constraints.

I suggest you look up the "Property Let" declaration in VB.
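For readers who do not know VB: a rough Python analogue of a "Property Let" procedure is a property setter. The sketch below is illustrative only; the Circle class and its rule are assumptions made up for this example, not anything from the thread. It shows ordinary assignment syntax that still lets the object enforce a constraint.

```python
class Circle:
    """Hypothetical class: 'radius' reads and writes like a plain attribute."""

    def __init__(self, radius: float) -> None:
        self.radius = radius  # routed through the setter below

    @property
    def radius(self) -> float:
        return self._radius

    @radius.setter
    def radius(self, value: float) -> None:
        # The object still enforces its own rule on every assignment.
        if value <= 0:
            raise ValueError("radius must be positive")
        self._radius = value


c = Circle(1.0)
c.radius = 5.0          # ordinary assignment syntax, validated by the setter
try:
    c.radius = -2.0     # rejected; the previous value is kept
except ValueError as exc:
    rejected = str(exc)
```

The caller writes a plain assignment either way; whether a check runs behind it is the object's business.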

>I ran across the dbdebunk web site and spent a few hours reading some of the
>articles. I now know better what you mean by what you say (since in most cases
>you are (nearly) repeating Date's words).

I consider his recent works a challenge to the industry to use precise terminology in discussions. I think it is an appropriate challenge, and I strive to meet it.

I am reasonably well-versed in Chris Date's positions on things, and I have to admit that I agree with almost all of them, if not all of them. I also have to admit that Date's depth of knowledge and quality of thought far exceed mine.

Although uncommon, my opinions on database systems are far from original. I have never claimed they were.

>One of the points made in one of
>those articles was that Date wants to enforce consistency at a "statement"
>level, not a transaction level. Guess what? That's precisely the purpose of class
>methods!

A single statement can invoke many class methods so the scope is very much different.

>You can argue syntax, but conceptually, that's what I've been trying to explain
>to you.

It's more than just syntax.

If your class methods refer to the object classes defined in the database to extend the database, their methods extend the logical interface to the DBMS. This is true in the relational model, as well.

The more you extend the logical interface to the DBMS, the more detail you force your users to learn. These extensions necessarily complicate the interface. As a result, the DBMS should create no requirements for unnecessary extensions. Your approach requires that the user see nothing but extensions.

If you decide to specify your constraints as step-wise procedural algorithms, you decide to forego the benefits of a couple thousand years of mathematical discovery. You decide to prevent your DBMS from using functional dependencies, mathematical identities etc. for optimizing integrity enforcement. You decide to hobble the DBMS from applying the optimization of algorithmic replacement.

If, on the other hand, your methods refer to application object classes, you have conceded the point that the DBMS does not support integrity and forces this vital task onto applications.

In either case, nothing prevents a user from adding a redundant method that violates integrity.

The relational model separates the issue of integrity enforcement from all other issues. This allows the designer to extend the DBMS in only those ways necessary -- keeping the solution as simple as possible.
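As a rough illustration of a rule declared once and enforced whatever statement touches the data, here is a small sketch using Python's sqlite3 module. SQLite is not a relational DBMS in the strict sense argued for in this thread, and the circle table and its rule are assumptions invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once, as part of the table definition.
conn.execute(
    "CREATE TABLE circle ("
    " id INTEGER PRIMARY KEY,"
    " x REAL NOT NULL,"
    " y REAL NOT NULL,"
    " radius REAL NOT NULL CHECK (radius > 0))"
)
conn.execute("INSERT INTO circle VALUES (1, 0.0, 0.0, 2.0)")

violations = []
for statement in (
    "INSERT INTO circle VALUES (2, 1.0, 1.0, -3.0)",  # bad insert
    "UPDATE circle SET radius = radius - 5",           # bad update
):
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as exc:
        violations.append(str(exc))

# The failed UPDATE left the existing row as it was.
radius = conn.execute("SELECT radius FROM circle WHERE id = 1").fetchone()[0]
```

Neither statement had to be routed through a particular method for the check to apply.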

Consider:
*How can you constrain simple, built-in data types?
*Without functional dependencies, logical/set identities etc., how does the DBMS avoid performing redundant integrity checks?
*If an operation requires multiple methods to move among consistent states, how does the DBMS allow the operation?
*If a constraint involves multiple object classes, how can you specify it only once?
*How do you prevent a DBA from writing an erroneous method that violates an existing integrity constraint?

Those are just a handful of examples off the top of my head.

>> >I've been through this before in other forums. Translation along an
>> >axis is simply a specialized case of movement where movement along the other
>> >axis is 0. Why complicate the interface (one of your favorite points) when a
>> >single, generalized "move" concept will apply?
>>
>> Since the object must define x and y properties as well as angle and
>> magnitude properties regardless of representation, why complicate the
>> interface by adding a redundant move method?
>
>Because it is not redundant. It is useful to move a point a relative amount
>from wherever it is (the "nudge" command in many drawing packages). The "user"
>does not want to compute the new cartesian coordinates of the point, he
>probably doesn't want to know about the coordinate system at all. He just wants
>to move it in some direction by some amount.

Good. You have answered your own question.
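One way to picture a point that exposes x and y as well as angle and magnitude regardless of representation is the Python sketch below. The Point class is hypothetical, and storing Cartesian coordinates internally is an arbitrary choice made for the example.

```python
import math


class Point:
    """Stored as Cartesian x/y here; that choice is an implementation detail."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    @property
    def magnitude(self) -> float:
        return math.hypot(self.x, self.y)

    @magnitude.setter
    def magnitude(self, r: float) -> None:
        theta = self.angle
        self.x, self.y = r * math.cos(theta), r * math.sin(theta)

    @property
    def angle(self) -> float:
        return math.atan2(self.y, self.x)

    @angle.setter
    def angle(self, theta: float) -> None:
        r = self.magnitude
        self.x, self.y = r * math.cos(theta), r * math.sin(theta)


p = Point(3.0, 4.0)
p.x += 2.0           # a "nudge" along one axis, by assigning to a property
p.magnitude = 10.0   # rescale through the polar view of the same point
```

Swapping the stored representation to polar would change the class body but not the two lines that use it.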

>> >However, I was not describing
>> >movement (which is a case of applying a delta value to the existing value). I
>> >was describing setting the position of a point.
>>
>> I would assume that one can assign any point value to a point variable
>> unless the variable has some additional explicit constraint. I would
>> likewise assume one can assign new values to any of the properties
>> regardless of representation.
>
>Back to where we started. If a point is a "tuple"

Straw man. Assume a point is an object value or object variable, instead.

>> >In what kind of application would I want to
>> >blindly set the radius of all circles to 5 simply because they happen to have
>> >an angle of 45 degrees (never mind the irrationality of a circle having a
>> >center angle :-) ?
>>
>> A graphic design application where the artist wants that visual image. Of
>> course, in my example I did not change the radius, I just moved all of the
>> circles that were on the diagonal x=y line horizontally until they were on
>> the vertical x=5 line.
>
>This is what I meant by the sledgehammer. I cannot envision a real application
>where an artist would want to move all the circles in his drawing from a
>diagonal (which could very well be serendipity and not an explicit relation) to
>a vertical line. The imprecision of the "update" statement is frightening to
>me.

When one needs a sledgehammer, it is nice to have one. If one wants to move or resize each circle one at a time, the relational model allows it. If one wants to move or resize a whole set of circles at once, the relational model allows that too -- navigational databases do not.
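The two styles of update can be sketched with Python's sqlite3 module. The circle table and its rows are invented for illustration; the second UPDATE is the "move every circle on the x=y diagonal to the x=5 line" operation discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE circle (id INTEGER PRIMARY KEY, x REAL, y REAL, radius REAL)"
)
conn.executemany(
    "INSERT INTO circle (id, x, y, radius) VALUES (?, ?, ?, ?)",
    [(1, 1.0, 1.0, 2.0), (2, 3.0, 3.0, 1.0), (3, 4.0, 7.0, 1.5)],
)

# One circle at a time ...
conn.execute("UPDATE circle SET radius = 2.5 WHERE id = 3")

# ... or every circle on the diagonal x = y in a single statement.
moved = conn.execute("UPDATE circle SET x = 5 WHERE x = y").rowcount

rows = conn.execute("SELECT id, x, y FROM circle ORDER BY id").fetchall()
```

Circles 1 and 2 sit on the diagonal and are moved; circle 3 is left where it was.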

>> >It has been a few years since I've had to use an RDBMS, but
>> >triggers were the only mechanism I knew about to enforce complex constraints
>> >such as I've postulated.
>>
>> I doubt that you have ever used an RDBMS. When SQL database vendors choose
>> to push their work onto users by implementing triggers and stored procedures
>> instead of declarative integrity, I see no reason to blame the principles
>> they ignored in doing so.
>
>Here we are back at my biggest beef with you. By your definition, there does
>not exist a commercial relational database.

And there won't until the market lets go of its false assumptions and misconceptions!

Why should vendors provide support for declarative integrity constraints when they can fool customers into thinking that triggers and stored procedures solve the problem?

Why should relational DBMS vendors provide adequate support for domains when the markets that demand them most scoff at the idea of using a relational database?

Why should vendors provide adequate physical independence when the markets accept the status quo? Or even worse, when the markets assume that physical independence harms performance?

>No product actually provides
>physical independence, because the tables of most database systems are the
>physical as well as the logical model.

Excuse me? Did I hear you just say that no product actually provides B+tree or ISAM indexes without changing tables, views and queries? Did I hear you just say that no product actually provides hashed indexes without changing tables, views and queries? Did I hear you just say that no product actually supports data clustering without changing tables, views and queries?

Plenty of products provide considerable physical independence -- they just do not provide enough.
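A small sketch of that kind of physical independence, using Python's sqlite3 module: an index is added and the access path changes, while the table definition and the query text stay exactly as they were. The employee table, its rows and the index name are assumptions for the example, and the plan text SQLite reports varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, dept TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO employee (dept, name) VALUES (?, ?)",
    [("ops", "Ann"), ("dev", "Raj"), ("dev", "Lee"), ("ops", "Mia")],
)

QUERY = "SELECT name FROM employee WHERE dept = 'dev' ORDER BY name"


def plan(sql: str) -> str:
    # EXPLAIN QUERY PLAN reports the access path SQLite chose;
    # the last column of each row is the human-readable detail.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


before_rows, before_plan = conn.execute(QUERY).fetchall(), plan(QUERY)

# A purely physical change: no table, view or query text is altered.
conn.execute("CREATE INDEX employee_dept ON employee (dept)")

after_rows, after_plan = conn.execute(QUERY).fetchall(), plan(QUERY)
```

The rows returned are the same before and after; only the plan differs.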

>> >The power of the ODBMS is that it allows performance an order of magnitude
>> >faster than the RDBMS.
>>
>> Since one can physically store the data identically in an RDBMS, one can
>> achieve identical performance. No benchmark is required to comprehend this
>> point.
>>
>> If a particular vendor provides little or no physical independence, blame
>> the vendor's implementation -- not the data model. If a particular vendor
>> provides little or no support for domains, blame the vendor's
>> implementation -- not the data model.
>
>You just don't seem to get it. Blaming the vendor is fruitless.

You don't seem to get it. If you want to claim that Oracle doesn't provide adequate domain support for your purposes, go right ahead. If you want to claim that Sybase doesn't provide adequate clustering etc, go right ahead.

If you spread the misconception that the flaws these products introduced by failing to implement the relational model are inherent to the relational data model, you effectively destroy any potential demand for these features in relational databases.

It is very easy to propagate ignorant prejudice and very difficult to overcome it.

>> My criticisms of the non-relational ODBMSs are directed at fundamental flaws
>> in their logical data models, at false assumptions in their very conception
>> and at widely held misconceptions. No product will ever overcome such flaws.
>
>You might levy very similar criticisms of non-object, semi-relational (SQL)
>databases.

Sure. I do plenty of that, but this is a discussion of the relative merits of relational and non-relational object databases. I initiated this discussion after reading preposterous claims from a vendor in another thread.

The vendor was preying on widespread ignorance and misconception to sell his product, and at the same time he was reinforcing ignorant prejudice against the relational model.

>> >The power of the ODBMS is that I don't have to write 20% more code handling
>> >translation from relational result tables to a form that programmers can
>> >use.
>>
>> This is the same power a relational ODBMS with proper support for
>> domains provides. The question becomes: Why use a non-relational ODBMS?
>
>Because I can buy one today and I can't buy your theoretical dream.

So buy one today. If people approach a market saying: "I am willing to reluctantly accept substantial costs and deficiencies in the product I choose to get a feature I need," an efficient market will eventually deliver that feature without the substantial costs and deficiencies.

If people approach a market without even recognizing the substantial costs and deficiencies and refuse to even consider a product without those costs and deficiencies, an efficient market will go to great lengths to deliver those costs and deficiencies.

If you really think you must buy a non-relational ODBMS today to get the domain support you need or to get the physical structures you need, please be in the former group and not the latter group.

>> >> >> Again, your initial assumption is false which renders the remainder of the
>> >> >> argument irrelevant.
>> >> >
>> >> >Again, your reading is flawed. The only one who needs to know where an
>> >> >object is clustered is the programmer who writes the inserter.
>> >>
>> >> Or the user who queries it or the user who must know what inserter to call
>> >> or ...
>> >
>> >Why? Why does a query care about where the data is clustered (in ANY
>> >database)?
>>
>> The user must know what inserter to call. The non-relational DBMS requires
>> extensions to the language, ie. the equivalent of additional operators at
>> the logical level, to encapsulate the clustering. A relational DBMS
>> separates the issue of clustering completely from the logical operators
>> used.
>
>The user of an object model must know the supported methods of the model. This
>does include the factory methods, as well as other accessor/mutator methods.
>Clustering is just one facet of all this that is (properly) encapsulated in the
>factory method(s).

This brings us back to the points I made earlier about unnecessary complexity in the logical interface. The relational model allows users to instantiate the object variables of their choosing without requiring users to know about factory methods. The relational model allows the DBMS to cluster the data or not cluster the data no matter how the user inserts it or changes it.

>Consider: Relational theory requires that the constraints of the database be
>satisfied at all times (that the database only contains true propositions).
>What happens if I try to violate one of those constraints, e.g., SET Circle.X =
>"blue", using your point example above? Is the entire transaction aborted?

That depends.

If the statement appears in an interactive dialog with the user, the DBMS could handle the error either way. Many interactive tools allow one to specify whether to roll back on error.

If the statement appears in application code, I would expect the compiler to catch the error long before runtime.

If the statement appears in a method of a DBMS object class, I would expect the DDL statement that records the method to the DBMS to fail.

How does your non-relational object DBMS handle the error during interactive user dialogs?

>One of the reasons for encapsulation is to prevent just such a thing from
>happening. You can't set X to "blue" because the strongly typed language
>doesn't provide an interface for such a mal-formed expression. So, yes, the
>user *does* have to know what method to call.

The question is not whether the user must know what essential methods exist for object classes in the DBMS. The question is whether the DBMS requires additional non-essential methods to enforce integrity or to determine physical storage.

>> >What do you mean the user who needs to know which inserter to call? Whatever
>> >inserter he calls enforces whatever constraints have been established on that
>> >object.
>>
>> What if the user performs an assignment to the list which replaces its entire
>> contents with all of the previous contents and an additional item? The user
>> has not used the insert method but has specified a logically equivalent
>> operation. By tying physical storage to the logical interface, one
>> needlessly forces the user to learn about physical issues.
>
>Precisely why I wouldn't allow such a blind assignment. Encapsulation is your
>friend.

So, you require your user to know about the single method that will satisfy the user's needs. If the user composes a reasonable equivalent from the user's existing knowledge of the language, you want the DBMS to return an error even if the reasonable equivalent would satisfy all of the integrity constraints?

Doesn't this just confirm what I have stated all along? The user must learn additional inessential details.
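To make the "reasonable equivalent" concrete, here is a hedged Python sketch in which the rule is a predicate over the collection's whole value rather than something tied to one method. ConstrainedList and no_duplicates are hypothetical names invented for the example; this is one possible reading of the argument, not a description of any product.

```python
from typing import Callable, Iterable, List


class ConstrainedList:
    """Hypothetical collection whose rule is a predicate over its whole value."""

    def __init__(self, constraint: Callable[[List[int]], bool]) -> None:
        self._constraint = constraint
        self._items: List[int] = []

    @property
    def items(self) -> List[int]:
        return list(self._items)

    @items.setter
    def items(self, new_items: Iterable[int]) -> None:
        candidate = list(new_items)
        # Any new value is acceptable as long as the declared rule holds.
        if not self._constraint(candidate):
            raise ValueError("constraint violated")
        self._items = candidate

    def insert(self, item: int) -> None:
        # Shorthand for assigning the old contents plus one more item.
        self.items = self._items + [item]


def no_duplicates(values: List[int]) -> bool:
    return len(values) == len(set(values))


a = ConstrainedList(no_duplicates)
a.insert(1)
a.insert(2)

b = ConstrainedList(no_duplicates)
b.items = [1]
b.items = b.items + [2]    # the "reasonable equivalent" of insert(2)
```

Both routes end in the same value and pass the same check; the user who never learned about insert() is not penalized.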

>> >OK, where's the difference here? If the action of inserting is abstracted
>> >behind an insertor interface (good OO practice), I can change the effects
>> >(clustering or not) of this action with no impact to the "users" very
>> >easily.
>>
>> ...only if the user calls the insertor interface. What if the user assigns a
>> whole new value to the collection?
>
>What makes you think calling the insertor interface is optional? Again, we are
>talking about encapsulation.

Really? I thought we were talking about logical interface complexity.

>I think you are grossly overstating the truth. No user will peruse the system
>catalog and discover useful relations and views to accomplish business
>purposes.

I do it all the time, as do my colleagues.

>Semantics are much more difficult.

Only because you make them so. Semantics, relational predicates, business rules -- all the same thing.

>So
>whether a "user" has to invoke the calculateEarnedValue() method or select the
>appropriate columns over the appropriate view, the learning curve is about the
>same.

You are ignoring that I have given the user a powerful learning tool to help in the climb.

>> Does the user of a non-relational ODBMS use a set or a collection or a hash
>> or a simple object method to query the system catalog?
>
>I don't know. I've never queried the system catalog. I guess there is one.

I guess it's not as easy to use as a relational one, then.
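For a sense of what browsing a catalog with the ordinary query language looks like, here is a sketch against SQLite from Python. The sqlite_master table and the table_info pragma are SQLite-specific (other SQL products expose their catalogs under different names), and the project table and view are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, budget REAL, spent REAL)")
conn.execute(
    "CREATE VIEW project_balance AS "
    "SELECT id, budget - spent AS remaining FROM project"
)

# The catalog is itself a table, so it is queried like any other.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Column-level detail for one table; the second field of each row is the name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(project)")]
```

The same SELECT a user already knows is enough to discover which tables and views exist.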

>> >> >If you delete a person, the employee will also be deleted automatically.
>> >>
>> >> Really? You are forced to use cascaded deletes? Sounds dangerous to me.
>> >
>> >Why? It seems that you have two choices to enforce referential integrity:
>> >either delete the employee along with the person (cascaded delete), or allow
>> >the employee to have a dangling reference to a non-existent person -- oops,
>> >that's not referential integrity. What would you propose?
>>
>> You could also inform the user that they must make sure no employees
>> reference the person prior to allowing the delete. The user then has options
>> and decisions to make. The user could delete the employees. The user could
>> refer the employees to a different person. The user could realize the delete
>> was an error and leave everything alone.
>
>So the user is now responsible for enforcing constraints?

The user does not enforce the constraint. The DBMS enforces the constraint and informs the user what the constraint is. The user applies the information to achieve the correct result.

>If the user decides that a person should be deleted
>from the database, how are you going to enforce the constraint
>automatically?

Oh, sorry. I guess I wasn't precise enough in my terminology. You were using the second person to describe strategies the DBMS can apply for enforcing integrity. Change "You could also inform the user" to "You could request the DBMS to inform the user" above.

The DBMS enforces the constraint and informs the user. I apologize for any confusion my sloppy verbiage caused.
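A minimal sketch of that division of labor, using SQLite from Python: the foreign key is declared with ON DELETE RESTRICT, the DBMS refuses the delete and says why, and the user then chooses how to proceed. The person and employee tables and their rows are assumptions for the example; note that SQLite only enforces foreign keys after the pragma shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE employee ("
    " id INTEGER PRIMARY KEY,"
    " person_id INTEGER NOT NULL REFERENCES person (id) ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO person VALUES (1, 'Ann'), (2, 'Raj')")
conn.execute("INSERT INTO employee VALUES (10, 1)")

try:
    conn.execute("DELETE FROM person WHERE id = 1")
    message = None
except sqlite3.IntegrityError as exc:
    message = str(exc)   # the DBMS refuses and reports why

# The user now decides: here, refer the employee to a different person ...
conn.execute("UPDATE employee SET person_id = 2 WHERE id = 10")
# ... after which the same delete satisfies the constraint.
conn.execute("DELETE FROM person WHERE id = 1")
remaining = conn.execute("SELECT id FROM person").fetchall()
```

Deleting the employee first, or abandoning the delete, would have been equally valid responses to the same error.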

>Strange. I thought you were arguing from a position of theoretical purity with
>no product to actually use. Is this not the case? What product do you use that
>provides all the things you demand of ODBMS?

Hmmm... Let's see: I demand physical independence, logical independence, logical identity, declarative constraints and an optimizer.

Oracle, Sqlbase, FirstSql, Sybase ASE, Informix, Sybase ASA all provide all of those things.

>> >Can you define what you mean about an "application independent language"? If by
>> >this you mean a language different than the application programming language,
>> >then I'm not sure I'd agree with you. It is always a maintenance problem to
>> >keep different representations of the same thing in sync. If I can remove one
>> >of those maintenance issues, I remove a major source of errors.
>>
>> Yes, that is what I mean. A DDL or DML different than the application
>> programming language.
>>
>> Would you agree that not having a language different than the application
>> programming language limits the use of the DBMS to a single application
>> programming language?
>
>No. There are certainly products out there that support multiple languages as
>well as interoperability among languages.

Which implies for all but one of those languages that the DBMS has a different DDL or DML language.

>> Would you agree that this limits the use of the DBMS
>> to highly skilled programmers who comprehend the language?
>
>No. I don't know of any mainstream product that does not support ODBC so your
>happy Excel users can use your language-centric object database.

What language do the Excel users use to insert new objects into the database?

>> Would you agree
>> that this severely limits logical independence?
>
>Perhaps. Probably. However, inasmuch as OO in practice espouses re-use, logical
>independence in the way you are thinking about it may not be that important in
>an OO environment.

Having written many OO applications over the last 14 years or so, I would have to strongly disagree with the above statement. Reuse does not solve the problems I have with executable code in the field or with independent upgrade cycles.

>> Would you agree that this
>> encourages physical dependence even if it does not require it?
>
>Yes. Direct mapping to the programming language (even if you do use a DDL)
>encourages if not requires some measure of physical dependence. But not as much
>as you like to think. I am dependent on the mapping between how my data is
>persisted and how it is reconstituted into language class instances. But there
>are a raft of other physical details that are hidden and not necessarily
>exposed by the language class support.

I have less concern about the hidden minutiae than the big differences that can lead to orders of magnitude difference in performance.

>> As long as you (and most practitioners, for that matter) cling to
>> misconceptions and fallacies, I suppose those might be your only two
>> choices. If you learn how to recognize and demand quality from your data
>> modellers, new options open up.
>
>The problem is one of accountability. I am not in a position to demand anything
>of anyone. I am stuck with the reality of what is.

Every consumer demands. That's just economics 101.

If you have the ability to communicate the flaws in a data model to the data modellers and if you have the ability to communicate how those flaws impede your business, you can change your reality.

Why is it that everyone except Shaw wants me to be a reasonable man?

>> >> For those ODBMSs that are not relational databases and that do use DDL, what
>> >> do you see as the distinguishing characteristics that make them superior to
>> >> relational databases?
>> >
>> >See above. Performance.
>>
>> Since relational databases allow any physical layout of the data, they can
>> achieve equivalent performance characteristics to any other logical data
>> model.
>
>Here you are comparing concrete products to theory again.

I am comparing what is possible with a relational database to what is possible with a non-relational database. You made the claim that performance distinguishes the logical data models -- not me. I have clearly communicated how the logical data model does not affect performance.

If you want to criticize specific products for not delivering the physical layout you need, do so. You won't hear any argument from me. Apply that criticism beyond its valid scope, and I will again contest it.

>> >> Given that performance is determined by the physical design of the data and
>> >> given that relational databases allow any physical design independent of the
>> >> logical design, what improves the performance?
>> >
>> >Implementation specific.
>>
>> Exactly! And the physical implementation is independent of the logical data
>> model. How does that make one logical data model superior to another with
>> respect to performance?
>
>This is the last time I'll say this, then I'll just cut the rest of these
>redundant arguments. You are arguing theory against implementation

I am arguing logical data model against logical data model. You were the person who inappropriately introduced an implementation detail as a feature of the logical data model.

>> Relational proponents have long ago demonstrated the huge deficiencies of
>> all the other known logical data models. The onus is on those proposing a
>> new data model to demonstrate how those criticisms do not apply to the new
>> data model. The onus is on those proposing a new data model to demonstrate
>> that the new model is as good as the relational model.
>
>And successful projects that cannot be accomplished using the legacy technology
>don't count?

Anecdotal evidence does not refute rigorous or scientific analysis.

>> So far, you have refused to even describe the new model you espouse let
>> alone demonstrate its worth.
>
>Never my intention. I jumped in the middle of this because you made sweeping
>generalizations about object databases that do not bear up under my
>experience.

I think I have demonstrated that my generalizations do bear up. Have you demonstrated that non-relational object databases provide much physical independence? Have you demonstrated that non-relational object databases provide any inherent performance advantage over relational ones? Have you demonstrated that non-relational object databases provide as simple an interface?

>No, it means that your programmer should never choose inheritance when the
>subtype constrains existing operations of the base type.

Immutability is not the definitive design criterion? If observing that a circle is always an ellipse does not make inheritance safe to use, I don't know what would.
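One possible reading of the immutability point, sketched in Python with frozen dataclasses: if ellipse values never change, an operation such as stretching returns a new value, and a circle is simply an ellipse value whose semi-axes happen to be equal. The class layout and the make_ellipse helper are assumptions made for the illustration, not something taken from Date's article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ellipse:
    a: float  # semi-axis along x
    b: float  # semi-axis along y

    def stretched(self, factor: float) -> "Ellipse":
        # Values never change; an operation yields a new value.
        return make_ellipse(self.a * factor, self.b)


@dataclass(frozen=True)
class Circle(Ellipse):
    def __post_init__(self) -> None:
        if self.a != self.b:
            raise ValueError("a circle has equal semi-axes")


def make_ellipse(a: float, b: float) -> Ellipse:
    # Hypothetical helper: pick the most specific type the value satisfies.
    return Circle(a, b) if a == b else Ellipse(a, b)


c = Circle(2.0, 2.0)
e = c.stretched(3.0)   # the result is a plain ellipse value; c is untouched
```

Because nothing mutates c in place, the subtype never has to restrict an operation it inherits.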

I see that Chris Date recently published an excellent article on the circle/ellipse issue at http://www.dbdebunk.com/

Received on Tue Jul 17 2001 - 21:09:04 CEST
