Re: Clean Object Class Design -- What is it?

From: Jim Melton <>
Date: Wed, 05 Sep 2001 07:41:37 GMT
Message-ID: <>


Bob Badour wrote:

> Jim Melton wrote in message <>...
> >Frequently, users *do* have a hard time disambiguating similar "variables".
> >(Can you tell twins apart?).
> Yes. Each generally has a unique name, social security number, drivers
> license number etc.

And when you *see* two twins are you able to discern their driver's license numbers? No. How do you discern them? Usually, it is spatially because humans are able to reason that two "objects" do not occupy the same space at the same time. But this spatial distinction is a poor choice for programming because it is temporally unstable.

Just because someone can assign an arbitrary identifier to something does not mean anything. In fact, these arbitrary identifiers are routinely misused in identity theft. There is NO more unique identifier than one that captures the intrinsic identity of an object. All arbitrary ones are subject to error. Fingerprints or retinal scans are better unique identifiers because they are truly unique. (Of course, the technology for sensing these unique identifiers can still be spoofed.)

> >That's why query by example is a powerful user
> >interface technique. When the user says, "That one" he is using a
> >"pointer".
> He is also using a digit and an index (finger). <g> When the user inspects
> identifying attributes to decide which to point at, he is using explicit
> attribute values.
> Attribute values help the user point when the user must point, but at other
> times pointers actually get in the user's way.

Since your other responses are consistently esoteric, academic, and detached, should I interpret this uncharacteristic levity as agreement? Or simply that you cannot refute the argument?

> >> >This notion of intrinsic identity is reinforced in object
> >> >databases by our pointers to objects.
> >>
> >> Yes, by pointers to variables. But this does not help users disambiguate
> >> similar values stored in separate variables. Are you saying that your
> >> entities have not logical identity? That users cannot disambiguate
> >> similar entities?
> >
> >Often not.
> How not and when not? Users cannot do much of value if they cannot even
> identify what they are talking about.

Sometimes the "thing of value" is simply to be able to figure out how to identify what they are talking about. Consider forensics or intelligence.

> >A way to reference the data
> >that recognizes its intrinsic identity is required.
> Which is why humans have always required human-understood logical
> identifiers.

Because prior to the advent of computers humans had to perform data retrieval and sorting functions that should properly be hidden from the humans. Computers may need to deal with arbitrary identifiers (in which case an OID is as useful as any), but humans gain NOTHING by having these values "exposed to the logical interface".

> Restating your previous statement:
> In a relational database, the example (or pattern) is always to copy the
> data out of the database, perform some manipulations (as required), then
> find the appropriate record(s) again and modify whatever values are changed.
> The example (or pattern) is simply incorrect as an archetype of relational
> databases. It might be a typical example of an application programme,
> however. A relational dbms allows one to operate directly on the data
> without any copying. On the other hand, the majority of so-called object
> dbmses do require such copying.

As I have said before, a database without an application that uses it is like a disk drive with no power applied. The data may be there, but until someone accesses it, it is just data.

A DBMS is infrastructure. Non-programmer "users" don't use the database directly. Programmers write software to enable users to use the data more efficiently. Other users use existing software to use the database.

I know you disagree with me, that you see the DBMS as an end in itself, but we will not convince each other to change our views. I'll concede that my statements were not as crisply precise as you insist upon. Try this:

In *programming* with a relational database, the pattern...

> When you use the term object in the discussion of data management, you
> equate object with data ie. the subject matter of the discussion.
> When you use the term object in the discussion of variables, you equate
> object with variable ie. the subject matter of the discussion.
> When you use the term object in the discussion of type, you equate object
> with type ie. the subject matter of the discussion.
> When you use the term object in the discussion of identifiers, you equate
> object with identifier ie. the subject matter of the discussion.
> When you use the term object in the discussion of relations, you equate
> object with relation ie. the subject matter of the discussion.
> Why not simply use data, variable, type, identifier and relation as I
> suggested?

Because the truth is that an object is all of these. In Smalltalk, for example, every "variable" is of an object type (class). Aggregations of "variables" (relations?) are similarly characterized by an object type (class) because there is no fundamental distinction between "simple" and "complex" data. Yes, the distinction between class and object is frequently blurred in common speech, but that is because the distinction is readily apparent from usage. A specific variable is an instance of a class type, just as it might be an instance of an integer type.

Your words are foreign to me. While they have obviously been imbued with great meaning for you, they are confusing and imprecise in communicating with me. So whose vocabulary is right?

Neither. Communication is not about right and wrong. It is about expressing and exchanging ideas. Since your recent arguments are more about the (im)precision of words, it is now clear to me that there is nothing more to be gained from this discussion.

> >> You have it backward. ODBMSes require the above process, but relational
> >> databases do not. One can send a set-oriented command to the RDBMS that
> >> manipulates data entirely within the DBMS process.
> >
> >Really. Can I send a set-oriented command to the RDBMS to find a
> >least-squares path through a series of points?
> Provided the DBMS supports the operation, yes you can.

And if pigs could fly...

Don't give me any more nonsense about theoretical databases. Here in the real world we have to retrieve data from databases and perform algorithmic operations on them.

> You can also tell the DBMS to increase the amount in an account by the same
> amount it decreases the amount in another account, without copying all of
> the account information back and forth. How many object databases can say
> the same?

I think I can name 2. You see, Objectivity is not truly client/server in that the entire DBMS code is loaded into application program space. Consequently, the application has exactly the same access to the data as the DBMS.

And I believe that Gemstone/S allows methods to be invoked in the server.

> >Can I send a set-oriented command to the RDBMS
> >to find a statistical probability that two measurements (including their
> >error distributions) represent the same event?
> Provided the DBMS supports the operation, yes you can.
> You can also tell the DBMS to delete any information about customers who
> have not made a purchase in the previous five years without copying any
> customer or purchase information back and forth. How many object databases
> can say the same?

See above. Oh, and add that if wishes were horses...

> >Can I send a set-oriented command to
> >the RDBMS to predict the likely next state of a Markhov model?
> Provided the DBMS supports the operation, yes you can.
> You can also tell the DBMS to populate a workflow table to identify all of
> the entities whose state must change in the next step of some game, without
> copying any data back and forth. How many object databases can say the same?

See above. By the way, I presume that this populated workflow table will then be "copied" to some "user" who can take advantage of it.

These are really your weakest arguments yet.

> >Believe it or not, sometimes people want to apply algorithms to the result
> >of a query.
> If an application requires a local copy of some data, this is a pattern of
> the application and not a pattern of the relational model. Unfortunately,
> so-called object dbmses that confuse applications with data management
> require such copying even when completely unnecessary.

Once again, you have it backwards. Because the table-oriented databases have so thoroughly isolated themselves from application programming languages, it is necessary to copy data from the result table into application data structures before algorithmic operations can commence.

Object databases are more naturally integrated into application programming languages, so the data can be used naturally, without any unnecessary copying.

Or, if it will help you to understand what I mean, replace "copy" above with "translation". Because I really don't care about the movement of data from disk to database cache across a network (or not) to client-side cache (or not). What I care about is the completely superfluous step of translating result-table columns into application program data structures (that, by the way, should enforce the same integrity constraints the database requires) so they can be used to do "interesting" work.
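To make the translation step concrete, here is a minimal sketch in C++. It is an illustration only: the Row type, the NAME and AGE columns, and the Contact structure are hypothetical stand-ins, not the API of any SQL product.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for one row of a result table: column name -> text value.
using Row = std::map<std::string, std::string>;

// Hypothetical application-side structure the algorithm actually works on.
struct Contact {
    std::string name;
    int age;
};

// The translation step: pull columns out of the row, convert types, and
// re-check a constraint the database is assumed to enforce already (AGE >= 0).
Contact contactFromRow(const Row& row) {
    Contact c;
    c.name = row.at("NAME");           // throws std::out_of_range if the column is missing
    c.age = std::stoi(row.at("AGE"));  // text-to-int conversion
    if (c.age < 0) {
        throw std::invalid_argument("AGE must be non-negative");
    }
    return c;
}

// The reverse translation, needed before a change can be written back.
Row rowFromContact(const Contact& c) {
    assert(c.age >= 0);  // same constraint, checked again on the way out
    return Row{{"NAME", c.name}, {"AGE", std::to_string(c.age)}};
}
```

Nothing in either function does "interesting" work. Both exist only to move values between the two representations, and both restate a rule the database is assumed to know already.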

> The relational model, by using value based identity, facilitates consistent
> use of data across disparate applications. The user examines the same
> identifiers in a relational dbms that the user examines in a spreadsheet, a
> statistical regression application, on a report, in a UI grid or anywhere
> else.

Here is one of our disagreements that I will not debate with you. Object theory (scoff if you will) says that object identity is intrinsic and independent of any attribute values. Relational theory says that identity is value based (so you will invent logical identifiers if sufficient uniqueness is not apparent in the existing model).

> >In that (extremely common) case, the relational (or SQL if you can prove
> >otherwise) *paradigm* [pattern] is to copy the data from a result table
> >into a data structure the algorithm can use.
> As I explain above, this pattern is an attribute of the application and not
> at all an essential pattern of relational databases or even SQL databases
> for that matter. It is, however, an essential pattern of most so-called
> object dbmses.

I'll grant you that it is a pattern of application development. I do not grant that (if we leave the application development arena) it is intrinsic to ODBMS and extrinsic to RDBMS.

> >Object databases do not require this extra
> >step.
> Are you claiming that object databases operate directly on the data in the
> database without copying data into application object variables?


> >> >In the object database, this data copying step is eliminated.
> >>
> >> Actually, in the object database, this data copying step is required in
> >> order to make the data available to the application programming language
> >> for data manipulation. It is not required in an RDBMS because relational
> >> databases have their own data manipulation language.
> >
> >You are just creating a different programming environment out of your
> >(theoretical) RDBMS.
> Not at all. The DBMS is a data management environment, and it only makes
> sense that it can manage data directly.

If you can do application level things using the RDBMS language, you are doing application programming (accomplishing a *task*). This is a programming environment by any definition.

Perhaps it would be useful for me if you would define the scope of "managing data", since earlier you allowed this to include complex calculus (including, by the way, iteration).

> >If all processing occurs in the context of the DBMS,
> I have never made any such claim. Are you totally unaware of the conceptual
> difference between "allow" and "require"? Much of your argumentation style
> involves equating the two.

As does yours. See above where you say "in the object database, this data copying step is required... It is not required in an RDBMS." Here you prototypically compare implementation products with theoretical possibilities. And your assertion of what is required of an object database is false.

> >system cannot scale well and the DBMS becomes a bottleneck.
> Even though this is a response to a straw man, I must point out that you are
> imposing additional faulty preconceptions and assumptions in the above
> statement. Nothing prevents distribution of an RDBMS, nothing prevents
> parallel processing of data in an RDBMS, nothing prevents massively huge
> scaling of an RDBMS.

Well, once you start distributing your RDBMS it looks an awful lot like the data copying that you were arguing against just a little while ago.

I'll grant you that it is theoretically possible to do everything you state above. I'll also point out that in the current state of the art the database engine is a bottleneck for all high-throughput applications.

> >Unless your RDBMS
> >data manipulation language can support all the kinds of algorithms that are
> >coded in other languages, this is (at best) a red herring.
> Presenting a pattern inherent in a specific application as if it were
> inherent in the logical data model of the dbms is at most a straw man.

Which you cheerfully signed up for just a few paragraphs above.

What I said is that there will be algorithms in yet to be constructed applications that are not achievable using only the programming environment of the DBMS. That's when application programming comes in and the data copy/translation "paradigm" I started with comes into play.

> >Persistence without data copy.
> Out of curiosity, which object dbmses provide persistence without data copy?
> How does the object dbms data model support this?

You said persistence implies data copy. I gave an example where it does not. And, if you will allow my clarifying comment of translation for copy, I'd submit that all object databases provide persistence without translation.

> >> >I say conceptually, because obviously as data is moving to and from disk
> >> >there is copying going on. However, an object reference allows me to
> >> >manipulate a persistent object directly without regard to this copying.
> >>
> >> One cannot ignore the copying going on. At a conceptual level, the
> >> programmer must still specify which object variables get copied into and
> >> out of the application programme's memory. At a conceptual level, the
> >> programmer must still specify when and how to retrieve values from the
> >> database.
> >
> >One certainly can.
> One certainly must not. An application that confuses an altered copy of a
> database value with the actual database value will not operate correctly.

Within the scope of a transaction, what is the difference? From the time I issue an UPDATE command until I commit the transaction, my view of the database will be different from any other user's. However, in an *application* using a table-based database, there is yet another level of indirection, the values copied from the result table into application variables. The application variables are not guaranteed to be consistent with the transactional state of the database for one instruction after they are first assigned.

Using an object database, operating on "application variables" is the SAME as operating on database values. If I modify an "application variable" (class instance), it is the same as issuing the UPDATE command above. There is EXACTLY ONE consistent view of the database throughout the entire transaction.

> >It is this point exactly that I was making above. Because
> >the ODBMS makes a persistent object reference *look* exactly like any other
> >programming language variable (pointer, if you wish), the application
> >programmer has no concern for the copying of object variables into/out of
> >memory.
> You have not addressed the point I raised above that one cannot simply ignore
> the copying going on. At a conceptual level, the programmer must still
> specify which object variables get copied into and out of the application
> programme's memory. This refutes your point, and you have not addressed this
> counter-argument.

I don't know how else to explain it to you. If you have ever used an object database, this should be obvious to you because it is so fundamental. If I have a persistent class Contact and I have a reference (pointer if you will) to an instance of that class, e.g.,

d_Ref<Contact> aContact = ... // retrieve from database

then operations on the "application variable" directly affect the transactional state of the database, e.g.,


From that point on, ALL references (in the current transaction) will show the updated state. The Contact record may be manipulated and transformed in any number of other ways, but the view in the database (pending commit) is the same as the view in the "application variables".
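The distinction can be sketched with a toy model in C++. This is an illustration under stated assumptions, not the ODMG binding or any vendor's implementation: the Ref template (loosely in the spirit of d_Ref<T>), the copyOut helper, and the phone field are all hypothetical, and a shared in-memory object stands in for the transaction's working state.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical persistent class.
struct Contact {
    std::string phone;
};

// Toy stand-in for a persistent reference. Every copy of the handle
// designates the same object in the transaction's working state.
template <typename T>
class Ref {
public:
    explicit Ref(std::shared_ptr<T> target) : target_(std::move(target)) {
        assert(target_ != nullptr);
    }
    T& operator*() const { return *target_; }
    T* operator->() const { return target_.get(); }

private:
    std::shared_ptr<T> target_;
};

// Table-style access, for contrast: the caller receives a detached copy
// that no longer tracks later changes made through any reference.
template <typename T>
T copyOut(const Ref<T>& ref) {
    return *ref;
}
```

Given two handles a and b to the same Contact, an assignment through a->phone is visible through b->phone at once, while a value taken earlier with copyOut(a) still holds the old phone number. The first behavior is what I mean by one consistent view; the second is the table-programming situation.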

> At a conceptual level, the programmer must still specify when and how to
> retrieve values from the database. This also refutes your point, and you
> have not addressed this counter-argument.

Since your premise is false, the rest of your argument is irrelevant.

> >Conceptually, the programmer must query the database for objects of
> >interest, but this is not a concept unique to persistent data.
> It does, however, invalidate your prior argument that the application
> programmer has "no concern" for such things.

I did not assert that the application programmer had no concern for queries. I also pointed out that querying a data store is not unique to a DBMS, or even persistent data in general.

> >Conceptually, the programmer must be cognizant of transaction boundaries
> >and transaction semantics. I can't think of any way to avoid this unless
> >you give up the concept of ACID transactions (including rollback).
> Again, it invalidates your prior argument that the application programmer
> has "no concern" for such things.

Argument never made.

> >> >(By the way, I consider this whole difference in paradigm with regard to
> >> >explicit copying into/out of the database as one of the key
> >> >philosophical/architectural differences between object databases and
> >> >relational/SQL databases)
> >>
> >> Paradigm: A set of assumptions, concepts, values, and practices that
> >> constitutes a way of viewing reality for the community that shares them,
> >> especially in an intellectual discipline.
> >
> >My dictionary had a slightly different definition (see above). Or, if you
> >prefer, try:
> Actually, the above is just one of several alternate definitions. If you
> mean pattern or example, why not say pattern or example? Why the twenty-five
> cent word?

Because it is a perfectly valid word that means precisely what I wished to say.

> Those who use the word are not even clear on what it means or which of
> several meanings they intend. If those who use the word intended clear
> communication, they would choose a less ambiguous synonym. I can only
> conclude that they intend to obfuscate.

I am very clear on what it means and precisely what meaning I intended. I'll reiterate my accusation that you are a vocabulary snob.

> >> The object oriented community have false assumptions, nebulous concepts,
> >> warped values and arbitrary practices. The relational community have
> >> explicit assumptions, precisely defined concepts, principled values and
> >> reasoned practices.
> >
> >My, you are painting with an awfully broad brush tonight.
> I have earlier demonstrated all of the above assertions and nobody has yet
> offered a valid counter to any of them.

Perhaps, but none of those points were the subject of the post to which you were responding.

> >All of this because
> >you chose to react to my choice of words instead of the point I was making?
> Your point was that copying of data is inherent to the relational model, and
> I have demonstrated the point's impotence. Your use of the word "instead"
> above misleads.

No, you have blithely misunderstood me. I've made it clear in the past that I am an implementor, not a theorist. My words should be interpreted from that perspective just as yours should be interpreted from the realm of pure theory without concern for current state of the art.

My point was that copying (translating) data is inherent to programming with the (so called) relational databases available on the market today. That step is obviated with the (so called) object databases available on the market today.

> I think it is important for people to understand the intellectual bankruptcy
> of the word "paradigm". Folks often use it to sound intellectual when they
> have no intention of using any intellect.


> Given the number of hypesters and hack writers using the term, people can
> easily fall into a lazy habit of aping them. One gains a very valuable
> discipline by expunging the word from one's vocabulary.

Your opinion. However, I do not intend to ask your permission before using any word in my vocabulary.

> >> I don't think physical copying has much to do with the differences in the
> >> "paradigms".
> >
> >Let me try to be more precise for you. Physical copying (or even logical
> >copying?) is a fundamental difference between programming with a
> >result-table (or cursor) database and an object database.
> Copying has nothing to do with the logical data model of the dbms. One can
> conceive of a day when we raise the level of application programming
> languages to more closely match the level of relational databases -- in
> order to obviate "impedance mismatch". No "logical" copying would be
> required for application programming because such a programming language
> would have statements appropriate to operate directly on relation variables.

And one might argue (in fact, I think I just did!) that the object databases available on the market today do just that. In fact, this is the single largest advantage to using an ODBMS over a relational alternative.

> Again, you are assuming that a pattern inherent to the application is
> inherent to the dbms to build a straw man. Any good programmer will tell you
> that good abstractions hide inessential physical implementation details and
> that horrible abstractions attempt to hide essential logical details.

No straw man. Real world experience. Anyone who has programmed using both (so-called) relational databases and (so-called) object databases should be able to substantiate this point. Translation of data into and out of the database engine is a physical implementation detail and it should be abstracted away.

Or are you arguing that it is a valuable element of the logical model that should be preserved (hint: see above where you rhapsodized of the day when such copying would be unnecessary)?

> I suggest to you that so-called object dbmses that attempt to hide the
> copying inherent to an application programme attempt to hide essential
> logical details.

Which would be?

> >When I say "SELECT A from FOO" I
> >must bind the returned value(s) for A to application-space variables
> >before I can use them.
> This is an attribute of your application programming environment.
> >Furthermore, if my algorithm ends up changing the value of A, I
> >must then issue an explicit "UPDATE FOO values (A = newvalue)" to ensure
> >the change is propagated to persistent memory.
> Again, this is an attribute of your application programming environment.
> >Note that before this update step,
> >the changed value of A is available to other processing in application
> >space and my application does not have a coherent view of the data space.
> How is this any different from the changed value of an object variable prior
> to committing the change to persistent storage?

Either you did not read or you did not understand the explanation I gave below.

> >In an ODBMS, the same "SELECT from FOO" will return me object reference(s)
> >to FOO objects.
> Do you not see how dynamic heap allocation and local copies of the data are
> inherent to this? Do you not see how this requires and exposes physical
> details to the user?

Nope. Where are heap allocation or local copies intrinsic to accessing the data? If you want to get into the gory implementation details of a particular ODBMS, I'm pretty knowledgeable about how Objectivity does it, but I don't see how that is fundamental to the model.

What physical detail is exposed to the user (let's constrain the user for this discussion to the application programmer)?

> >If my algorithm needs the A value, it simply uses it [ print
> >obj->a() ].
> Which is inherently a local copy of the A value as it exists in the dbms
> that may no longer match the A value in the dbms.

Absolutely not. It is in fact guaranteed to be the A value in the dbms within the context of the current transaction. This includes any and all modifications to the A value for the duration of the transaction.

> >If it needs to update the value, it does it directly [ obj->a(
> >newvalue ) ].
> >The data space is consistent within my transaction (a second
> >SELECT statement will automatically see the updated value of A), but not
> >propagated to other transactions until the commit boundary.
> Again, this is not inherent to the data model. It is a property of the
> application programming environment, specifically to the middle-ware for
> lack of a better term.

I think we are mixing terms again. I don't see a huge difference in the "data model" between relational and object methodologies. We are, after all, modeling the same data. It is more a difference in how that model is represented. The fact is that the object databases represent the data in a form more natural to object programming environments. No question. The DBMS (any DBMS) is by definition middleware.

> >Thus, the programmer does not write any code to copy values into or out of
> >application space.
> Except for the code you omitted that queries the dbms for the value of obj,

Well, the code I omitted was omitted from both examples and is roughly equivalent. The key difference is that the ODBMS user does not specify result columns because a *reference* to the entire object is returned, rather than *copies* of specific columns in a result table.

My object database supports pretty much the same kind of attribute predicates that a SQL database does. In fact, they offer an add-on SQL engine (so I guess they support all the predicates SQL does).

> and the code you omitted that commits the changes to the dbms.


> >The PATTERN (paradigm) of table programming is copying data.
> You have not demonstrated this. You have demonstrated that the pattern of
> your application is copying data, and you have demonstrated two different
> methods that the middleware to an SQL database can accomplish this copying.

Fine. Show me an example of how I can perform algorithmic operations not supported by the DBMS on a query result from the DBMS without copying from a result table into application variables using any commercial (not so-called object) database of your choice.

> >The PATTERN (paradigm) of object programming is not.
> You have not demonstrated this, either. You have demonstrated that the
> object dbms, by limiting the user to only one of the methods above, exposes
> physical implementation details in its abstraction while attempting to hide
> logical details in its abstraction.

I can only say, "Huh?"

> >> >This whole concept of intrinsic identity is extremely critical in my
> >> >domain because often we do NOT know what attribute value could be used to
> >> >uniquely identify an object. Sometimes, all we know is that there is an
> >> >object observed or inferred through some phenomenology. Over time, we hope
> >> >to discover more of the attribute values attributable to that object, but
> >> >in the mean time it must be distinct from all other objects under
> >> >consideration.
> >>
> >> How do the users of your system identify the distinct instances under
> >> consideration?
> >
> >Different ways in different contexts.
> But you want the dbms to use a single way, OID, in all contexts? Does the
> irony elude you?

Gaaack!! How can someone seemingly so intelligent be so dense?

Sometimes my users gesture to an icon on a map. Sometimes they follow a hyperlink on a web page. Sometimes they look at a row in a tabular display.

In all cases, the association between the UI element the user gestured to and the underlying persistent object is, in fact, an OID. It should be evident that user interface presentation is completely independent of underlying database representation. This is the power of the model-view-controller pattern that has been around for quite some time.

> >> >Object databases handle this representation of uniqueness with object
> >> >references (commonly referred to as OIDs).
> >>
> >> Using pointers, yes, I know that. We already know what a disaster it is to
> >> expose pointers to users. If you do not expose OID to users, how do users
> >> identify unique instances?
> >
> >See, I don't get your point. An OID is not a pointer. In the database system
> >I use, an OID has a native representation (4 16-bit numbers) and a
> >stringified representation ( #dd-cc-pp-ss ). Neither of these are "pointers"
> >any more than a rowID is a pointer. Yet, because of the operator overloading
> >in OO languages, they can appear as a pointer to the programmer.
> Show either representation to casual database users and ask them whether
> OIDs are pointers. By the way, a rowID is a pointer.

Why would I EVER want to show one to a casual database user? What possible purpose would that serve?
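For the programmers following along, a rough sketch in C++ of the two representations quoted above, and of how a UI element can carry an OID that the user never sees. The field names, the decimal formatting, and the UiBindings class are assumptions made for illustration; this is not the actual layout or API of any product.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Native form: four 16-bit numbers. The field names are placeholders.
struct Oid {
    std::uint16_t dd;
    std::uint16_t cc;
    std::uint16_t pp;
    std::uint16_t ss;
};

inline bool operator==(const Oid& x, const Oid& y) {
    return x.dd == y.dd && x.cc == y.cc && x.pp == y.pp && x.ss == y.ss;
}

// Stringified form, #dd-cc-pp-ss. Decimal digits are an assumption.
std::string toString(const Oid& oid) {
    return "#" + std::to_string(oid.dd) + "-" + std::to_string(oid.cc) + "-" +
           std::to_string(oid.pp) + "-" + std::to_string(oid.ss);
}

// Hypothetical binding between UI elements (icon, hyperlink, table row)
// and the persistent objects behind them. Only the application sees the OID.
class UiBindings {
public:
    void bind(const std::string& elementId, const Oid& oid) {
        bindings_[elementId] = oid;
    }

    // Precondition: elementId was bound earlier.
    Oid resolve(const std::string& elementId) const {
        auto it = bindings_.find(elementId);
        assert(it != bindings_.end());
        return it->second;
    }

private:
    std::map<std::string, Oid> bindings_;
};
```

The user gestures at "map-icon-17" (or whatever the widget is called) and the application resolves that to the object. The OID is plumbing between the view and the model and never appears on the screen.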

> >Again, though, user interfaces are written to facilitate users doing their
> >jobs.
> Do you honestly think that users use OIDs to identify their data?

Nope. Nor would I ever claim that they should do so.

> >When a user of an on-line ordering system orders a new printer, he does
> >NOT copy the SKU number into an order-entry text field. He clicks on a
> >picture of the product. The user is POINTING to the data of interest.
> The user communicates to the application programme via the physical location
> of an image on the screen, because this is inherent to the communication
> medium. However, the user communicates the identifying SKU to the
> application programme via this medium and does not communicate the physical
> location which will change in the very next instant.

More precisely, the user clicks on an active user-interface element of the browser. The action associated with that element is to send a message to the back-end order-entry system. What is the content of the message? Depends on your system. It could be a SKU number (requiring a database lookup to find the product information that was just displayed to the user, such as price and availability). Or it could be an object reference to the inventory item represented on the display, providing direct access to all the requisite information to complete the order with no redundant query.

You just can't tell by looking.

But my point remains, users are quite comfortable POINTING at data items of interest. It shouldn't be so difficult to see how application programs could do exactly the same thing to their advantage.

> >Why can't
> >software do the same thing?
> Because the user identified the appropriate SKU to the system using a
> fleeting location and did not mentally identify the data by its location.

Hmmm. You design software differently than I do.

> Software tried data management using pointers decades ago and it proved
> impractical. Should we outfit our infantry with catapults and broadswords?

What kind of "argument" is this? Argumentum ad ridiculum?

Decades ago the programming languages were not sufficient to properly take advantage of pointers. The state of the art has moved. Who knows, maybe you are right and there is no hope for object databases. Or maybe people who are not constrained by your view of the world will continue to use them to advantage and be successful. Only time will tell.

> >But I guess I could argue that OIDs do not *require*
> >navigation.
> Really? How do users manipulate order line items without navigating orders?

How about selecting order line items directly?

> >Again, though, they are not of much use without it.
> Knowing the average order size or price is useless? Knowing the average
> shipment size is useless? Really?

Of course, here you are asking questions of orders. "Logically" you want to ask each order for its total price or size or each shipment for its size. Physically, you may have to group and sum order line items, but isn't this a physical implementation detail? Wouldn't it be better to ask each order for its total price and just compute the average in a straightforward manner?
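
As a rough illustration of "ask each order for its total price", assuming hypothetical Order and LineItem classes (none of these names come from any real product):

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class LineItem:
    unit_price: float
    quantity: int


@dataclass
class Order:
    items: list = field(default_factory=list)

    def total_price(self) -> float:
        # The grouping and summing of line items is hidden behind the order.
        return sum(i.unit_price * i.quantity for i in self.items)


orders = [
    Order([LineItem(10.0, 2), LineItem(5.0, 1)]),  # total 25.0
    Order([LineItem(100.0, 1)]),                   # total 100.0
]

# The caller asks each order for its total and averages the answers.
average_total = mean(o.total_price() for o in orders)
```

How total_price() is computed (a stored sum, a walk over line items, a query) stays an implementation detail of the order.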

> >Arbitrary numbers are implementation artifacts of
> >systems that cannot properly represent intrinsic object identity.
> Are you honestly suggesting that OIDs will replace driver's license numbers,
> social security numbers, product codes etc? Are you suggesting that users
> find them more accessible than the existing artifacts? Are you suggesting
> that OIDs are neither artifacts nor arbitrary?

OIDs are implementation details as well. At issue is the representation of intrinsic identity such that I can unambiguously identify the "thing" of interest regardless of its attribute values. The goal is object linking. If the policeman who pulls me over for speeding swipes my driver's license through a card reader that communicates with the DMV's database, who is to say what information was used to pull up my record? Perhaps it could be an OID (or a URI or some other "pointer" that is not value-based).

I am saying (and you are not hearing because you are convinced that the opposite is true) that OIDs are NOT (properly) exposed to the user interface. They are NOT query elements. They are NOT used to specify joins (joins do not need to be specified because they are pre-computed and stored).

> >For example, a telephone number is an arbitrary identifier (although more
> >closely related to a pointer) for a specific end-point in the telephone
> >network.
> Have you never used a reverse-lookup feature on the internet? A telephone
> number is an arbitrary identifier for users of the telephone network. The
> transportation company I use for travel to the airport identifies its
> customers by phone number. This has its drawbacks, of course. A video store
> I used to frequent also used phone numbers to identify customers, and this
> caused problems too.

Yes, arbitrary IDs do have problems. They do not adequately capture intrinsic object identity.

What happens when you move and the phone company re-issues your phone number? Oops.

> The transportation company uses some additional identifier or has some
> facility to copy customer information easily because their call centre now
> picks up my address from the call identifier no matter which of my phones I
> call from.

Caller-ID. The phone number from which you call is automatically added to their database of phone numbers associated with you. I never said query by attribute value was bad. I said arbitrary identifiers were bad.

> The video store solved its problems by giving people without phones (or
> people who shared a phone with others) another number that was not a valid
> local phone number.

Another arbitrary ID.

> In other words, as the system grew to the point where it required data
> management, the system required a logical identifier usable by both humans
> and machines.

In other words, the system grew to the point where it required an identifier (if a phone number is logical, so is an OID) that was machine readable and technology did not advance to the point where this number could be shielded from the human users.

> >But if I had a way to "gesture" to your entry
> >in my "contact database" and pass a direct end-point (pointer) to my
> telephone
> >(or to my e-mail program, or to my envelope printer), then arbitrary,
> synthetic
> >IDs would phase out as archaic relics of an unenlightened past.
> I guess that's why we invented IP and DNS.

IP is a pointer. But, with firewalls and NAT, it is an easily masqueraded pointer.

DNS is a logical identifier. But what exactly does it identify? may resolve to a single IP address, but it is really just an entry point to another level of index (,,, etc.)

Nice try, but we aren't there yet.

> >> >This synthetic ID is stored in each phone number so
> >> >that it can be joined back to the contact.
> >>
> >> Incorrect, both logically and physically. Logically: An association table
> >> might expose the relationship between contact id and phone number.
> >> Physically: An RDBMS might store the phone number with the contact fields
> >> using juxtaposition to identify the contact, but if it does so, it exposes
> >> the association to the user using the contact identifier and phone number.
> >
> >If you wish to design your database such that all associations are through a
> >distinct "association table", that's fine.
> The second (ie. physical) example did not do so.
> >Object modelling has "link classes"
> >that perform the same purpose.
> And what advantage do these link classes provide over relations? Simpler
> interface? More consistent interface? Principled foundation? Psychological
> advantage? ??

Ummm, I don't believe I claimed an advantage. I believe my words were "the same purpose". As near as I can tell from your previous posting and the background reading I've been able to do, a relation is just a concept. It still has to be translated into a programming language representation to be implemented on a computer.

And frankly (feel free to jump in here and call me ignorant or pseudo-intellectual or obfuscating or whatever comes to mind), I don't see such a huge difference between modeling a separate "association table" and adding the join key to the phone number. The former is *more* complex to me, not less. The latter is the more common approach used by most database designers today, and while they may be theoretically wrong, it seems to work OK.
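
For concreteness, here is a small sketch of the two designs using Python's built-in sqlite3 module. The table and column names (contact, phone_a, phone_b, contact_phone) are made up for the example and are not taken from any schema discussed in this thread.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT);

    -- Design A: the join key is repeated in the phone table.
    CREATE TABLE phone_a (number TEXT, kind TEXT,
                          contact_id INTEGER REFERENCES contact);

    -- Design B: a separate association table links contact and phone.
    CREATE TABLE phone_b (number TEXT PRIMARY KEY, kind TEXT);
    CREATE TABLE contact_phone (contact_id INTEGER REFERENCES contact,
                                number TEXT REFERENCES phone_b,
                                PRIMARY KEY (contact_id, number));

    INSERT INTO contact VALUES (1, 'Bob');
    INSERT INTO phone_a VALUES ('555-0100', 'home', 1);
    INSERT INTO phone_b VALUES ('555-0100', 'home');
    INSERT INTO contact_phone VALUES (1, '555-0100');
""")

# Design A needs one join to get from contact to phone number.
numbers_a = db.execute(
    "SELECT p.number FROM contact c "
    "JOIN phone_a p ON p.contact_id = c.contact_id "
    "WHERE c.name = 'Bob'").fetchall()

# Design B needs two joins, through the association table.
numbers_b = db.execute(
    "SELECT p.number FROM contact c "
    "JOIN contact_phone cp ON cp.contact_id = c.contact_id "
    "JOIN phone_b p ON p.number = cp.number "
    "WHERE c.name = 'Bob'").fetchall()
```

Both queries return the same rows; the difference is one extra table and one extra join in the second design.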

> Given that the relational model has proved its advantage over navigational
> systems, the onus now lies on any proposed new data model to prove its
> worth.
> So-called object dbmses are nothing more than a regression to the arbitrary,
> ad hoc, navigational databases of yesteryear.

I've heard your epithets, but I don't understand your proof. You authoritatively call object databases network databases, but (I think) that presumes some common root from which the network must be traversed? That is not (necessarily) the case at all.

> >But that is a heavyweight solution for simple
> >associations that are commonly modelled by repeating foreign key
> information in
> >the phone number table.
> Heavyweight in what sense? In the sense of the example I gave for how your
> original statement was incorrect physically?

Physically or logically? If the relational model is logically isolated from the physical representation, why was my original example incorrect physically?

> >All you've done is require two distinct identifiers (and allowed phone
> number
> >to be one) so they can be stored in your association table.
> In the example I gave for how your original statement is incorrect
> physically, the dbms does not store the data in an association table.
> >Of course, each of
> >these association entries will require a unique identifier...
> The combination of contact id and phone number suffices. The combination is
> familiar, simple and stable.
> >> >> >Perhaps each number would include a "type" tag (home,
> >> >> >cell, etc.). In order to associate this phone information with the
> >> contact
> >> >> >info, either a synthetic ID must be generated or the primary key
> values
> >> >> must be
> >> >> >replicated.
> >> >>
> >> >> I am not sure I understand your complaint. Are you complaining about
> >> >> redundant information in the logical view of the data? Pointers are as
> >> >> redundant, if not more so.
> >> >
> >> >A pointer is a physical implementation of a logical concept.
> >>
> >> A pointers is a logical exposure of a physical concept (location).
> >
> >Since the location of a {thing} is a physical concept, I hope we can agree
> that
> >a pointer is a physical thing.
> You have refuted your own earlier statement that it is a logical concept.


"A pointer is a physical implementation of a logical concept." Pointers are physical.

The intrinsic identity of an object is not a physical concept, although it can be represented physically by a "pointer".

An association between two objects is not a physical concept, although it can be implemented in terms of "pointers".

However, an object reference is not a pointer, although it can be turned into one. If it were a pointer, the resulting object would always be required to occupy the same physical address in memory.
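
A toy illustration of that distinction, assuming a hypothetical ObjectStore class in which a list index stands in for a physical address. This is not how any particular ODBMS works; it only shows a reference surviving relocation of the object it names.

```python
class ObjectStore:
    """Toy store: OIDs map to slots; slots may change, OIDs do not."""

    def __init__(self):
        self._slots = []        # stand-in for physical storage
        self._oid_to_slot = {}  # the level of indirection
        self._next_oid = 1

    def add(self, obj):
        oid = self._next_oid
        self._next_oid += 1
        self._slots.append(obj)
        self._oid_to_slot[oid] = len(self._slots) - 1
        return oid

    def deref(self, oid):
        """Turn the reference into a 'pointer' only at the moment of use."""
        return self._slots[self._oid_to_slot[oid]]

    def slot_of(self, oid):
        return self._oid_to_slot[oid]

    def relocate_all(self):
        """Move every object (here: reverse the storage order)."""
        n = len(self._slots)
        self._slots.reverse()
        for oid, slot in self._oid_to_slot.items():
            self._oid_to_slot[oid] = n - 1 - slot


store = ObjectStore()
contact_ref = store.add("contact: Bob")
phone_ref = store.add("phone: 555-0100")

slot_before = store.slot_of(contact_ref)
store.relocate_all()
slot_after = store.slot_of(contact_ref)
```

The slot changes, but contact_ref still resolves to the same object.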

> At a single level of indirection, a pointer is a strictly physical thing.
> Others have argued that additional levels of indirection render the pointer
> logical. While I disagree with this position, I see no benefit in arguing
> for or against it.
> In my books, an IP is a physical pointer. It is a physical pointer with a
> complex decoding algorithm. Likewise, an OID is a physical pointer with a
> complex decoding algorithm.
> Additional levels of indirection allow some flexibility for rearranging
> physical locations at the cost of a more complex decoding algorithm, and
> other people would argue that this turns the physical pointer into a logical
> pointer. I won't argue for or against this point; I will simply observe that
> the pointer remains a pointer tightly married to a specific implementation
> with all of the disadvantages that entails.

Sounds to me like you are arguing against the "pointer" becoming logical.

> For instance, we have six billion people on this planet, and we have four
> billion unique IP addresses. What happens when everyone has several devices
> directly connected to the internet?

Subnets. DHCP. NAT. IP is NOT a physical pointer.

> >Let me be more precise. The phone number above has no *semantic* meaning
> unless
> >it is associated with the person whose phone it is.
> Or the organization whose fax it is, or the dial-up ISP whose modem-farm it
> is, or ...
> It always has the semantic meaning of an addressable node in the telephone
> network, and you are correct that by itself it has no further semantic
> meaning. In a sense, phone numbers are more syntactic than semantic.
> Your example presupposes phone numbers, which are assigned by telephone
> companies to individual nodes. In this sense, they are natural logical
> identifiers for users of the connected devices.

I just disagree that they are "natural" logical identifiers. When I had a dial-up ISP, they continually added new phone lines and decommissioned others and had to ask users to change their modem dial-up settings to use the new numbers. Most modem dial-ups store multiple numbers to use in case one or more of the numbers are busy.

A natural logical identifier would be "the Houston Earthlink POP" and I wouldn't care what phone number(s) my modem dialed.

I don't mean this in an unkind way, but it seems your thinking about data modeling is stunted by your relational expertise. You immediately look about for some attribute value that can be used to represent logical identity instead of accepting that each "object/value/instance" has intrinsic identity independent of its attribute values. Data modeling should facilitate answering the questions that end-users are likely to want answered.

(go ahead, take your shots)

> At a logical level, a relational dbms exposes that association using
> relations. At a logical level, a navigational dbms exposes that association
> using physical attributes such as pointers or such as proximity thereby
> confusing two very distinct levels of discourse.

At a logical level, the purpose of the relation is to enable traversal from the contact to the phone record and/or vice versa. If I can achieve the same logical result in my object database, I don't see how it is such an evil thing.

You said you wouldn't argue that an indirected pointer is now a logical concept, but it seems central to your criticism of object databases. If it is a logical and not physical association stored in the ODBMS, then your above criticism evaporates.

> >> And you complain about the logical interface of the relational model... ?
> >
> >I (honestly) point out a real short-coming with the (real) commercial
> product
> >with which I program.
> It is a real shortcoming of the logical data model used. When you identify a
> real shortcoming of the relational data model, I will honestly admit it.

Why is it a shortcoming of the data model? Why is it intrinsically not possible to query across associations in an object database? Don't confuse implementation defects with model shortcomings. If this were a model shortcoming, then NO object database would EVER be able to query across associations. Are you sure you want to take on this proof?

> >There is no fundamental reason why this should be so, but
> >it is so and I refuse to play "what if" games.
> As they say "Denial ain't just a river in Egypt."


> >As you are so fond of saying, a
> >failure of commercial products is not a failure of the model.
> What aspect of the object model has your vendor failed to implement that
> results in the above shortcoming?

The ability to query across associations.

> >> >The second part, "the DBMS must *expose* (emphasis mine) the association
> >> ...
> >> >explicitly using values" I don't understand. If there is no *logical*
> value
> >> >that identifies the association, how should this exposure take place.
> >>
> >> The phone number must have a logical identifier, possibly the phone
> number
> >> itself. The contact must have a logical identifier or the users won't be
> >> able to easily identify contacts.
> >
> >Synthetic IDs are evil because they carry no semantic content. How often
> have
> >you mis-dialed a phone number?
> How many times have you mis-dialed the phone number because you accidentally
> pointed at the wrong line in the phone book?

I have never pointed at the wrong line in the phone book and had the wrong number automatically mis-dialed. In every case, I have failed to follow the line from the attribute value of interest (name, possibly address) to the arbitrary ID needlessly exposed into the logical interface (phone number).

> >A "logical" model that forces more of these into
> >the interface is flawed.
> A logical model that pretends they do not exist, or even worse pretends they
> are not necessary, is even more flawed.

It is a chicken and egg scenario. You say, "your model must have a unique logical identifier" so your customer makes one up (e.g., library card number). You then say to me, "you must model the library card number."

Sure, I can have whatever attributes in my model are required. And I can query on them just like you can. But the model was flawed when you had to create an arbitrary ID to support your model (and add the bookkeeping complexity of generating and managing these IDs).

> >Quite often humans disambiguate by pointing.
> Before humans can point, they must disambiguate. The user cannot point at
> the correct location in the catalogue or on an order unless the user knows
> what he or she wants to identify.

The red one or the blue one? Let me see what the red one would look like!

Is the query to the back end by attribute value (red one) or object reference (I'll even grant you SKU for this one)?

Users do disambiguate by attribute values, but they find it tedious to specify sufficient attribute values to a computer to completely disambiguate. So you show a bag-o-attributes to the user and they pick what they want using whatever mental process they choose. Query By Example.

> >> All the more reason to suggest as simple an interface as possible -- the
> >> relational model.
> >
> >You've missed the point. Why does FedEx assign a tracking number to your
> >package?
> Above, you argue against logical identifiers. Does the irony escape you?

No irony. The relational model with its "simple" relations is too complex for human users. So they create an abstraction to represent the complexity. While I argue against arbitrary ids (like tracking numbers), I freely admit (and see above) that they provide a useful abstraction over value-based relations. They are just inadequate for human consumption and inadequate at representing the real abstraction: the intrinsic identity of the shipment.

> >Because identifying "the package that Bob Badour sent to Jim Melton on
> >Sept 1, 2001" is too complex (although it can easily be represented as a
> >relation). People routinely create concepts that may "add complexity to the
> >interface" in order to sheild themselves from greater complexity.
> Both examples above use the same interface; they are both propositions. The
> relational algebra allows users to derive one from the other. It also allows
> the DBMS to create multiple views -- one derived from another.

Yeah, the theory is wonderful. The application is a bear. Of course, one might argue that the tracking number is a pointer since it can be translated (through some complex algebra) into a record in the database.... And if that tracking number were an OID....

> >> >In order to deal with more
> >> >complex things, we hide complexity behind abstractions.
> >>
> >> Relations are very simple abstractions.
> >
> >One could represent all data as sequences of name-value pairs. Such data
> would
> >extremely simple, but exceedingly complex to work with, because the sequence
> >would be devoid of semantic content.
> It would also lack any theoretical foundation or guiding principle and would
> require people to construct special encodings that expose implementation
> details to users.

How so? Such a construct would fit nicely into a relational model and would benefit from all the mathematical rigor in the relational model. And since there would only be one relation, it should be an exceedingly simple interface.
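
A small sketch of what working with such a single relation might look like. The "entity" column is an assumption added for the example so that pairs can be grouped at all, and the data and the phone_of helper are invented.

```python
from collections import defaultdict

# A single relation of (entity, name, value) rows.
pairs = [
    (1, "name", "Bob"),
    (1, "phone", "555-0100"),
    (2, "name", "Jim"),
    (2, "phone", "555-0199"),
]


def phone_of(person_name):
    """Reassemble records from the pairs, then answer the question."""
    records = defaultdict(dict)
    for entity, name, value in pairs:
        records[entity][name] = value
    for record in records.values():
        if record.get("name") == person_name:
            return record.get("phone")
    return None


jims_phone = phone_of("Jim")
```

The relation itself is as simple as it gets; the meaning of each row has to be rebuilt by whoever asks the question.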

> >> >Object classes have interfaces that reflect the complexity that is
> >> >already inherent in the data.
> >>
> >> Unfortunately, object classes often go beyond this and expose the
> complexity
> >> inherent in the physical representation of the data as well as that
> inherent
> >> in the data itself.
> >
> >One must question if you understand object technology at all. Since it is
> >completely possible to declare a class that is all interface and no
> >implementation (no data members), it is ludicrous to assert that object
> classes
> >expose implementation details (physical representation).
> And the abstract "order" class exposes no collection, or hash, or bag, or
> array of references to "order items"? The user can identify all associated
> instances of "order item" without resorting to an instance of "order"?

Certainly. Why not?

But why? If you were going to perform a query of the SQL form "SELECT * FROM ORDER_ITEMS WHERE ORDER_ID = 'A347Z'", why is that worse than (pseudo language) "[SELECT FROM ORDER WHERE ORDER_ID = 'A347Z'].orderItems()"?

In both cases you are still navigating the association.
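
The two forms can be put side by side in a short Python sketch. The Order and OrderItem classes and the sample data are hypothetical; order_items() plays the role of orderItems() in the pseudo language above.

```python
from dataclasses import dataclass, field


@dataclass
class OrderItem:
    order_id: str
    description: str


@dataclass
class Order:
    order_id: str
    _items: list = field(default_factory=list)

    def order_items(self):
        # Follow the stored association from the order to its items.
        return list(self._items)


item1 = OrderItem("A347Z", "printer")
item2 = OrderItem("A347Z", "toner")
item3 = OrderItem("B001X", "paper")
all_items = [item1, item2, item3]
orders = {
    "A347Z": Order("A347Z", [item1, item2]),
    "B001X": Order("B001X", [item3]),
}

# Value-based: scan the items and match on the repeated key.
by_value = [i for i in all_items if i.order_id == "A347Z"]

# Reference-based: find the order, then follow its association.
by_reference = orders["A347Z"].order_items()
```

Either way, the answer is reached by going from the order identifier to the items associated with it.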

Don't bother with a more complicated DML example. I'll concede that there are SQL statements that are exceedingly difficult to map directly to an object model. I'll also point out that people sell SQL front-ends to object databases, so I'll guess that people smarter than I have looked at the problem.

> >> >Sure, you can argue that a user must understand
> >> >some amount of the object model to become productive, but I don't see how
> >> that
> >> >is any different in any paradigm.
> >>
> >> There goes that word again. Why do you use it for almost everything? Are
> you
> >> not able to conceive of a meaningful word to use in its place?
> >
> >Obviously not. Why don't you offer an alternative that won't push your hot
> >button.
> The word has too many different meanings and people use it with too little
> understanding of any of them for me to pick a suitable synonym in the above
> context.

"System of thinking". Honestly, I don't think you try.

> "Example" does not seem to make any sense above. "Pattern" does not seem to
> make any sense above, either. Since object dbms lack any consistent
> theoretical framework, that definition makes no sense either.
> If you want to demonstrate that the sentence above has any meaning at all,
> you will have to identify a sensible alternative for me.

I'm losing interest in that altogether.

> >> Users understand relations with very little effort because all relations
> >> have an identical interface using identical operations.
> >
> >Syntax is never particularly interesting.
> Relations are semantic and not syntactic.

The "simple interface" you laud is syntactic. Each and every relation has different semantic meaning. Learning semantics is the key to using data productively.

> >Knowing *what* I can do is a far cry
> >from knowing *why* I would want to do it (and when I would NOT want to do
> it).
> Hence the integrity function of a relational dbms.
> >> >If I don't understand the way all the tables
> >> >are related and what fields join what tables in what context, how
> >> productive
> >> >will I be?
> >>
> >> Very productive. All you need to know is the way the system catalog tables
> >> are related.
> >
> >Nonsense.
> What does an object dbms offer that even begins to compare?

Is it a great surprise to you that ODBMS have similar functionality? Obviously, the DBMS must know how the data it stores is interrelated. But are the system catalog tables the same across vendors? If not, are you arguing for a concept or an implementation?
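
As one concrete case of a vendor-specific catalog, SQLite exposes its schema through a table named sqlite_master, which can be queried like any other table. The contact and phone tables below are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE phone (number TEXT, contact_id INTEGER REFERENCES contact);
""")

# sqlite_master is SQLite's own catalog; other products name and
# structure their catalogs differently.
table_names = [row[0] for row in db.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The idea of interrogating the catalog carries over between products; the table names and columns generally do not.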

> I have taught many users about system catalog tables and I have observed
> many of them repeatedly discover the information they needed by
> interrogating those same tables.
> >We have a diagram that depicts all the tables and relationships between
> tables
> >in a particular database used by our customer. It is incomprehensible.
> I don't doubt it. I am not a big advocate of diagrams.

A picture is worth a thousand words. Querying a system catalog provides fewer cognitive cues than a diagram.

> >> >Object databases use objects naturally to manage complex notions (and
> >> >relationships).
> >>
> >> I have yet to meet a casual database user who found objects natural. In
> >> fact, I have found many experienced, skillful application programmers who
> do
> >> not find them at all natural.
> >
> >It all depends in what circles you move, I suppose. Here in
> >comp.databases.object I think your findings would be somewhat different.
> I have yet to see any evidence of that.

You have yet to look (or listen).

> >> >Yes, I understand the concept. I did not ask you to agree with me.
> >>
> >> You have yet to exhibit any understanding.
> >
> > ... to your satisfaction.
> ... or at all.


> >One of the difficulties in discussing things with you is that you cannot
> agree
> >to disagree.
> I can when I see a reason to.

How about, "you have one opinion, I have another" ?

> >You must be right and I must be wrong. I see your point. I do not
> >agree with it.
> Unfortunately, you do not see my point. I see your point, and I understand
> the fundamental misconceptions from which you derive it. Even when I point
> out how flawed those fundamental misconceptions are, you cling to them and
> actively promote them.

You obviously have some fundamental misconceptions about my fundamental misconceptions. Since you have not understood me, you cannot have pointed out flaws in my thinking. You will not believe that I understand your point and I have proven singularly unsuccessful at communicating mine in a way you can understand. We are at an impasse.

> >Statements such as the above exemplify the allegation I made a
> >while ago about you being an intellectual snob (or something like that). It
> is
> >quite a condescending remark.
> If you espouse and promote the position that creating a unique and arbitrary
> interface for every relationship among data reduces complexity compared to
> using a simple set-based abstraction, you do not understand the concept of
> complexity.

Just a few paragraphs above you pointed out that relations had semantic meaning. Providing a codified representation of that semantic meaning is abstraction. It is not arbitrary (nor necessarily unique, but we've managed to avoid inheritance so far). Your inflammatory language does nothing to make your point more persuasive.

If the same construct (relation) has different semantic meaning in different contexts, then it is more complex than any class interface. If you retreat to some generic definition of relation that strips away semantic content, then you are left with relations being nothing but syntax. In either case, "just relations" is not less complex to me than a carefully thought-out object model.

Once again, it is time to agree to disagree.

> If, however, you use the term object at one time to mean a variable, at
> another time to mean a value, at another time to mean a collection of
> variables, at another time to mean a reference etc., you are simply using
> sloppy terminology of your own.

If, on the other hand, an object can be a variable that comprises (among other things) a collection of variables (or objects), then I am simply being consistent with my own terminology and you are not as facile with object technology vocabulary as you might like to think.

> >I may not use your words
> >with the precision that you would like.
> Unfortunately, we work in a precise field whose primary tasks are tasks of
> communication. Sometimes the communication involves humans, sometimes the
> communication involves machines and sometimes the communication involves
> both.

So far, I've been pretty successful at communicating with both machines and humans.

> You do not even use your words with precision, and this is a real impediment
> to accurate communication.

When people from different cultures meet, it takes concerted effort on the part of all concerned to communicate. You seem to have entered this discussion with extremely strong prejudices and biases that don't allow you to accept at face value the wisdom and experience of others. You have not shown an open mind to anything that might have contradicted your prejudices, but tenaciously clung to them and reiterated them at every opportunity.

I did learn a bit from trying to understand your side of this discussion. I doubt you can say the same.

> >But I have used databases that are
> >called relational and I have used database that are called object-oriented.
> Unfortunately, the databases you were told are called relational are not
> relational, and the databases you were told are called object-oriented are
> nothing more than network model databases with a fresh new scent.
> Until you actually know what a relational dbms is, it is irresponsible to
> make public claims denigrating them.

Well, I don't recall denigrating relational databases. I think I've just been defending object ones. I've tried to be very accommodating of your biases by resorting to terms like SQL database and so-called object database. But if I've offended, I apologize.

> >I
> >have decades of experience in writing software for large, complex systems.
> And
> >IN MY EXPERIENCE, complexity is best managed through the use of objects.
> Since your experience failed to even teach you what a relational dbms is, it
> offers little upon which to base a comparison.

Your arrogance knows no bounds. My experience has exposed me to the best the market has to offer. Theory without implementation is called vaporware. The commercial products may not measure up to your academically pure standards of what a relational database is, but the rest of the world understands them to be relational. Since your definition is out of sync with the mainstream, perhaps you need to put a little more effort into communicating and a little less into condescension.

I am fully qualified to base a comparison of commercial RDBMS and commercial ODBMS. That is all I have done.

> >Once again, I don't ask you to agree with me.
> I stand by my earlier statement. I don't expect you to agree.

Which one? That I don't have a clue what complexity is? That I don't know how to use the English language with precision? That I have no understanding? That my experience is worthless?

You are too kind.

Jim Melton, novice guru             | So far as we know, our
e-mail: | computer has never had
v-mail: (303) 971-3846              | an undetected error.


Received on Wed Sep 05 2001 - 09:41:37 CEST
