Re: Clean Object Class Design -- What is it?

From: Bob Badour <bbadour_at_golden.net>
Date: 6 Sep 2001 00:09:09 -0700
Message-ID: <cd3b3cf.0109052309.6d70f1d3_at_posting.google.com>


Jim Melton <Jim.Melton_at_Technologist.com> wrote in message news:<3B95D72C.7889E50C_at_Technologist.com>...
> Bob Badour wrote:
>
> > Jim Melton wrote in message <3B91E25D.32112D29_at_Technologist.com>...
> >
> > >Frequently, users *do* have a hard time disambiguating similar "variables".
> > >(Can you tell twins apart?).
> >
> > Yes. Each generally has a unique name, social security number, drivers
> > license number etc.
>
> And when you *see* two twins are you able to discern their driver's license
> numbers? No. How do you discern them?

Not unless I am a policeman pulling one of them over at the side of a road. If I know them personally, I may examine subtle details (attribute values) of which I am not even fully conscious. If I do not know them so well, I will identify them by name as in "What's your name?" or "Which one are you?".

>Usually, it is spatially because humans
> are able to reason that two "objects" do not occupy the same space at the same
> time.

This, of course, is only helpful after I have identified each twin by some other means, such as name. It does not help me after the twins move unless I am able to fully track the movements of at least one twin.

> But this spatial distinction is a poor choice for programming because it
> is temporally unstable.

It is a poor choice in general for exactly the same reason.  

> Just because someone can assign an arbitrary identifier to something does not
> mean anything.

The fact that, in most cases, nobody needs to has apparently escaped you.

> In fact, these arbitrary identifiers are routinely misused in
> identity theft.

As would OID. Since OID discourages one from truly recording any other identifying attributes, I suggest that OID will encourage identity theft.

> There is NO more unique identifier than one that captures the
> intrinsic identity of an object.

How exactly is OID intrinsic to anything? My name is intrinsic to me. My driver's license is intrinsic to me. My social security number is intrinsic to me. My hair colour is intrinsic to me. OID? I don't have one of those, and I hope I never do.

> All arbitrary ones are subject to error.

Are you suggesting that an explicit identifier is somehow more arbitrary than a huge random number?

> Fingerprints or retinal scans are better unique identifiers because they are
> truly unique.

Except when one encounters entire families who have blank fingerprints or when one encounters a person who has lost both eyes. Of course, if a fingerprint or eyes are required for the data modelled, these might make decent keys.

> > >That's why query by example is a powerful user
> > >interface technique. When the user says, "That one" he is using a
> > >"pointer".
> >
> > He is also using a digit and an index (finger). <g> When the user inspects
> > identifying attributes to decide which to point at, he is using explicit
> > attribute values.
> >
> > Attribute values help the user point when the user must point, but at other
> > times pointers actually get in the user's way.
>
> Since your other responses are consistently esoteric, academic, and detached,
> should I interpret this uncharacteristic levity as agreement? Or simply that
> you cannot refute the argument?
 

I already did refute the argument. Apparently, the point that the user actually identifies the item of interest by attribute values and not by location has managed to elude you. The user communicates those values to the computer using a "pointing" technology inherent to the device and not inherent to the task of identification.  

> > >> >This notion of intrinsic identity is reinforced in object
> > >> >databases by our pointers to objects.
> > >>
> > >> Yes, by pointers to variables. But this does not help users disambiguate
> > >> similar values stored in separate variables. Are you saying that your
> > >> entities have not logical identity? That users cannot disambiguate
> > >> similar entities?
> > >
> > >Often not.
> >
> > How not and when not? Users cannot do much of value if they cannot even
> > identify what they are talking about.
>
> Sometimes the "thing of value" is simply to be able to figure out how to
> identify what they are talking about. Consider forensics or intelligence.

Forensic investigators and intelligence gatherers go to great lengths to properly catalog their observations, i.e., to provide logical identifiers to their observations so that they can manipulate them later.

Can you imagine a forensic investigator walking into a courtroom and making statements about a photograph without being able to identify who took it, where, of what etc?

I present to you Exhibit 'A' -- even in the courtroom, the lawyers attach logical identifiers to pieces of evidence.

> > >A way to reference the data
> > >that recognizes its intrinsic identity is required.
> >
> > Which is why humans have always required human-understood logical
> > identifiers.
>
> Because prior to the advent of computers humans had to perform data retrieval
> and sorting functions that should properly be hidden from the humans.

Are you saying that the identity of evidence in a jury trial should be hidden from humans?

Are you saying that humans should have no means to identify their driving records, credit card accounts, etc. after they drop their PDA or cellphone?

> Computers
> may need to deal with arbitrary identifiers (in which case an OID is as useful
> as any), but humans gain NOTHING by having these values "exposed to the logical
> interface".

...which is exactly why the relational model prohibits the DBMS from exposing them. Instead, the relational model requires that identity be explicit and potentially meaningful to humans.  

> > Restating your previous statement:
> >
> > In a relational database, the example (or pattern) is
> > always to copy the data out of the database, perform some manipulations
> > (as required), then find the appropriate record(s) again and modify
> > whatever values are changed.
> >
> > The example (or pattern) is simply incorrect as an archetype of relational
> > databases. It might be a typical example of an application programme,
> > however. A relational dbms allows one to operate directly on the data
> > without any copying. On the other hand, the majority of so-called object
> > dbmses do require such copying.
>
> As I have said before, a database without an application that uses it is like a
> disk drive with no power applied.

Let me clarify: The pattern given is not intrinsic to all applications, but only to specific applications. If you assume one of those applications, you are not saying anything about the DBMS. You are describing a pattern of a specific application.

As stated above, a relational dbms allows one to operate directly on the data without any copying -- this is a pattern of relational dbmses that few non-relational object dbmses can match.
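To make "operate directly on the data without any copying" concrete, here is a minimal sketch of the account transfer mentioned earlier in this thread. It assumes Python's built-in sqlite3 module as a stand-in SQL dbms; the accounts table, its columns and the transfer helper are hypothetical names chosen for illustration. The application sends one set-oriented command and never reads a balance into its own variables.

```python
import sqlite3

# Hypothetical schema: accounts identified by an explicit account number.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (acct_no TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [("A-100", 500), ("B-200", 300), ("C-300", 50)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst with a single UPDATE statement."""
    with conn:  # commits on success, rolls back if the statement fails
        conn.execute(
            """
            UPDATE accounts
               SET balance = balance + CASE acct_no WHEN ? THEN ? ELSE ? END
             WHERE acct_no IN (?, ?)
            """,
            (src, -amount, amount, src, dst),
        )


transfer(conn, "A-100", "B-200", 125)

# Only now, for display, does any data leave the dbms.
balances = dict(conn.execute("SELECT acct_no, balance FROM accounts"))
print(balances)
```

The debit and the credit happen inside the dbms in one statement; the only values that cross the interface are the two account numbers and the amount.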

> A DBMS is infrastructure. Non-programmer "users" don't use the database
> directly.

You haven't met the same users I have. I have worked in several situations where many non-programmer users use the database directly.

Of course, if you work in an ODBMS environment that requires a programming language like Java, C++ or Smalltalk for accessing the database, I can understand how you arrived at this faulty conclusion. You would have no opportunity to see real users at work.

> Programmers write software to enable users to use the data more
> efficiently.

DBAs model data to enable users, including the proper subset of users called programmers, to use the data more efficiently.

> Other users use existing software to use the database.

False generalization.  

> I know you disagree with me, that you see the DBMS as an end to itself, but we
> will not convince each other to change our views. I'll concede that my
> statements were not as crisply precise as you insist upon. Try this:
>
> In *programming* with a relational database, the pattern...

Again, the pattern is a pattern of a specific application and not a pattern of relational databases. One can write many applications using a relational dbms that do not follow this pattern. As I said previously, non-relational object databases require such copying, but relational dbmses do not.

> > When you use the term object in the discussion of data management, you
> > equate object with data ie. the subject matter of the discussion.
> >
> > When you use the term object in the discussion of variables, you equate
> > object with variable ie. the subject matter of the discussion.
> >
> > When you use the term object in the discussion of type, you equate object
> > with type ie. the subject matter of the discussion.
> >
> > When you use the term object in the discussion of identifiers, you equate
> > object with identifier ie. the subject matter of the discussion.
> >
> > When you use the term object in the discussion of relations, you equate
> > object with relation ie. the subject matter of the discussion.
> >
> > Why not simply use data, variable, type, identifier and relation as I
> > suggested?
>
> Because the truth is that an object is all of these.

Untrue. An object variable is a variable. An object's type is a type. All of the above are objects but no "object" is all of them.

> In SmallTalk, for example,
> every "variable" is of an object type (class). Aggregations of "variables"
> (relations?) are similarly characterized by an object type (class) because
> there is no fundamental distinction between "simple" and "complex" data.

Relations are aggregations of variables, but aggregations of variables are not necessarily relations. Relations are unencapsulated sets with a well-defined algebra and a theoretical foundation.

I think you will find a fundamental distinction in the interfaces to simple and aggregation object classes. In fact, you will generally find a fundamental distinction in the interfaces to any two object classes.

> Yes,
> the distinction between class and object is frequently blurred in common
> speech, but that is because the distinction is readily apparent from usage.

In discussions with object oriented programmers, I have found exactly the opposite is true. Sloppy vocabulary usage leads to sloppy thinking to the point where people do not even understand their own statements. When one assumes one understands someone from "usage", one cannot really be sure of one's understanding.

People who truly wish to communicate will gladly clarify their speech instead of insisting on unclear usage.

> A
> specific variable is an instance of a class type, just as it might be an
> instance of an integer type.

I agree.  

> Your words are foreign to me.

Variable, value, type, class, operation? Foreign to a programmer? Surely you jest.

> While they have obviously been imbued with great
> meaning for you, they are confusing and imprecise in communicating with me.

"Variable" is imprecise? "Type" is imprecise? "Identifier" is imprecise? If those terms are confusing and imprecise in communicating with a programmer, what terms precisely and clearly communicate the same concepts to programmers?

> So
> whose vocabulary is right?

What vocabulary would you propose?  

> > >> You have it backward. ODBMSes require the above process, but relational
> > >> databases do not. One can send a set-oriented command to the RDBMS that
> > >> manipulates data entirely within the DBMS process.
> > >
> > >Really. Can I send a set-oriented command to the RDBMS to find a
> > >least-squares path through a series of points?
> >
> > Provided the DBMS supports the operation, yes you can.
> >
> > You can also tell the DBMS to increase the amount in an account by the same
> > amount it decreases the amount in another account, without copying all of
> > the account information back and forth. How many object databases can say
> > the same?
>
> I think I can name 2. You see, Objectivity is not truly client/server in that
> the entire DBMS code is loaded into application program space. Consequently,
> the application has exactly the same access to the data as the DBMS.
>
> And I believe that Gemstone/S allows methods to be invoked in the server.

Very good. As a rough percentage, what fraction of object database products do these two products represent?

> > >Can I send a set-oriented command to the RDBMS
> > >to find a statistical probability that two measurements (including their
> > >error distributions) represent the same event?
> >
> > Provided the DBMS supports the operation, yes you can.
> >
> > You can also tell the DBMS to delete any information about customers who
> > have not made a purchase in the previous five years without copying any
> > customer or purchase information back and forth. How many object databases
> > can say the same?
>
> See above. Oh, and add that if wishes were horses...

Of the two products you have named, how many will allow you to tell the DBMS to do the above interactively assuming no pre-existing method exists to perform the exact task?
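For comparison, the ad hoc request in question needs no pre-existing method in an SQL system. The following is a rough sketch only, assuming Python's built-in sqlite3 module as the dbms; the customers and purchases tables, their columns, the sample rows and the cutoff date are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE purchases (cust_id INTEGER NOT NULL, purchased_on TEXT NOT NULL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO purchases VALUES (1, '2001-03-15'), (2, '1994-07-01');
    """
)

# Assumed cutoff: five years before the date of this post, as an ISO string
# so that plain string comparison orders dates correctly.
cutoff = "1996-09-06"

with conn:  # both statements commit together or not at all
    # Remove customers with no purchase on or after the cutoff.
    conn.execute(
        """
        DELETE FROM customers
         WHERE NOT EXISTS (SELECT 1
                             FROM purchases p
                            WHERE p.cust_id = customers.cust_id
                              AND p.purchased_on >= ?)
        """,
        (cutoff,),
    )
    # Remove the purchase history of the customers just deleted.
    conn.execute(
        "DELETE FROM purchases WHERE cust_id NOT IN (SELECT cust_id FROM customers)"
    )

remaining = [name for (name,) in conn.execute(
    "SELECT name FROM customers ORDER BY cust_id")]
orphans = conn.execute(
    "SELECT COUNT(*) FROM purchases "
    "WHERE cust_id NOT IN (SELECT cust_id FROM customers)"
).fetchall()[0][0]
print(remaining, orphans)
```

No customer or purchase row is copied to the application in order to decide what to delete; the predicate is evaluated where the data lives.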

> > >Can I send a set-oriented command to
> > >the RDBMS to predict the likely next state of a Markov model?
> >
> > Provided the DBMS supports the operation, yes you can.
> >
> > You can also tell the DBMS to populate a workflow table to identify all of
> > the entities whose state must change in the next step of some game, without
> > copying any data back and forth. How many object databases can say the same?
>
> See above. By the way, I presume that this populated workflow table will then
> be "copied" to some "user" who can take advantage of it.

Not necessarily.  

> These are really your weakest arguments yet.

What can I say? I am responding to even weaker arguments. Without any challenge, it's hard to really shine.

> > >Believe it or not, sometimes people want to apply algorithms to the result
> > >of a query.
> >
> > If an application requires a local copy of some data, this is a pattern of
> > the application and not a pattern of the relational model. Unfortunately,
> > so-called object dbmses that confuse applications with data management
> > require such copying even when completely unnecessary.
>
> Once again, you have it backwards. Because the table-oriented databases have so
> thoroughly isolated themselves from application programming languages, it is
> necessary to copy data from result table into application data structures
> before algorithmic operations can commence.

Really? Even SQL databases have stored procedures.

> Object databases are more naturally integrated in to application programming
> languages, so the data can be used naturally, without any unnecessary copying.

I believe, if you really think about it honestly, you will see that most object databases require the copying while making it implicit.

> Or, if it will help you to understand what I mean, replace "copy" above with
> "translation".

Since relational domains are object classes, what translation is required? From object class to what? From object variable to what? From object value to what?

> Because I really don't care about the movement of data from disk
> to database cache across a network (or not) to client-side cache (or not).

Of course, you don't care about copying. Your mischaracterization of relational databases backfired, and now you have to distance yourself from your own argument.

> What
> I care about is the completely superfluous step of translating result-table
> columns [object variables] into application program data structures [object variables]

To which superfluous step do you refer?

> > The relational model, by using value based identity, facilitates consistent
> > use of data across disparate applications. The user examines the same
> > identifiers in a relational dbms that the user examines in a spreadsheet, a
> > statistical regression application, on a report, in a UI grid or anywhere
> > else.
>
> Here is one of our disagreements that I will not debate with you. Object theory
> (scoff if you will) says that object identity is intrinsic and independent of
> any attribute values.

Variables have intrinsic identity. The question arises how users identify variables. The relational model uses an abstract, value-based logical identity. A CPU uses a physical address-based physical identity.

An individual variable within a relational dbms need not have any unique attributes values. How, exactly, does this scoff at so-called object theory?
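The difference between the two kinds of identity can be sketched in a few lines of Python. This is an illustration only: the Person class and the ssn attribute are hypothetical, and Python's `is` operator stands in for address-based identity while `==` stands in for value-based identity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    ssn: str   # hypothetical identifying attribute (a candidate key)
    name: str


a = Person("123-45-6789", "Pat")
b = Person("123-45-6789", "Pat")

# Value-based identity: the user compares attribute values.
same_value = (a == b)

# Address-based identity: the machine compares storage locations.
same_object = (a is b)

# A user who knows only the key value can still find the right one.
by_key = {a.ssn: a}
found = by_key["123-45-6789"]

print(same_value, same_object, found.name)
```

The user who walks up with a social security number can use `by_key`; nobody walks up with a memory address.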

> Relational theory says that identity is value based (so
> you will invent logical identifiers if sufficient uniqueness is not apparent in
> the existing model).

Yes. Humans must do so anyway. Will your human users examine OID to distinguish the identity of similar variables?

> > >In that (extremely common) case, the relational (or SQL if you can prove
> > >otherwise) *paradigm* [pattern] is to copy the data from a result table
> > >into a data structure the algorithm can use.
> >
> > As I explain above, this pattern is an attribute of the application and not
> > at all an essential pattern of relational databases or even SQL databases
> > for that matter. It is, however, an essential pattern of most so-called
> > object dbmses.
>
> I'll grant you that it is a pattern of application development. I do not grant
> that (if we leave the application development arena) it is intrinsic to ODBMS
> and extrinsic to RDBMS.

Since no two ODBMS products quite share a single logical data model, nothing is intrinsic to ODBMS -- basically anything goes. Since each ODBMS product invents its own logical data model, copying can be an essential pattern of some of those products. In fact, it is an essential pattern of most of them.

> > >Object databases do not require this extra
> > >step.
> >
> > Are you claiming that object databases operate directly on the data in the
> > database without copying data into application object variables?
>
> Yes.

Possibly two products out of how many?

> > >> >In the object database, this data copying step is eliminated.
> > >>
> > >> Actually, in the object database, this data copying step is required in
> > >> order to make the data available to the application programming language
> > >> for data manipulation. It is not required in an RDBMS because relational
> > >> databases have their own data manipulation language.
> > >
> > >You are just creating a different programming environment out of your
> > >(theoretical) RDBMS.
> >
> > Not at all. The DBMS is a data management environment, and it only makes
> > sense that it can manage data directly.
>
> If you can do application level things using the RDBMS language, you are doing
> application programming (accomplishing a *task*).

Yes, if one accepts that a data management language is a programming language, then anytime a user issues an interactive data management command, the user programmes -- even if the language is not Turing complete, has no looping or control statements etc.

I guess, by that standard, the user who points at a line in a QBE application might programme too. Or the user who formats a floppy disk. Or the user who types a URL into a web-browser. Or...

> This is a programming
> environment by any definition.

If one defines a data management language as a programming language, I must agree.

> Perhaps it would be useful for me if you would define the scope of "managing
> data", since earlier you allowed this to include complex calculus (including,
> by the way, iteration).

I am not sure what you are driving at here. Are you saying that you believe the JOIN operation requires no iteration? I am not aware that the relational model places any restrictions on the relation operations that a relational dbms may provide except that they result in relations.
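As an illustration of the iteration hidden inside even the most ordinary relational operation, here is a naive nested-loop join sketched in Python. It is one possible implementation strategy, not how any particular dbms does it; the row helper and the sample relations are hypothetical. Note that the result is again a relation (a set of rows).

```python
def row(**attrs):
    """Build a hashable row from attribute name/value pairs."""
    return frozenset(attrs.items())


def nested_loop_join(left, right, key):
    """Join two relations (sets of rows) on one shared attribute."""
    result = set()
    for l in left:            # iteration over the first relation ...
        for r in right:       # ... and, for each row, over the second
            lrow, rrow = dict(l), dict(r)
            if lrow[key] == rrow[key]:
                result.add(frozenset({**lrow, **rrow}.items()))
    return result


customers = {row(cust_id=1, name="Ada"), row(cust_id=2, name="Ben")}
orders = {row(cust_id=1, item="disk"), row(cust_id=1, item="tape")}

joined = nested_loop_join(customers, orders, "cust_id")
for r in joined:
    print(dict(r))
```

The user who asks for the join never writes the loops; the dbms is free to iterate, hash, sort or use an index as it sees fit.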

> > >If all processing occurs in the context of the DBMS,
> >
> > I have never made any such claim. Are you totally unaware of the conceptual
> > difference between "allow" and "require"? Much of your argumentation style
> > involves equating the two.
>
> As does yours.
> See above where you say "in the object database, this data
> copying step is required... It is not required in an RDBMS." Here you
> prototypically compare implementation products with theoretical possibilities.

Actually, I compare the theoretical possibilities with the theoretical possibilities. You can correctly criticize me for not specifying "most object databases" as I did earlier.

> And your assertion of what is required of an object database is false.

It is a requirement of the logical data model of most object databases. At no point have I confused "allow" with "require"; although, you can fault me for not using the word "most" in the specific example cited.

> > >system cannot scale well and the DBMS becomes a bottleneck.
> >
> > Even though this is a response to a straw man, I must point out that you are
> > imposing additional faulty preconceptions and assumptions in the above
> > statement. Nothing prevents distribution of an RDBMS, nothing prevents
> > parallel processing of data in an RDBMS, nothing prevents massively huge
> > scaling of an RDBMS.
>
> Well, once you start distributing your RDBMS it looks an awful lot like the data
> copying that you were arguing against just a little while ago.

Again, even in a distributed environment, often one can issue a set-oriented command to the dbms that distributed computers can enact without copying any data.  

> I'll grant you that it is theoretically possible to do everything you state
> above. I'll also point out that in current state of the art the database engine
> is a bottleneck for all high-throughput applications.

I would agree that relatively slow devices such as hard-disks and networks create bottlenecks. On the other hand, many extremely high-throughput applications use SQL databases so apparently it is possible to address these issues.

> > >Unless your RDBMS
> > >data manipulation language can support all the kinds of algorithms that are
> > >coded in other languages, this is (at best) a red herring.
> >
> > Presenting a pattern inherent in a specific application as if it were
> > inherent in the logical data model of the dbms is at most a straw man.
>
> Which you cheerfully signed up for just a few paragraphs above.

No, I did not. Read again.

> What I said is that there will be algorithms in yet to be constructed
> applications that are not achievable using only the programming environment of
> the DBMS.

You stated that copying of data is inherent in the relational data model when, in fact, it is not. You stated that it is required, when it is not. Most so-called object dbmses, however, do require copying the data to the application process.

>That's when application programming comes in and the data
> copy/translation "paradigm" I started with comes into play.

You failed to consider many details, however. For instance, you failed to consider that one can tightly integrate an application programming language in a dbms -- even a relational dbms. Currently available, commercial SQL dbmses do it all the time. See PL/SQL, T-SQL etc.

In fact, I would prefer to use an object-oriented programming language raised to the level of abstraction of a relational dbms -- such a language would overcome the deficiencies of 3GL object oriented languages that create impedance mismatch.

> > >Persistence without data copy.
> >
> > Out of curiosity, which object dbmses provide persistence without data copy?
> > How does the object dbms data model support this?
>
> You said persistence implies data copy. I gave an example where it does not.
> And, if you will allow my clarifying comment of translation for copy, I'd
> submit that all object databases provide persistence without translation.

Since a relational domain is an object class, relation variables contain object variables and relation values contain object values, what translation must a relational dbms perform that an object dbms does not?

> > >> >I say conceptually, because obviously as data is moving to and from disk
> > >> >there is copying going on. However, an object reference allows me to
> > >> >manipulate a persistent object directly without regard to this copying.
> > >>
> > >> One cannot ignore the copying going on. At a conceptual level, the
> > >> programmer must still specify which object variables get copied into and
> > >> out of the application programme's memory. At a conceptual level, the
> > >> programmer must still specify when and how to retrieve values from the
> > >> database.
> > >
> > >One certainly can.
> >
> > One certainly must not. An application that confuses an altered copy of a
> > database value with the actual database value will not operate correctly.
>
> Within the scope of a transaction, what is the difference?

I repeat: An application that confuses an altered copy of a database value with the actual database value will not operate correctly.

> From the time I
> issue an UPDATE command until I commit the transaction, my view of the database
> will be different from any other users.

I never made the claim that one can ignore the copying going on.

> However, in an *application* using a
> table-based database, there is yet another level of indirection, the values
> copied from the result table into application variables.

How does copying the values from object variables in a relational dbms differ from copying the values from object variables in an object dbms?

> The applications
> variables are not guaranteed to be consistent with the transactional state of
> the database for one instruction after they are first assigned.

I agree. However, I am not the one who made the specious claim that programmers can ignore data duplication and transaction boundaries.

> Using an object database, operating on "application variables" is the SAME as
> operating on database values.

This is untrue. Operating on "application variables" operates on in-memory copies of potential database values and any programmer who fails to programme accordingly will write incorrect code.

> If I modify an "application variable" (class
> instance), it is the same as issuing the UPDATE command above.

Nope. If you modify an "application variable", it is the same as modifying an "application variable". Few object dbmses even have the equivalent of an update command because few of them have any intrinsic data manipulation language.

> There is EXACTLY
> ONE consistent view of the database throughout the entire transaction.

Not true. Again, a programmer who succeeds in ignoring the copying and the transaction boundaries will write incorrect code.
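The distinction between an application variable, an uncommitted change and a committed database value can be demonstrated with a small sketch. It assumes Python's built-in sqlite3 module with its default transaction handling and a temporary database file; the contacts table, the sample row and the stored_address helper are hypothetical. It says nothing about any particular object dbms; it only shows the three states a programmer has to keep apart.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "contacts.db")
writer = sqlite3.connect(path)   # the application making the change
reader = sqlite3.connect(path)   # any other user of the database

writer.execute("CREATE TABLE contacts (name TEXT PRIMARY KEY, address TEXT)")
writer.execute("INSERT INTO contacts VALUES ('Pat', '1 Old Road')")
writer.commit()


def stored_address(conn):
    """Return Pat's address as this connection currently sees it."""
    rows = conn.execute(
        "SELECT address FROM contacts WHERE name = 'Pat'").fetchall()
    return rows[0][0]


# 1. Copy the value into an application variable and alter the copy.
contact = {"name": "Pat", "address": stored_address(writer)}
contact["address"] = "2 New Street"
after_local_change = stored_address(writer)   # database value is untouched

# 2. Tell the dbms about the change, but do not commit yet.
writer.execute("UPDATE contacts SET address = ? WHERE name = ?",
               (contact["address"], contact["name"]))
seen_by_writer = stored_address(writer)       # inside the transaction
seen_by_reader = stored_address(reader)       # outside the transaction

# 3. Commit: only now do other users see the new value.
writer.commit()
after_commit = stored_address(reader)

print(after_local_change, seen_by_writer, seen_by_reader, after_commit)
writer.close()
reader.close()
```

Altering the copy in step 1 changes nothing in the database, and the update in step 2 is invisible to everyone else until step 3. A programme that treats these as the same thing is wrong.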

> > >It is this point exactly that I was making above. Because
> > >the ODBMS makes a persistent object reference *look* exactly like any other
> > >programming language variable (pointer, if you wish), the application
> > >programmer has no concern for the copying of object variables into/out of
> > >memory.
> >
> > You have not addressed the point I raised above that one cannot simply ignore
> > the copying going on. At a conceptual level, the programmer must still
> > specify which object variables get copied into and out of the application
> > programme's memory. This refutes your point, and you have not addressed this
> > counter-argument.
>
> I don't know how else to explain it to you.

I have already demonstrated that your original allegation is false. If you are unwilling to accept that your allegation is false, you can expend as much energy as you want. You cannot prove a contradiction.

> If you have ever used an object
> database, this should be obvious to you because it is so fundamental.

I have used object databases, and I am well aware of the fundamentals. Programmers cannot ignore copied data and transaction boundaries no matter what tool they use.

> If I have
> a persistent class Contact and I have a reference (pointer if you will) to an
> instance of that class, e.g.,
>
> d_Ref<Contact> aContact = ... // retrieve from database
>
> then operations on the "application variable" directly affect the transactional
> state of the database, e.g.,
>
> aContact->address(newAddress);

Are you now claiming that the object dbms immediately reflects changes to application variables without requiring the user to tell it when to save data? Somehow, I doubt it.

> From that point on, ALL references (in the current transaction) will show the
> updated state.

Likewise, when one updates an application variable while using a relational dbms, all references to that variable change within the current transaction.

> The Contact record may be manipulated and transformed in any
> number of other ways, but the view in the database (pending commit) is the same
> as the view in the "application variables".

Are you now claiming that the object dbms immediately reflects changes to application variables without requiring the user to tell it when to save data? Somehow, I doubt it.

> > At a conceptual level, the programmer must still specify when and how to
> > retrieve values from the database. This also refutes your point, and you
> > have not addressed this counter-argument.
>
> Since your premise is false, the rest of your argument is irrelevant.

If my premise is false, the programmer can manipulate object variables in the dbms without ever identifying them. That's astounding, and I would very much like to see how they do that.

> > >Conceptually, the programmer must query the database for objects of
> > >interest, but this is not a concept unique to persistent data.
> >
> > It does, however, invalidate your prior argument that the application
> > programmer has "no concern" for such things.
>
> I did not assert that the application programmer had no concern for queries.

I quote you from above: "the application programmer has no concern for the copying of object variables into/out of memory". Since queries copy variables into/out of memory, how can a programmer have concern for queries and yet have no concern for that copying?

> I
> also pointed out that querying a data store is not unique to a DBMS, or even
> persistent data in general.

Actually, you falsely claimed it is a specific weakness of relational dbmses.

> > >Conceptually, the programmer must be cognizant of transaction boundaries
> > >and transaction semantics. I can't think of any way to avoid this unless you
> > >give up the concept of ACID transactions (including rollback).
> >
> > Again, it invalidates your prior argument that the application programmer
> > has "no concern" for such things.
>
> Argument never made.

Again, I quote you from above: "the application programmer has no concern for the copying of object variables into/out of memory".

> > >> >(By the way, I consider this whole difference in paradigm with regard to
> > >> >explicit copying into/out of the database as one of the key
> > >> >philosophical/architectural differences between object databases and
> > >> >relational/SQL databases)
> > >>
> > >> Paradigm: A set of assumptions, concepts, values, and practices that
> > >> constitutes a way of viewing reality for the community that shares them,
> > >> especially in an intellectual discipline.
> > >
> > >My dictionary had a slightly different definition (see above). Or, if you
> > >prefer, try:
> >
> > Actually, the above is just one of several alternate definitions. If you
> > mean pattern or example, why not say pattern or example? Why the twenty-five
> > cent word?
>
> Because it is a perfectly valid word that means precisely what I wished to say.

Which of its several meanings did it mean again? How did your "usage" determine that meaning?

> > Those who use the word are not even clear on what it means or which of
> > several meanings they intend. If those who use the word intended clear
> > communication, they would choose a less ambiguous synonym. I can only
> > conclude that they intend to obfuscate.
>
> I am very clear of what it means and precisely what meaning I intended. I'll
> reiterate my accusation that you are a vocabulary snob.

Which of its several meanings did it mean again? How did your "usage" determine that meaning?

You claim that "[My] words are foreign to [you]. ...They are confusing and imprecise in communicating with [you]."

You insist on using a word with at least three distinct meanings where it is not at all clear which of at least two of them you mean, but I am the vocabulary snob. Get real!

> > >All of this because
> > >you chose to react to my choice of words instead of the point I was making?
> >
> > Your point was that copying of data is inherent to the relational model, and
> > I have demonstrated the point's impotence. Your use of the word "instead"
> > above misleads.
>
> No, you have blithely misunderstood me. I've made it clear in the past that I
> am an implementor, not a theorist. My words should be interpreted from that
> perspective

Even from that perspective, I have demonstrated your point's impotence. Your use of the word "instead" above misleads.

> My point was that copying (translating) data is inherent to programming with
> the (so called) relational databases available on the market to day.

I have demonstrated the falsity of this claim too. Even with SQL databases, one can often perform significant set-oriented data manipulations without copying any data at all. Almost all, if not all, currently available commercial SQL databases provide some form of stored procedure language.
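
To make that concrete, here is a minimal sketch of a set-oriented manipulation that copies no data into application variables. It assumes Python's sqlite3 module purely for convenience; the table foo and its contents are invented for illustration. SQLite has no stored procedure language, so the sketch shows only the set-oriented statement, not a stored procedure.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER NOT NULL)")
conn.executemany("INSERT INTO foo (id, a) VALUES (?, ?)",
                 [(1, 10), (2, 20), (3, 30)])

# One set-oriented statement changes every qualifying row inside the dbms.
# The application never fetches a row or binds A to a variable of its own.
cursor = conn.execute("UPDATE foo SET a = a * 2 WHERE a >= 20")
rows_changed = cursor.rowcount
conn.commit()

# Fetched here only to display the effect; the update itself needed no fetch.
final_values = conn.execute("SELECT id, a FROM foo ORDER BY id").fetchall()
print(rows_changed, final_values)
```

The only value that crosses into the application during the update is the count of affected rows.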

> That step
> is obviated with the (so called) object databases available on the market
> today.

Actually, you will find that copying of data into application memory is inherent in most so-called object dbmses on the market today, while it is, in fact, not inherent in any relational dbms or even in any SQL dbms.

> > I think it is important for people to understand the intellectual bankruptcy
> > of the word "paradigm". Folks often use it to sound intellectual when they
> > have no intention of using any intellect.
>
> Arrogant.

Yes, I agree it is arrogant to attempt to sound intellectual without any intention of using intellect.

> > Given the number of hypesters and hack writers using the term, people can
> > easily fall into a lazy habit of aping them. One gains a very valuable
> > discipline by expunging the word from one's vocabulary.
>
> Your opinion. However, I do not intend to ask your permission before using any
> word in my vocabulary.

You don't have to ask permission, but you will continue to receive criticism.

> > >> I don't think physical copying has much to do with the differences in the
> > >> "paradigms".
> > >
> > >Let me try to be more precise for you. Physical copying (or even logical
> > >copying?) is a fundamental difference between programming with a result-table
> > >(or cursor) database and an object database.
> >
> > Copying has nothing to do with the logical data model of the dbms. One can
> > conceive of a day when we raise the level of application programming
> > languages to more closely match the level of relational databases -- in
> > order to obviate "impedance mismatch". No "logical" copying would be
> > required for application programming because such a programming language
> > would have statements appropriate to operate directly on relation variables.
>
> And one might argue (in fact, I think I just did!) that the object databases
> available on the market today do just that.

All non-relational object dbmses lower the level of abstraction of the dbms to that of a 3GL programming language. In so doing, most of them go further and require a separate application using some specific programming language to perform even the simplest of data manipulations.

> In fact, this is the single largest
> advantage to using an ODBMS over a relational alternative.

I don't see how crippling the dbms can provide any kind of advantage.

> > Again, you are assuming that a pattern inherent to the application is
> > inherent to the dbms to build a straw man. Any good programmer will tell you
> > that good abstractions hide inessential physical implementation details and
> > that horrible abstractions attempt to hide essential logical details.
>
> No straw man. Real world experience. Anyone who has programmed using both
> (so-called) relational databases and (so-called) object databases should be
> able to substantiate this point.

I have used both relational dbmses and so-called object dbmses, and I have refuted the point.

> Translation of data into and out of the
> database engine is a physical implementation detail and it should be abstracted
> away.

Every SQL dbms that I have ever used has abstracted away details of any translation.

> Or are you arguing that it is a valuable element of the logical model that
> should be preserved

My prior argument reflected your prior focus on copying of data. Since a relational dbms does not need to perform any more or any less translation than an object dbms, your newly rewritten point is fatuous at best.

> > I suggest to you that so-called object dbmses that attempt to hide the
> > copying inherent to an application programme attempt to hide essential
> > logical details.
>
> Which would be?
 

Transaction boundaries and transaction semantics.  

> > >When I say "SELECT A from FOO" I
> > >must bind the returned value(s) for A to application-space variables before I
> > >can use them.
> >
> > This is an attribute of your application programming environment.
> >
> > >Furthermore, if my algorithm ends up changing the value of A, I
> > >must then issue an explicit "UPDATE FOO values (A = newvalue)" to ensure the
> > >change is propagated to persistent memory.
> >
> > Again, this is an attribute of your application programming environment.
> >
> > >Note that before this update step,
> > >the changed value of A is available to other processing in application space
> > >and my application does not have a coherent view of the data space.
> >
> > How is this any different from the changed value of an object variable prior
> > to committing the change to persistent storage?
>
> Either you did not read or you did not understand the explanation I gave below.

Nope. I read it. I understood it quite well. You have not answered the question.

> > >In an ODBMS, the same "SELECT from FOO" will return me object reference(s) to
> > >FOO objects.
> >
> > Do you not see how dynamic heap allocation and local copies of the data are
> > inherent to this? Do you not see how this requires and exposes physical
> > details to the user?
>
> Nope. Where is heap or local copies intrinsic to accessing the data?

Where does the odbms put all of the object variables it returns references to?

> If you
> want to get into the gory implementation details of a particular ODBMS, I'm
> pretty knowledgeable of how Objectivity does it, but I don't see how that is
> fundamental to the model.
>
> What physical detail is exposed to the user (let's constrain the user for this
> discussion to the application programmer)?

A pointer or pointer(s).

> > >If my algorithm needs the A value, it simply uses it [ print
> > >obj->a() ].
> >
> > Which is inherently a local copy of the A value as it exists in the dbms
> > that may no longer match the A value in the dbms.
>
> Absolutely not. It is in fact guaranteed to be the A value in the dbms within
> the context of the current transaction. This includes any and all modifications
> to the A value for the duration of the transaction.

By that standard, an application variable in an application using a relational dbms is guaranteed to have the value of the application variable within the context of the current transaction as well. What of it?

> > >If it needs to update the value, it does it directly [ obj->a(
> > >newvalue ) ].
> > >The data space is consistent within my transaction (a second
> > >SELECT statement will automatically see the updated value of A), but not
> > >propagated to other transactions until the commit boundary.
> >
> > Again, this is not inherent to the data model. It is a property of the
> > application programming environment, specifically to the middle-ware for
> > lack of a better term.
>
> I think we are mixing terms again. I don't see a huge difference in the "data
> model" between relational and object methodologies. We are, after all, modeling
> the same data.

If that is the case, you do not know what a logical data model even is.

> It is more a difference in how that model is represented.

The relational data model is entirely about how the dbms represents data to users of the dbms, and it is a huge difference.

> The
> fact is that the object databases represent the data in a form more natural to
> object programming environments. No question.

Since relational databases are object databases, I have no problem with the above statement.

> The DBMS (any DBMS) is by
> definition middleware.

The DBMS is not middleware. The DBMS is the DBMS. All middleware is ultimately tied to a single application programming environment.

> > >Thus, the programmer does not write any code to copy values into or out of
> > >application space.
> >
> > Except for the code you omitted that queries the dbms for the value of obj,
>
> Well, the code I omitted was omitted from both examples and is roughly
> equivalent.

In that case, the code presented as an example of an application's use of an object dbms is a valid example for both an object dbms and a relational dbms. Since one example describes both, I am not sure what you intend by it.

> The key difference is that the ODBMS user does not specify result
> columns because a *reference* to the entire object is returned, rather than
> *copies* of specific columns in a result table.

Again, this is an attribute of your application programming environment and is not intrinsically different between ODBMS and RDBMS.

> My object database supports pretty much the same kind of attribute predicates
> that a SQL database does.

What does JOIN or UNION mean in your object database? Do queries in your ODBMS result in an object value the way that a relational query results in a relation? If so, what type is the object value? Is it encapsulated?

> In fact, they offer an add-on SQL engine (so I guess
> they support all the predicates SQL does).

Does the add-on SQL engine provide full access to all class methods and properties? Does it support cartesian product (ie. multiple object classes in the "FROM" clause)? Does it support extension (ie. derived or calculated values in the SELECT list)?
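
For anyone unsure what I mean by those last two terms, a minimal sketch follows. It assumes Python's sqlite3 module as a convenient SQL engine; the tables product and pack and their contents are invented for illustration. Naming two tables in the FROM clause yields their cartesian product, and the calculated column in the SELECT list is an extension.

```python
import sqlite3

# Hypothetical tables and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT PRIMARY KEY, unit_price INTEGER NOT NULL)")
conn.execute("CREATE TABLE pack (label TEXT PRIMARY KEY, units INTEGER NOT NULL)")
conn.executemany("INSERT INTO product VALUES (?, ?)", [("cake", 5), ("tea", 3)])
conn.executemany("INSERT INTO pack VALUES (?, ?)", [("single", 1), ("dozen", 12)])

# Cartesian product: two tables in FROM with no restriction.
# Extension: pack_price is derived in the SELECT list, stored nowhere.
rows = conn.execute(
    "SELECT name, label, unit_price * units AS pack_price "
    "FROM product, pack ORDER BY name, label"
).fetchall()

for row in rows:
    print(row)
```

Two products and two packs give four result rows, each carrying a derived value that neither source table contains.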

> > and the code you omitted that commits the changes to the dbms.
>
> trans.commit();

Since that provides a suitable example for both database types, I am not sure what you intend by it.

> > >The PATTERN (paradigm) of table programming is copying data.
> >
> > You have not demonstrated this. You have demonstrated that the pattern of
> > your application is copying data, and you have demonstrated two different
> > methods that the middleware to an SQL database can accomplish this copying.
>
> Fine. Show me an example of how I can perform algorithmic operations not
> supported by the DBMS on a query result from the DBMS without copying from a
> result table into application variables using any commercial (not so-called
> object) database of your choice.

Choose any commercially available SQL product that provides a stored procedure language and write a stored procedure. Show me how any non-relational object DBMS can perform algorithmic operations not supported by the DBMS on a query result from the DBMS without copying data into application variables.

> > >The PATTERN (paradigm) of object programming is not.
> >
> > You have not demonstrated this, either. You have demonstrated that the
> > object dbms, by limiting the user to only one of the methods above, exposes
> > physical implementation details in its abstraction while attempting to hide
> > logical details in its abstraction.
>
> I can only say, "Huh?"

I can only suggest you actually try to understand what I have been writing.

> > >> >This whole concept of intrinsic identity is extremely critical in my domain
> > >> >because often we do NOT know what attribute value could be used to uniquely
> > >> >identify an object. Sometimes, all we know is that there is an object observed
> > >> >or inferred through some phenomenology. Over time, we hope to discover more of
> > >> >the attribute values attributable to that object, but in the mean time it must
> > >> >be distinct from all other objects under consideration.
> > >>
> > >> How do the users of your system identify the distinct instances under
> > >> consideration?
> > >
> > >Different ways in different contexts.
> >
> > But you want the dbms to use a single way, OID, in all contexts? Does the
> > irony elude you?
>
> Gaaack!! How can someone seemingly so intelligent be so dense?

I was wondering the same myself.

> Sometimes my users gesture to an icon on a map. Sometimes the follow a
> hyperlink on a web page. Sometimes they look at a row in a tabular display.

How does this change the fact that you want the dbms to use a single logical identifier, OID, in all contexts?

> > >> >Object databases handle this representation of uniqueness with object
> > >> >references (commonly referred to as OIDs).
> > >>
> > >> Using pointers, yes, I know that. We already know what a disaster it is to
> > >> expose pointers to users. If you do not expose OID to users, how do users
> > >> identify unique instances?
> > >
> > >See, I don't get your point. An OID is not a pointer. In the database system I
> > >use, an OID has a native representation (4 16-bit numbers) and a stringified
> > >representation ( #dd-cc-pp-ss ). Neither of these are "pointers" any more than
> > >a rowID is a pointer. Yet, because of the operator overloading in OO languages,
> > >they can appear as a pointer to the programmer.
> >
> > Show either representation to casual database users and ask them whether
> > OIDs are pointers. By the way, a rowID is a pointer.
>
> Why would I EVER want to show one to a casual database user? What possible
> purpose would that serve?

If that is the only logical identifier, that is the only way a user can identify the data. When you lose your credit card, how will you report the loss to the credit card company if you do not know its OID?

> > >Again, though, user interfaces are written to facilitate users doing their
> > >jobs.
> >
> > Do you honestly think that users use OIDs to identify their data?
>
> Nope. Nor would I ever claim that they should do so.

If that is the only logical identifier for the data, you force them to. What happens when someone drops their PDA? Or when it is lost in a fire? etc.

> > >When a user of an on-line ordering system orders a new printer, he does
> > >NOT copy the SKU number into an order-entry text field. He clicks on a picture
> > >of the product. The user is POINTING to the data of interest.
> >
> > The user communicates to the application programme via the physical location
> > of an image on the screen, because this is inherent to the communication
> > medium. However, the user communicates the identifying SKU to the
> > application programme via this medium and does not communicate the physical
> > location which will change in the very next instant.
>
> More precisely, the user clicks on an active user-interface element of the
> browser. The action associated with that element is to send a message to the
> back-end order-entry system. What is the content of the message? Depends on
> your system. It could be a SKU number (requiring a database lookup to find the
> product information that was just displayed to the user, such as price and
> availability)
>
> OR
>
> it could be an object reference to the inventory item represented on the
> display, providing direct access to all the requisite information to complete
> the order with no redundant query.

If one wants to use a reference to an application variable to obviate additional queries, one can do that without requiring the DBMS to expose pointers.
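
A minimal sketch of what I mean, assuming Python and its sqlite3 module; the Product class, the product table, and the SKU values are all hypothetical. The application holds references to its own variables and so avoids a second query, while the dbms exposes nothing but attribute values.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """Application-side variable; hypothetical, for illustration only."""
    sku: str
    description: str
    price: float


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, "
             "description TEXT NOT NULL, price REAL NOT NULL)")
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [("PRN-100", "Laser printer", 299.0),
                  ("PRN-200", "Inkjet printer", 89.0)])

stats = {"queries": 0}


def load_catalog():
    """Query once and keep application objects keyed by the logical identifier."""
    stats["queries"] += 1
    result = conn.execute("SELECT sku, description, price FROM product")
    return {row[0]: Product(*row) for row in result}


catalog = load_catalog()

# The picture on the screen is bound to an application variable.
displayed = catalog["PRN-100"]

# The click handler follows the application's own reference: no second query.
# What gets recorded in the order is the SKU, a value the user can also
# read over the telephone if the system is down.
order_line = {"sku": displayed.sku, "price": displayed.price, "quantity": 1}
print(order_line, stats["queries"])
```

The reference lives entirely in the application; the database stores and returns only the SKU and the other attribute values.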

When the back-end order-entry system tips over and the user has to telephone in his order, how will he know the correct OID?

> You just can't tell by looking.

The question is: Why must the DBMS expose pointers in order for an application to use pointers?

> But my point remains, users are quite comfortable POINTING at data items of
> interest.

However, that does not mean that users are quite comfortable choosing the correct items of interest using database pointers. They need information to identify the things they point at.

> It shouldn't be so difficult to see how application programs could do
> exactly the same thing to their advantage.

It might sound good if you say it fast, but it does not change several decades of evidence to the contrary.

> > >Why can't
> > >software do the same thing?
> >
> > Because the user identified the appropriate SKU to the system using a
> > fleeting location and did not mentally identify the data by its location.
>
> Hmmm. You design software differently than I do.

Are you claiming that the user mentally identified the data by its location and not by some set of attribute values?

> > Software tried data management using pointers decades ago and it proved
> > impractical. Should we outfit our infantry with catapults and broadswords?
>
> What kind of "argument" is this? Argumentum ad ridiculum?

Your position is equivalent and the comparison fair.

> Decades ago the programming languages were not sufficient to properly take
> advantage of pointers.

The problem is not in programming languages but in a task which transcends application development -- data management.

> > >But I guess I could argue that OIDs do not *require*
> > >navigation.
> >
> > Really? How do users manipulate order line items without navigating orders?
>
> How about selecting order line items directly?

The user can answer questions like: "What is the average number of items in an order?" by accessing only order line items?

> > >Again, though, they are not of much use without it.
> >
> > Knowing the average order size or price is useless? Knowing the average
> > shipment size is useless? Really?
>
> Of course, here you are asking question of orders.

I am asking questions of order line items. Order line items contain the information regarding quantity and price that I desire.

> "Logically" you want to ask
> each order for its total price or size or each shipment for its size.

Logically, I want to group order items by an attribute value common among items in the same order to calculate a total price or size and then I want to group the result by no attributes to calculate an overall average price or size.

I do not need to make any reference to orders at all.
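
As a minimal sketch of that request, assuming Python's sqlite3 module and a hypothetical order_item table whose columns and contents are invented for illustration: the inner query groups items by the attribute they share within an order, and the outer query groups by no attributes at all.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        line_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL,
        PRIMARY KEY (order_id, line_no)
    )""")
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)",
                 [(1, 1, 2, 5.0), (1, 2, 1, 10.0), (2, 1, 4, 2.5)])

# Inner query: one total per distinct order_id value.
# Outer query: no GROUP BY, so a single overall average.
avg_price, avg_size = conn.execute("""
    SELECT AVG(total_price), AVG(total_quantity)
    FROM (SELECT order_id,
                 SUM(quantity * unit_price) AS total_price,
                 SUM(quantity)              AS total_quantity
          FROM order_item
          GROUP BY order_id) AS per_order
""").fetchone()

print(avg_price, avg_size)
```

No orders table exists in the sketch at all, and the statement says nothing about how the dbms should find or store the rows.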

> Physically, you may have to group and sum order line items, but isn't this a
> physical implementation detail?

No, it is an entirely logical request. The request in no manner specifies how the dbms should physically evaluate the result. In fact, the request in no manner specifies how the dbms physically stores the source data -- the dbms might navigate to orders to perform the calculation or it might not.

> Wouldn't it be better to ask each order for its
> total price and just compute the average in a straight-forward manner?

Wouldn't it be better yet to query a view that derives the total price and quantity from the order line items and just request the averages in a straight-forward manner?
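
A minimal sketch of such a view, again assuming Python's sqlite3 module; the names order_item and order_total and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL,
        line_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL,
        PRIMARY KEY (order_id, line_no)
    )""")
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?, ?)",
                 [(10, 1, 1, 4.0), (10, 2, 3, 2.0), (11, 1, 2, 6.0)])

# The view derives each order's totals from its line items once, by name.
conn.execute("""
    CREATE VIEW order_total AS
    SELECT order_id,
           SUM(quantity * unit_price) AS total_price,
           SUM(quantity)              AS total_quantity
    FROM order_item
    GROUP BY order_id""")

# The user now asks for the averages in a straightforward manner.
avg_price, avg_quantity = conn.execute(
    "SELECT AVG(total_price), AVG(total_quantity) FROM order_total"
).fetchone()

print(avg_price, avg_quantity)
```

Whoever queries order_total neither knows nor cares whether the totals are stored, cached, or computed on demand.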

> > >Arbitrary numbers are implementation artifacts of
> > >systems that cannot properly represent intrinsic object identity.
> >
> > Are you honestly suggesting that OIDs will replace driver's license numbers,
> > social security numbers, product codes etc? Are you suggesting that users
> > find them more accessible than the existing artifacts? Are you suggesting
> > that OIDs are neither artifacts nor arbitrary?
>
> OIDs are implementation details as well. At issue is the representation of
> intrinsic identity such that I can unambiguously identify the "thing" of
> interest regardless of its attribute values.

First, you need to recognize that the "thing" to which you refer is nothing more than an object variable. Once you recognize that fact, you will realize that the relational model requires a logical identifier to identify all object variables regardless of their attribute values.

> The goal is object linking.

Relation values link object values, and relation variables link object variables. The goal of a logical identifier is object variable identity.

> If the
> policeman who pulls me over for speeding swipes my driver's license through a
> card reader that communicates with the DMV's database, who is to say what
> information was used to pull up my record? Perhaps it could be an OID (or a URI
> or some other "pointer" that is not value-based).

Since you must have some mechanism to report the driver's license lost or stolen or to identify yourself to the DMV from a pay-phone, what use is an OID?

> I am saying (and you are not hearing because you are convinced that the
> opposite is true) that OIDs are NOT (properly) exposed to the user interface.

No matter how many times you state that applications do not necessarily expose direct OID representations to the users of applications, you will not address the issue of the DBMS exposing pointers to users of the DBMS.

> They are NOT query elements. They are NOT used to specify joins (joins do not
> need to be specified because they are pre-computed and stored).

Of course, they are query elements. When the DBMS forces you to navigate through an "order" to manipulate "order items", it uses the pointer as an implicit query element.

With an ODBMS, one cannot specify joins because the data model is too weak to even know what such a thing is. Your statement about precomputation and storage speaks volumes regarding physical independence and ODBMS' lack thereof.

> > >For example, a telephone number is an arbitrary identifier (although more
> > >closely related to a pointer) for a specific end-point in the telephone
> > >network.
> >
> > Have you never used a reverse-lookup feature on the internet? A telephone
> > number is an arbitrary identifier for users of the telephone network. The
> > transportation company I use for travel to the airport identifies its
> > customers by phone number. This has its drawbacks, of course. A video store
> > I used to frequent also used phone numbers to identify customers, and this
> > caused problems too.
>
> Yes, arbitrary IDs do have problems. They do not adequately capture intrinsic
> object identity.

The problems had to do with an inappropriate choice of identifier. In the case of the transportation company, multiple phone numbers identify the same customer. In the case of the video store, a single phone number identified multiple customers.
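
Both failures are easy to detect before committing to an identifier. A minimal sketch, assuming Python's sqlite3 module and a hypothetical customer_phone table with invented rows:

```python
import sqlite3

# Hypothetical table and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_phone (customer TEXT NOT NULL, phone TEXT NOT NULL)")
conn.executemany("INSERT INTO customer_phone VALUES (?, ?)",
                 [("Ann", "555-0100"), ("Ann", "555-0101"),
                  ("Ben", "555-0102"), ("Cy", "555-0102")])

# The video store's problem: one phone number, several customers.
shared_phones = [row[0] for row in conn.execute(
    "SELECT phone FROM customer_phone "
    "GROUP BY phone HAVING COUNT(DISTINCT customer) > 1 ORDER BY phone")]

# The transportation company's problem: one customer, several phone numbers.
multi_phone_customers = [row[0] for row in conn.execute(
    "SELECT customer FROM customer_phone "
    "GROUP BY customer HAVING COUNT(DISTINCT phone) > 1 ORDER BY customer")]

print(shared_phones, multi_phone_customers)
```

If either list is non-empty, the phone number cannot serve as the sole identifier for a customer.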

> What happens when you move and the phone company re-issues your phone number?
> Oops.

This is yet another problem. However, it is one that the transportation company resolves in its business process. The reservationist verifies my name; if it matches, the reservationist verifies my address.

> > The video store solved its problems by giving people without phones (or
> > people who shared a phone with others) another number that was not a valid
> > local phone number.
>
> Another arbitrary ID.

What ID is less arbitrary? Do you expect us to believe that a huge random number is less arbitrary?

> > In other words, as the system grew to the point where it required data
> > management, the system required a logical identifier usable by both humans
> > and machines.
>
> In other words, the system grew to the point where it required an identifier
> (if a phone number is logical, so is an OID)

In the contexts of the uses above, a phone number is anything but a pointer. It is an attribute value that the transportation company and the video store must collect in any case. I agree that it makes a poor identifier, but apparently for different reasons than you do.

> that was machine readable and

Nothing in particular makes the phone number any more or less machine readable at the video store than any other identifier. I suspect they used it so that people could remember their account number even when they forgot their card.

> technology did not advance to the point where this number could be shielded
> from the human users.

Well, once you have your way, I hope you enjoy telling people your OID so they can call you. Newspapers will either hate it or love it depending on whether they charge classified ad customers by the word or by the character.

> > >But if I had a way to "gesture" to your entry
> > >in my "contact database" and pass a direct end-point (pointer) to my telephone
> > >(or to my e-mail program, or to my envelope printer), then arbitrary, synthetic
> > >IDs would phase out as archaic relics of an unenlightened past.
> >
> > I guess that's why we invented IP and DNS.
>
> IP is a pointer. But, with firewalls and NAT, it is an easily masqueraded
> pointer.
>
> DNS is a logical identifier.

Actually, DNS is a pointer too. It just has additional levels of indirection and a more complex decoding algorithm.

> Nice try, but we aren't there yet.

If you mean we haven't phased out "arbitrary, synthetic IDs", I must agree. I can easily and accurately predict that we never will.

> > >> >This synthetic ID is stored in each phone number so
> > >> >that it can be joined back to the contact.
> > >>
> > >> Incorrect, both logically and physically. Logically: An association table
> > >> might expose the relationship between contact id and phone number.
> > >> Physically: An RDBMS might store the phone number with the contact fields
> > >> using juxtaposition to identify the contact, but if it does so, it exposes
> > >> the association to the user using the contact identifier and phone number.
> > >
> > >If you wish to design your database such that all associations are through a
> > >distinct "association table", that's fine.
> >
> > The second (ie. physical) example did not do so.
> >
> > >Object modelling has "link classes"
> > >that perform the same purpose.
> >
> > And what advantage do these link classes provide over relations? Simpler
> > interface? More consistent interface? Principled foundation? Psychological
> > advantage? ??
>
> Ummm, I don't believe I claimed an advantage. I believe my words were "the same
> purpose".

If it performs the same purpose poorly by every measure, it is not much of an alternative, is it?

> As near as I can tell from your previous posting and the background
> reading I've been able to do, a relation is just a concept. It still have to be
> translated into a programming language representation to be implemented on a
> computer.

I am not sure what you mean by "programming language representation". If you mean that data is information encoded for machine manipulation, I must point out that the standard vocabulary already makes the distinction clear. In fact, the Compute article that drew my attention to the ISO standard focused on that exact distinction.

> And frankly (feel free to jump in here and call me ignorant or
> pseudo-intellectual or obfuscating or whatever comes to mind), I don't see such
> a huge difference between modeling a separate "association table" and adding
> the join key to the phone number. The former is *more* complex to me, not less.
> The latter is the more common approach used by most database designers today,
> and while they may be theoretically wrong, it seems to work OK.

I am not sure what you mean by "join key". If you mean that one can model phone numbers in a table with a composite key (contact_id,phone_num) and no association table, then I must point out that the resulting table is an association table.
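
A minimal sketch of such a table, assuming Python's sqlite3 module; the table names, column names, and rows are hypothetical. The composite key (contact_id, phone_num) is what makes contact_phone an association table: it relates contacts to phone numbers many-to-many.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE contact (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE contact_phone (
        contact_id INTEGER NOT NULL REFERENCES contact (contact_id),
        phone_num  TEXT    NOT NULL,
        phone_type TEXT    NOT NULL,
        PRIMARY KEY (contact_id, phone_num)
    )""")
conn.executemany("INSERT INTO contact VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
conn.executemany("INSERT INTO contact_phone VALUES (?, ?, ?)",
                 [(1, "555-0100", "home"), (1, "555-0101", "cell"),
                  (2, "555-0100", "home")])

# One contact may have several numbers...
phones_for_ann = [row[0] for row in conn.execute(
    "SELECT phone_num FROM contact_phone WHERE contact_id = 1 ORDER BY phone_num")]

# ...and one number may belong to several contacts.
names_for_shared = [row[0] for row in conn.execute(
    "SELECT name FROM contact JOIN contact_phone USING (contact_id) "
    "WHERE phone_num = '555-0100' ORDER BY name")]

# The key permits each (contact, number) pairing only once.
try:
    conn.execute("INSERT INTO contact_phone VALUES (1, '555-0100', 'work')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(phones_for_ann, names_for_shared, duplicate_rejected)
```

Nothing in the declaration says where the dbms physically keeps the phone numbers relative to the contact rows.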

> > Given that the relational model has proved its advantage over navigational
> > systems, the onus now lies on any proposed new data model to prove its
> > worth.
> >
> > So-called object dbmses are nothing more than a regression to the arbitrary,
> > ad hoc, navigational databases of yesteryear.
>
> I've heard your epithets, but I don't understand your proof. You
> authoritatively call object databases network databases, but (I think) that
> presumes some common root from which the network must be traversed?

Nope. A network model dbms can have multiple roots. In fact, a hierarchic model dbms can have multiple roots. A hierarchic model dbms can have at most one referencing "parent" object for each object while a network model dbms can have multiple referencing "parent" objects for each object.

> > >But that is a heavyweight solution for simple
> > >associations that are commonly modelled by repeating foreign key information in
> > >the phone number table.
> >
> > Heavyweight in what sense? In the sense of the example I gave for how your
> > original statement was incorrect physically?
>
> Physically or logically? If the relational model is logically isolated from the
> physical representation, why was my original example incorrect physically?

Search above on "Incorrect, both logically and physically". Your example claimed that an RDBMS must store a "Synthetic ID" with the phone number, and I explained how that statement is incorrect.

> > >> >> >Perhaps each number would include a "type" tag (home,
> > >> >> >cell, etc.). In order to associate this phone information with the contact
> > >> >> >info, either a synthetic ID must be generated or the primary key values must be
> > >> >> >replicated.
> > >> >>
> > >> >> I am not sure I understand your complaint. Are you complaining about
> > >> >> redundant information in the logical view of the data? Pointers are as
> > >> >> redundant, if not more so.
> > >> >
> > >> >A pointer is a physical implementation of a logical concept.
> > >>
> > >> A pointers is a logical exposure of a physical concept (location).
> > >
> > >Since the location of a {thing} is a physical concept, I hope we can agree that
> > >a pointer is a physical thing.
> >
> > You have refuted your own earlier statement that it is a logical concept.
>
> Gaaacckkk!!!
>
> "A pointer is a physical implementation of a logical concept." Pointers are
> physical.

Location is not physical???

> The intrinsic identity of an object is not a physical concept

I realize that identity is a logical concept, which is why pointers like OID should have nothing to do with identity.

>, although it can
> be represented physically by a "pointer".

OID is a pointer. Just as IP or DNS is a pointer.

> An association between two object is not a physical concept, although it can be
> implemented in terms of "pointers".

Again, I agree that associations are logical and should be modelled logically as relations model them.

> However, an object reference is not a pointer, although it can be turned into
> one. If it were a pointer, the resulting object would always be required to
> occupy the same physical address in memory.

You mean how an IP address must always identify the same computer? Pointers don't have to always identify the same location, do they? If we use one or more levels of indirection and a more complicated decoding algorithm, we gain a little flexibility with relocating things. However, we do not overcome the data management problems inherent to navigation.

> > At a single level of indirection, a pointer is a strictly physical thing.
> > Others have argued that additional levels of indirection render the pointer
> > logical. While I disagree with this position, I see no benefit in arguing
> > for or against it.
> >
> > In my books, an IP is a physical pointer. It is a physical pointer with a
> > complex decoding algorithm. Likewise, an OID is a physical pointer with a
> > complex decoding algorithm.
> >
> > Additional levels of indirection allow some flexibility for rearranging
> > physical locations at the cost of a more complex decoding algorithm, and
> > other people would argue that this turns the physical pointer into a logical
> > pointer. I won't argue for or against this point; I will simply observe that
> > the pointer remains a pointer tightly married to a specific implementation
> > with all of the disadvantages that entails.
>
> Sounds to me like you are arguing against the "pointer" becoming logical.

I think all pointers are physical including OID.

> > For instance, we have six billion people on this planet, and we have four
> > billion unique IP addresses. What happens when everyone has several devices
> > directly connected to the internet?
>
> Subnets. DHCP. NAT. IP is NOT a physical pointer.

I do not see how any of these will resolve the problem when everyone has several devices directly connected to the internet. Only expanding the number of available IP addresses beyond four billion will resolve the issue. Of course, the IP standards folks are working on this. Hopefully, they will complete the standard in time to rewrite every IP-based application that assumes a 32-bit representation.
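
The arithmetic behind that objection can be sketched as follows. The six billion population figure comes from the discussion above; treating "several devices" as three per person is an assumption made purely for illustration.

```python
# A 32-bit address has 2**32 distinct values (about 4.3 billion).
ADDRESS_BITS = 32
total_addresses = 2 ** ADDRESS_BITS

# Roughly six billion people, as stated in the discussion.
population = 6_000_000_000
# Assumption for illustration only: "several" means three devices each.
devices_per_person = 3

devices = population * devices_per_person
shortfall = devices - total_addresses

print(f"addresses available: {total_addresses:,}")
print(f"devices to address:  {devices:,}")
print(f"shortfall:           {shortfall:,}")
```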

If IP is not a physical pointer, it is a logical pointer. I already said I will not argue that point. Regardless, it remains a pointer to a physical hardware device.

> > >Let me be more precise. The phone number above has no *semantic* meaning
> > >unless it is associated with the person whose phone it is.
> >
> > Or the organization whose fax it is, or the dial-up ISP whose modem-farm it
> > is, or ...
> >
> > It always has the semantic meaning of an addressable node in the telephone
> > network, and you are correct that by itself it has no further semantic
> > meaning. In a sense, phone numbers are more syntactic than semantic.
> >
> > Your example presupposes phone numbers, which are assigned by telephone
> > companies to individual nodes. In this sense, they are natural logical
> > identifiers for users of the connected devices.
>
> I just disagree that they are "natural" logical identifiers.

They are natural logical identifiers for phone numbers as your original problem stated.

> When I had a
> dial-up ISP, they continually added new phone lines and decommissioned others
> and had to ask users to change their modem dial-up settings to use the new
> numbers. Most modem dial-ups store multiple numbers to use in case one or more
> of the numbers are busy.

In the case of a pool of phone numbers, a phone number does not suffice for identity. This is a different problem requiring a different design.

> A natural logical identifier would be "the Houston EarthLink POP" and I wouldn't
> care what phone number(s) my modem dialed.

Again, this is a different problem requiring a different solution. I find it interesting that you did not use an OID for your logical identifier, since you seem to want to promote them.

> I don't mean this in an unkind way, but it seems your thinking about data
> modeling is stunted by your relational expertise.

I don't mean this in an unkind way, but it seems your thinking about data modelling is stunted by your lack of understanding of the simplest of concepts even after they have been explained to you.

> You immediately look about
> for some attribute value that can be used to represent logical identity instead
> of accepting that each "object/value/instance" has intrinsic identity
> independent of its attribute values.

Since you have not realized this yet, I must point out that values are self-identifying. The value of 5 is 5. Object variables have intrinsic identity independent of their attribute values in the relational model. Relation variables have intrinsic identity independent of their attribute values, too.

> Data modeling should facilitate answering
> the questions that end-users are likely to want answered.

Of course. I agree.

> > At a logical level, a relational dbms exposes that association using
> > relations. At a logical level, a navigational dbms exposes that association
> > using physical attributes such as pointers or such as proximity thereby
> > confusing two very distinct levels of discourse.
>
> At a logical level, the purpose of the relation is to enable traversal from the
> contact to the phone record and/or vice versa.

No traversal required. At a logical level, the purpose of the relation is to simply state facts.

> If I can achieve the same
> logical result in my object database, I don't see how it is such an evil thing.

Relations do not require navigation. Non-relational object dbmses require navigation. Navigation causes huge, previously demonstrated problems for data management. Relational proponents have decades of evidence to support this claim.
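
As a rough illustration of "stating facts" rather than traversing, here is a sketch using SQLite through Python's standard `sqlite3` module. The `contact_phone` table, its columns and the sample rows are hypothetical. The same relation answers the question from either side, with no starting object to navigate from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each row is one fact: this contact can be reached at this number.
    CREATE TABLE contact_phone (
        contact_name TEXT NOT NULL,
        phone_number TEXT NOT NULL,
        PRIMARY KEY (contact_name, phone_number)
    );
    INSERT INTO contact_phone VALUES
        ('alice', '555-0100'),
        ('alice', '555-0101'),
        ('bob',   '555-0199');
""")

# From contact to numbers.
numbers_for_alice = [row[0] for row in conn.execute(
    "SELECT phone_number FROM contact_phone "
    "WHERE contact_name = ? ORDER BY phone_number", ("alice",))]

# From number to contact: same relation, same kind of query.
owner_of_0199 = [row[0] for row in conn.execute(
    "SELECT contact_name FROM contact_phone WHERE phone_number = ?",
    ("555-0199",))]

print(numbers_for_alice, owner_of_0199)
```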

> You said you wouldn't argue that an indirected pointer is now a logical
> concept, but it seems central to your criticism of object databases.

Required navigation and unnecessary complexity are central to my criticism of so-called object dbmses. Whether a pointer is logical or physical is irrelevant to either point, which is why I said I won't bother to argue it either way.

> If it is
> a logical and not physical association stored in the ODBMS, then your above
> criticism evaporates.

How do the required navigation and unnecessary complexity evaporate?

> > >> And you complain about the logical interface of the relational model... ?
> > >
> > >I (honestly) point out a real short-coming with the (real) commercial
> > >product with which I program.
> >
> > It is a real shortcoming of the logical data model used. When you identify a
> > real shortcoming of the relational data model, I will honestly admit it.
>
> Why is it a shortcoming of the data model? Why is it intrinsically not possible
> to query across associations in an object database?

Because it has no theoretical foundation, no closed algebra and no simple common interface.

> Don't confuse
> implementation defects with model shortcomings.

Don't worry: I don't.

> If this were a model
> shortcoming, then NO object database would EVER be able to query across
> associations. Are you sure you want to take on this proof?

I have already stated many times that relational dbmses are object-oriented dbmses. When an object model evolves that has a theoretical foundation, a closed algebra and a simple common interface, what do you suppose it will look like?

> > >As you are so fond of saying, a
> > >failure of commercial products is not a failure of the model.
> >
> > What aspect of the object model has your vendor failed to implement that
> > results in the above shortcoming?
>
> The ability to query across associations.

Are you claiming that the ability to query across associations is a requirement of the object data model?

> > >> >The second part, "the DBMS must *expose* (emphasis mine) the association
> > >> >... explicitly using values" I don't understand. If there is no *logical*
> > >> >value that identifies the association, how should this exposure take place.
> > >>
> > >> The phone number must have a logical identifier, possibly the phone number
> > >> itself. The contact must have a logical identifier or the users won't be
> > >> able to easily identify contacts.
> > >
> > >Synthetic IDs are evil because they carry no semantic content. How often
> > >have you mis-dialed a phone number?
> >
> > How many times have you mis-dialed the phone number because you accidentally
> > pointed at the wrong line in the phone book?
>
> I have never pointed at the wrong line in the phone book and had the wrong
> number automatically mis-dialed. In every case, I have failed to follow the
> line from the attribute value of interest (name, possibly address) to the
> arbitrary ID needlessly exposed into the logical interface (phone number).

Whether you fat-finger the line or fat-finger the key, the results are the same. What identifier are you going to publish in newspaper classified ads?

> > >A "logical" model that forces more of these into
> > >the interface is flawed.
> >
> > A logical model that pretends they do not exist, or even worse pretends they
> > are not necessary, is even more flawed.
>
> It is a chicken and egg scenario. You say, "your model must have a unique
> logical identifier" so your customer makes one up (e.g., library card number).
> You then say to me, "you must model the library card number."

Nope. I say your users must have a logical identifier for their own needs, and then I say to you that you must model the logical identifier.

> Sure, I can have whatever attributes in my model are required. And I can query
> on them just like you can.

While I am certain that you can filter on them, I doubt very much that you can query on them just like I can. For instance, what does JOIN or UNION mean to an object?
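
To make the distinction between filtering and querying concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module. The `supplier` and `customer` tables and their rows are hypothetical. UNION and JOIN each take relations and yield a relation, which is what a closed algebra means in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (name TEXT PRIMARY KEY, city TEXT NOT NULL);
    CREATE TABLE customer (name TEXT PRIMARY KEY, city TEXT NOT NULL);
    INSERT INTO supplier VALUES ('S1', 'Toronto'), ('S2', 'Houston');
    INSERT INTO customer VALUES ('C1', 'Houston'), ('C2', 'Ottawa');
""")

# UNION: every city that appears in either relation, without duplicates.
all_cities = [row[0] for row in conn.execute(
    "SELECT city FROM supplier UNION SELECT city FROM customer "
    "ORDER BY city")]

# JOIN: supplier/customer pairs located in the same city.
same_city = conn.execute(
    "SELECT s.name, c.name FROM supplier AS s "
    "JOIN customer AS c ON s.city = c.city").fetchall()

print(all_cities, same_city)
```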

> But the model was flawed when you had to create an
> arbitrary ID to support your model

Human users need logical identifiers. Your model was flawed when you had to create an arbitrary, incomprehensible pointer to support your model.

> (and add the bookkeeping complexity of
> generating and managing these IDs)

What bookkeeping? What complexity? Your assumption is false.

> > >Quite often humans disambiguate by pointing.
> >
> > Before humans can point, they must disambiguate. The user cannot point at
> > the correct location in the catalogue or on an order unless the user knows
> > what he or she wants to identify.
>
> The red one or the blue one? Let me see what the red one would look like!

Red and blue are attribute values.

> Is the query to the back end by attribute value (red one) or object reference
> (I'll even grant you SKU for this one)?

By attribute value... "Show me all the red ones."

> Users do disambiguate by attribute values, but they find it tedious to specify
> sufficient attribute values to a computer to completely disambiguate.

I disagree. They do not find it tedious to pick an item from a list. They do not find it tedious to read their credit card number into the telephone. They do not find it tedious to write down an interesting phone number.

Tie them to OID, and they will find the simplest of tasks very tedious indeed.

> So you
> show a bag-o-attributes to the user and they pick what they want using whatever
> mental process they choose. Query By Example.

QBE does not require OID. The mental process does require identifying attributes. Which is necessary and which is redundant, synthetic and arbitrary?

> > >> All the more reason to suggest as simple an interface as possible -- the
> > >> relational model.
> > >
> > >You've missed the point. Why does FedEx assign a tracking number to your
> > >package?
> >
> > Above, you argue against logical identifiers. Does the irony escape you?
>
> No irony. The relational model with its "simple" relations is too complex for
> human users. So they create an abstraction to represent the complexity.

First, I disagree that tabular representations of data are too complex for human users. They use them quite naturally all the time.

Assuming that they were too complex, are you suggesting that the more complex, navigational object model will do anything but make matters worse?

> While I
> argue against arbitrary ids (like tracking numbers)

and OID... Oh wait, you argue FOR arbitrary ids. Don't you?

> I freely admit (and see
> above) that they provide a useful abstraction over value-based relations. They
> are just inadequate for human consumption and inadequate at representing the
> real abstraction: the intrinsic identity of the shipment.

Come back after you convince your customers to identify their shipments by OID.

> > >Because identifying "the package that Bob Badour sent to Jim Melton on
> > >Sept 1, 2001" is too complex (although it can easily be represented as a
> > >relation). People routinely create concepts that may "add complexity to the
> > >interface" in order to shield themselves from greater complexity.
> >
> > Both examples above use the same interface; they are both propositions. The
> > relational algebra allows users to derive one from the other. It also allows
> > the DBMS to create multiple views -- one derived from another.
>
> Yeah, the theory is wonderful.

The practical and already delivered result is wonderful too.

>The application is a bear.

Are you suggesting that requiring all users and all applications to have a single view of data is less of a bear?

> Of course, one might
> argue that the tracking number is a pointer since it can be translated (through
> some complex algebra) into a record in the database....

One can use it as a pointer. One can use any value as a pointer. However, it is a value and one can use it without navigation for a variety of purposes. One can use it to look up waypoints without having any access to package information, for instance.
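
A sketch of the waypoint case, using SQLite through Python's standard `sqlite3` module. The `waypoint` table, its columns and the tracking numbers are invented for illustration. Note that no package table exists here at all; the tracking number is used purely as a value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical waypoint relation keyed by tracking number and sequence.
    CREATE TABLE waypoint (
        tracking_no TEXT NOT NULL,
        seq         INTEGER NOT NULL,
        location    TEXT NOT NULL,
        PRIMARY KEY (tracking_no, seq)
    );
    INSERT INTO waypoint VALUES
        ('TRK001', 1, 'Toronto'),
        ('TRK001', 2, 'Memphis'),
        ('TRK002', 1, 'Houston');
""")

# Look up the route by value; nothing is dereferenced or navigated.
route = [row[0] for row in conn.execute(
    "SELECT location FROM waypoint WHERE tracking_no = ? ORDER BY seq",
    ("TRK001",))]

print(route)
```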

> And if that tracking
> number were an OID....

One would have to navigate through the package object to access any related data.

> > >> >In order to deal with more
> > >> >complex things, we hide complexity behind abstractions.
> > >>
> > >> Relations are very simple abstractions.
> > >
> > >One could represent all data as sequences of name-value pairs. Such data
> > >would be extremely simple, but exceedingly complex to work with, because
> > >the sequence would be devoid of semantic content.
> >
> > It would also lack any theoretical foundation or guiding principle and would
> > require people to construct special encodings that expose implementation
> > details to users.
>
> How so? Such a construct would fit nicely into a relational model and would
> benefit from all the mathematical rigor in the relational model.

In order to represent repeating groups of data, one would have to encode positional elements into the name part of the name/value pair.
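
A minimal sketch of that encoding problem, assuming a hypothetical `item.<position>.<field>` naming scheme (the scheme, the sample values and `decode_items` are all invented for illustration). The repeating group can only be recovered by parsing positions back out of the names.

```python
# One order with two items, flattened into name/value pairs.
pairs = [
    ("order_id", "A347Z"),
    ("item.1.sku", "RED-01"),
    ("item.1.qty", "2"),
    ("item.2.sku", "BLUE-07"),
    ("item.2.qty", "1"),
]


def decode_items(pairs):
    """Rebuild the repeating group by parsing positions out of the names."""
    items = {}
    for name, value in pairs:
        parts = name.split(".")
        if len(parts) == 3 and parts[0] == "item":
            position, field = int(parts[1]), parts[2]
            items.setdefault(position, {})[field] = value
    return [items[position] for position in sorted(items)]


items = decode_items(pairs)
print(items)
```

Every consumer of the pairs has to know this private naming convention, which is the kind of special encoding the reply refers to.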

> And since
> there would only be one relation, it should be an exceedingly simple interface.

It would not be a relation. It would not allow relational algebra.

> > >> >Object classes have interfaces that reflect the complexity that is
> > >> >already inherent in the data.
> > >>
> > >> Unfortunately, object classes often go beyond this and expose the
> > >> complexity inherent in the physical representation of the data as well
> > >> as that inherent in the data itself.
> > >
> > >One must question if you understand object technology at all. Since it is
> > >completely possible to declare a class that is all interface and no
> > >implementation (no data members), it is ludicrous to assert that object
> > >classes expose implementation details (physical representation).
> >
> > And the abstract "order" class exposes no collection, or hash, or bag, or
> > array of references to "order items"? The user can identify all associated
> > instances of "order item" without resorting to an instance of "order"?
>
> Certainly. Why not?
>
> But why? If you were going to perform a query of the SQL form of "SELECT * FROM
> ORDER ITEMS WHERE ORDER_ID = "A347Z"", why is that worse than (pseudo language)
> "[SELECT FROM ORDER WHERE ORDER_ID="A347Z"].orderItems()" ??

It's not worse. It's much better. The user does not need access to ORDER to manipulate "ORDER ITEMS" and the user can actually manipulate views instead.

"SELECT AVG(TOTAL_PRICE) AS AVG_PRICE, AVG(TOTAL_QTY) AS AVG_QTY FROM ORDER_TOTALS" The view doesn't even have to refer to ORDER.

> In both cases you are still navigating the association.

Bullshit.  
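
For what it is worth, the two queries discussed above can be sketched with SQLite through Python's standard `sqlite3` module. The column layout of `order_items`, the definition of the `order_totals` view and the sample rows are assumptions; only the order id A347Z and the shape of the queries come from the thread. No ORDER table is ever created, so there is nothing to navigate through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_items (
        order_id   TEXT NOT NULL,
        line_no    INTEGER NOT NULL,
        qty        INTEGER NOT NULL,
        unit_price REAL NOT NULL,
        PRIMARY KEY (order_id, line_no)
    );
    INSERT INTO order_items VALUES
        ('A347Z', 1, 2, 10.0),
        ('A347Z', 2, 1, 30.0),
        ('B100X', 1, 4, 5.0);

    -- A derived relation; it never mentions an ORDER table.
    CREATE VIEW order_totals AS
        SELECT order_id,
               SUM(qty * unit_price) AS total_price,
               SUM(qty)              AS total_qty
        FROM order_items
        GROUP BY order_id;
""")

# Items of one order, selected by value.
items = conn.execute(
    "SELECT line_no, qty FROM order_items "
    "WHERE order_id = 'A347Z' ORDER BY line_no").fetchall()

# Aggregate over the view, as in the query quoted above.
avg_price, avg_qty = conn.execute(
    "SELECT AVG(total_price) AS avg_price, AVG(total_qty) AS avg_qty "
    "FROM order_totals").fetchone()

print(items, avg_price, avg_qty)
```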

> Don't bother with a more complicated DML example. I'll concede that there are
> SQL statements that are exceedingly difficult to map directly to an object
> model.

I don't need a more complicated DML to demonstrate that your arguments are fatuous. I'll concede that there are simple, useful queries that are exceedingly difficult to express with a non-relational object dbms.

> > >> >Sure, you can argue that a user must understand
> > >> >some amount of the object model to become productive, but I don't see how
> > >> >that is any different in any paradigm.
> > >>
> > >> There goes that word again. Why do you use it for almost everything? Are
> > >> you not able to conceive of a meaningful word to use in its place?
> > >
> > >Obviously not. Why don't you offer an alternative that won't push your hot
> > >button.
> >
> > The word has too many different meanings and people use it with too little
> > understanding of any of them for me to pick a suitable synonym in the above
> > context.
>
> "System of thinking". Honestly, I don't think you try.

Any different in any example. Any different in any pattern. Any different in any system of thinking.

In the relational system of thinking, the dbms empowers the user with a simple abstraction, common query operations and a self-describing system catalog so users can discover what they need to become productive. It amounts to the difference between giving a fish and teaching to fish.

> > >> Users understand relations with very little effort because all relations
> > >> have an identical interface using identical operations.
> > >
> > >Syntax is never particularly interesting.
> >
> > Relations are semantic and not syntactic.
>
> The "simple interface" you laud is syntactic.

The interface is semantic. The rules of a specific query language are syntactic.

> Each and every relation has
> different semantic meaning.

Each and every relation is a set of truth propositions with a well-defined predicate. This is semantic not syntactic.

> Learning semantics is the key to using data
> productively.

I agree, which is why I recommend using a semantically rich, self-describing dbms with a simple interface.

> > >Knowing *what* I can do is a far cry
> > >from knowing *why* I would want to do it (and when I would NOT want to do
> > >it).
> >
> > Hence the integrity function of a relational dbms.
> >
> > >> >If I don't understand the way all the tables
> > >> >are related and what fields join what tables in what context, how
> > >> >productive will I be?
> > >>
> > >> Very productive. All you need to know is the way the system catalog tables
> > >> are related.
> > >
> > >Nonsense.
> >
> > What does an object dbms offer that even begins to compare?
>
> Is it a great surprise to you that ODBMS have similar functionality?

A single, simple interface useful for querying both the data and the meta-data that uses a common language, query algebra etc.? Yes, that would surprise me.

> > >We have a diagram that depicts all the tables and relationships between
> > >tables in a particular database used by our customer. It is incomprehensible.
> >
> > I don't doubt it. I am not a big advocate of diagrams.
>
> A picture is worth a thousand words. Querying a system catalog provides fewer
> cognitive cues than a diagram.

I disagree. Querying a system catalog allows semantic filtering.
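
One hedged illustration of filtering the catalog with the same language used for the data, using SQLite through Python's standard `sqlite3` module. The three tables are hypothetical, and `sqlite_master` is SQLite's own catalog table; other SQL products expose their catalogs under different names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (name TEXT PRIMARY KEY);
    CREATE TABLE contact_phone (name TEXT, phone_number TEXT);
    CREATE TABLE shipment (tracking_no TEXT PRIMARY KEY);
""")

# Ask the catalog which tables mention a phone_number column. The catalog
# is queried with ordinary SELECT, just like any other table.
phone_tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master "
    "WHERE type = 'table' AND sql LIKE '%phone_number%' "
    "ORDER BY name")]

print(phone_tables)
```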

> > >> >Object databases use objects naturally to manage complex notions (and
> > >> >relationships).
> > >>
> > >> I have yet to meet a casual database user who found objects natural. In
> > >> fact, I have found many experienced, skillful application programmers who
> > >> do not find them at all natural.
> > >
> > >It all depends in what circles you move, I suppose. Here in
> > >comp.databases.object I think your findings would be somewhat different.
> >
> > I have yet to see any evidence of that.
>
> You have yet to look (or listen).

I am aware that you hold that belief. However, your belief does not make it so.

> > >You must be right and I must be wrong. I see your point. I do not
> > >agree with it.
> >
> > Unfortunately, you do not see my point. I see your point, and I understand
> > the fundamental misconceptions from which you derive it. Even when I point
> > out how flawed those fundamental misconceptions are, you cling to them and
> > actively promote them.
>
> You obviously have some fundamental misconceptions about my fundamental
> misconceptions. Since you have not understood me, you cannot have pointed out
> flaws in my thinking.

Except that I did and I did.

> You will not believe that I understand your point and I
> have proven singularly unsuccessful at communicating mine in a way you can
> understand.

I have understood all of your points. None of them were valid, but I understood them quite well.

> > >Statements such as the above exemplify the allegation I made a
> > >while ago about you being an intellectual snob (or something like that). It
> > >is quite a condescending remark.
> >
> > If you espouse and promote the position that creating a unique and arbitrary
> > interface for every relationship among data reduces complexity compared to
> > using a simple set-based abstraction, you do not understand the concept of
> > complexity.
>
> Just a few paragraphs above you pointed out that relations had semantic
> meaning. Providing a codified representation of that semantic meaning is
> abstraction. It is not arbitrary (nor necessarily unique, but we've managed to
> avoid inheritance so far). Your inflammatory language does nothing to make your
> point more persuasive.
>
> If the same construct (relation) has different semantic meaning in different
> contexts, then it is more complex than any class interface.

A relation always has the same semantic meaning. Class interfaces are simply very poor at expressing the semantic meaning of data.

> If you retreat to
> some generic definition of relation that strips away semantic content, then you
> are left with relations being nothing but syntax.

I don't need to retreat to anything.

> In either case, "just
> relations" is not less complex to me than a carefully thought-out object model.

Again, you cannot make a simple interface simpler by adding complexity.

> Once again, it is time to agree to disagree.

If you insist.

> > If, however, you use the term object at one time to mean a variable, at
> > another time to mean a value, at another time to mean a collection of
> > variables, at another time to mean a reference etc., you are simply using
> > sloppy terminology of your own.
>
> If, on the other hand, an object can be a variable that comprises (among other
> things) a collection of variables (or objects), then I am simply being
> consistent with my own terminology and you are not as facile with object
> technology vocabulary as you might like to think.

You have not addressed the usages that mean value or reference. Again, simply sloppy terminology.

> > >I may not use your words
> > >with the precision that you would like.
> >
> > Unfortunately, we work in a precise field whose primary tasks are tasks of
> > communication. Sometimes the communication involves humans, sometimes the
> > communication involves machines and sometimes the communication involves
> > both.
>
> So far, I've been pretty successful at communicating with both machines and
> humans.

Communicating is two way, and I have observed reception problems at your end.

> > You do not even use your words with precision, and this is a real impediment
> > to accurate communication.
>
> When people from different cultures meet, it takes concerted effort on the part
> of all concerned to communicate. You seem to have entered this discussion with
> extremely strong prejudices and biases that don't allow you to accept at face
> value the wisdom and experience of others.

If I can readily observe that the "wisdom and experience" derive from ignorance and false assumption, I have little reason to accept them at face value.

> You have not shown an open mind to
> anything that might have contradicted your prejudices, but tenaciously clung to
> them and reiterated them at every opportunity.

While you have no way to know it, you have described the antithesis of my mental development regarding database management. I once held the prejudice that object classes were the end-all and be-all of complexity management. I once held the prejudice that navigational data models have an inherent performance advantage over the relational data model. I once held the prejudice that duplicate rows present no problems for data management. I once held the prejudice that relational theory is impractical. I once held the prejudice that all dbms users are programmers.

I kept an open mind and let go of those prejudices when I found they did not match reality.

> I did learn a bit from trying to understand your side of this discussion. I
> doubt you can say the same.

I did not learn much, but not for the reasons you might think.

> > >I have decades of experience in writing software for large, complex
> > >systems. And IN MY EXPERIENCE, complexity is best managed through the use
> > >of objects.
> >
> > Since your experience failed to even teach you what a relational dbms is, it
> > offers little upon which to base a comparison.
>
> Your arrogance knows no bounds. My experience has exposed me to the best the
> market has to offer. Theory without implementation is called vaporware. They
> may not meet up to your academically pure standards of what a relational
> database is, but the rest of the world understands them to be relational.

Based in some way on relational principles... true. The best the market has to offer... maybe, so far. Unfortunately, the problems you chastise relational dbmses for are failures of SQL vendors to actually implement the relational model.

Since you criticize the relational model for the problems caused by vendors not delivering it, and since you defend regressive navigational models as solutions when they only make the problems worse, I must conclude that your experience failed to teach you what a relational dbms is.

Since most of the rest of the world is completely ignorant or misinformed about the relational model, I am not surprised that most of the rest of the world disagrees with me. Fortunately, a small minority out there are educated and well-informed so some hope remains.

> Since
> your definition is out of sync with the mainstream, perhaps you need to put a
> little more effort into communicating and a little less into condescension.

You insist on a definition of "relational database" with which relational proponents disagree. You consider your experience infallible. You refuse to acknowledge the glaring contradictions in your assumptions no matter how clearly and succinctly communicated. But I am the one who is arrogant, biased, closed-minded and a vocabulary snob.

Projection ain't just a relational operator, ya know!

> I am fully qualified to base a comparison of commercial RDBMS and commercial
> ODBMS. That is all I have done.

You have done it poorly -- attributing the weaknesses caused by SQL's failure to adhere to the relational model to the relational model itself. Since you do not even know what a relational dbms is, you are not qualified to compare RDBMS with anything.

Received on Thu Sep 06 2001 - 09:09:09 CEST
