Re: Clean Object Class Design -- What is it?

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 3 Sep 2001 19:21:16 -0400
Message-ID: <TzUk7.760$4O3.134388753_at_radon.golden.net>


Jim Melton wrote in message <3B91E25D.32112D29_at_Technologist.com>...
>
>Bob Badour wrote:
>
>> Jim Melton wrote in message <3B9085E5.A5547CC1_at_Technologist.com>...
>> >
>> >Bob Badour wrote:
>> >
>> >> >The only
>> >> >way to model the associations as value-based would be to create synthetic IDs.
>> >>
>> >> Why? Do the entities you manipulate have no logical identity?
>> >
>> >One of the fundamental concepts of object technology is that objects have
>> >intrinsic identity independent of any attribute values they may posess at any
>> >point in time.
>>
>> Please use well-defined terms. Object variables (instances) have intrinsic
>> identity as do all variables. However, this does not help users disambiguate
>> similar yet separate variables.
>
>An object is an object. A "thing" under consideration. This concept of object
>identity is independent of computers and databases. "The red-haired woman whom
>I saw cross the street last Tuesday" is an object.

Objectifying women, are we? ;->

And she has identifying attributes that we can use to distinguish her from all other red-haired women.

>"The block of data residing
>in sector 2057, track 32 of disk B" is an object.

And it has identifying attributes. When recording information about the sector, feel free to use those attributes.

>"The complete financial
>record of Bob Badour" is an object.

And it has identifying attributes.

>Frequently, users *do* have a hard time disambiguating similar "variables".
>(Can you tell twins apart?).

Yes. Each generally has a unique name, social security number, driver's license number etc.

>That's why query by example is a powerful user
>interface technique. When the user says, "That one" he is using a "pointer".

He is also using a digit and an index (finger). <g> When the user inspects identifying attributes to decide which to point at, he is using explicit attribute values.

Attribute values help the user point when the user must point, but at other times pointers actually get in the user's way.

>> >This notion of intrinsic identity is reinforced in object
>> >databases by our pointers to objects.
>>
>> Yes, by pointers to variables. But this does not help users disambiguate
>> similar values stored in separate variables. Are you saying that your
>> entities have not logical identity? That users cannot disambiguate similar
>> entities?
>
>Often not.

How not and when not? Users cannot do much of value if they cannot even identify what they are talking about.

>Logical identity is independent of the value of temporally changing
>attributes, yes?

Which is why human beings have long been constructing artifacts for identification. We tattoo symbols on our dogs. We put names on our streets and numbers on our houses. Artists number their prints. Producers stamp lot numbers, serial numbers, model numbers and bin numbers on their products, and did so prior to the advent of computers.

Sometimes the identifying attributes, themselves, change over time, but good identifying attributes change infrequently.

>But if all I know about something is "The red-haired woman
>whom I saw cross the street last Tuesday", there is not suffient information to
>represent logical identity, unless I make up an arbitrary identifier.

One chooses logical identifiers based on the proposed use. If all you want to do is gush to your friends, the above information suffices on its own to identify the woman.

If one proposes a different use, one must ask what that use is, what information one requires for that use, what information is optional, what information is absolutely mandatory etc.

>There is
>no reason to assume that another red-haired woman crossing the street next
>Tuesday will necessarily be the same woman.

Again, if all you want to do is gush to your friends, all you need to know is whether the woman is the same woman or a different woman. Fortunately, your human mind picks up on myriad identifying attributes, but that presupposes that you actually know more about the woman than your initial statement.

>However, if I want to accumulate evidence in support of a hypothesis, all
>information, no matter how sketchy can be of value.

Scientific observers have for a long time numbered, dated, catalogued and annotated observations. They certainly did this long before the advent of the computer.

>A way to reference the data
>that recognizes its intrinsic identity is required.

Which is why humans have always required human-understood logical identifiers.

>> >In a relational database, the paradigm is
>> >always to copy the data out of the database, perform some manipulations (as
>> >required), then find the appropriate record(s) again and modify whatever values
>> >are changed.
>>
>> There goes that word again. I am convinced that you confuse yourself with
>> pretentious, nebulous terminology. Instead of calling everything a paradigm,
>> try identifying exactly what you want to say. Instead of calling everything
>> an object, try identifying exactly what you want to say.
>
>This is why it's really hard to talk to you, Bob. I said exactly what I wanted
>to say.
>
>From Miriam Webter (http://www.m-w.com)
>paradigm: 1 : EXAMPLE, PATTERN; especially : an outstandingly clear or typical
>example or archetype

Restating your previous statement:

   In a relational database, the example (or pattern) is always to copy
   the data out of the database, perform some manipulations (as required),
   then find the appropriate record(s) again and modify whatever values
   are changed.

The example (or pattern) is simply incorrect as an archetype of relational databases. It might be a typical example of an application programme, however. A relational dbms allows one to operate directly on the data without any copying. On the other hand, the majority of so-called object dbmses do require such copying.

>object: 4 : a thing that forms an element of or constitutes the subject matter
>of an investigation or science

So, then you reject the ISO/IEC standard vocabulary of:

   A set of operations and data that store and retain the effect of the operations.

or

   An element of a data structure such as a file, an array, or an operand, that is needed for the execution of programs.

When you use the term object in the discussion of data management, you equate object with data, i.e. the subject matter of the discussion.

When you use the term object in the discussion of variables, you equate object with variable, i.e. the subject matter of the discussion.

When you use the term object in the discussion of type, you equate object with type, i.e. the subject matter of the discussion.

When you use the term object in the discussion of identifiers, you equate object with identifier, i.e. the subject matter of the discussion.

When you use the term object in the discussion of relations, you equate object with relation, i.e. the subject matter of the discussion.

Why not simply use data, variable, type, identifier and relation as I suggested?

>> You have it backward. ODBMSes require the above process, but relational
>> databases do not. One can send a set-oriented command to the RDBMS that
>> manipulates data entirely within the DBMS process.
>
>Really. Can I send a set-oriented command to the RDBMS to find a least-squares
>path through a series of points?

Provided the DBMS supports the operation, yes you can.

You can also tell the DBMS to increase the amount in an account by the same amount it decreases the amount in another account, without copying all of the account information back and forth. How many object databases can say the same?
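As a rough illustration of the account example, here is a minimal sketch using Python's standard sqlite3 module as a stand-in SQL DBMS. The accounts table, its columns and the transfer function are hypothetical names invented for the illustration; nothing here comes from any particular product. The point is only that the application sends one statement and never fetches a balance.

```python
import sqlite3

# Hypothetical schema for illustration only: accounts(id, balance).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("A", 100), ("B", 50), ("C", 10)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst with one set-oriented UPDATE.

    No balance is read into the application; the DBMS does the arithmetic.
    """
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            """
            UPDATE accounts
               SET balance = balance
                             + CASE id WHEN :src THEN -:amount ELSE :amount END
             WHERE id IN (:src, :dst)
            """,
            {"src": src, "dst": dst, "amount": amount},
        )

transfer(conn, "A", "B", 30)

# Read back only to check the result; the transfer itself needed no read.
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
```

The final SELECT exists only so a reader can inspect the outcome; it is not part of the transfer.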

>Can I send a set-oriented command to the RDBMS
>to find a statistical probability that two measurements (including their error
>distributions) represent the same event?

Provided the DBMS supports the operation, yes you can.

You can also tell the DBMS to delete any information about customers who have not made a purchase in the previous five years without copying any customer or purchase information back and forth. How many object databases can say the same?
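The customer purge can be sketched the same way, again with sqlite3 standing in for the DBMS. The customers and purchases tables, the customer_no and purchased_on columns and the purge_inactive function are all assumptions made up for the example, and dates are assumed to be stored as ISO text.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_no TEXT PRIMARY KEY);
    CREATE TABLE purchases (
        customer_no  TEXT NOT NULL REFERENCES customers (customer_no),
        purchased_on TEXT NOT NULL            -- ISO date, e.g. 2001-06-01
    );
    INSERT INTO customers VALUES ('C1'), ('C2'), ('C3');
    INSERT INTO purchases VALUES ('C1', '2001-06-01'), ('C2', '1994-02-15');
""")

# Customers with at least one purchase in the five years before :as_of.
ACTIVE = """
    SELECT customer_no FROM purchases
     WHERE purchased_on >= date(:as_of, '-5 years')
"""

def purge_inactive(conn, as_of):
    """Delete inactive customers and their purchases; no rows are fetched."""
    params = {"as_of": as_of}
    with conn:  # both deletes succeed or fail together
        conn.execute(
            f"DELETE FROM purchases WHERE customer_no NOT IN ({ACTIVE})", params)
        conn.execute(
            f"DELETE FROM customers WHERE customer_no NOT IN ({ACTIVE})", params)

purge_inactive(conn, "2001-09-03")

remaining = [row[0] for row in
             conn.execute("SELECT customer_no FROM customers ORDER BY customer_no")]
purchase_count = conn.execute("SELECT COUNT(*) FROM purchases").fetchone()[0]
```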

>Can I send a set-oriented command to
>the RDBMS to predict the likely next state of a Markhov model?

Provided the DBMS supports the operation, yes you can.

You can also tell the DBMS to populate a workflow table to identify all of the entities whose state must change in the next step of some game, without copying any data back and forth. How many object databases can say the same?

>Believe it or not, sometimes people want to apply algorithms to the result of a
>query.

If an application requires a local copy of some data, this is a pattern of the application and not a pattern of the relational model. Unfortunately, so-called object dbmses that confuse applications with data management require such copying even when completely unnecessary.

The relational model, by using value based identity, facilitates consistent use of data across disparate applications. The user examines the same identifiers in a relational dbms that the user examines in a spreadsheet, a statistical regression application, on a report, in a UI grid or anywhere else.

>In that (extremely common) case, the relational (or SQL if you can prove
>otherwise) *paradigm* [pattern] is to copy the data from a result table into a data
>structure the algorithm can use.

As I explain above, this pattern is an attribute of the application and not at all an essential pattern of relational databases or even SQL databases for that matter. It is, however, an essential pattern of most so-called object dbmses.

>Object databases do not require this extra
>step.

Are you claiming that object databases operate directly on the data in the database without copying data into application object variables?

>> >In the object database, this data copying step is eliminated.
>>
>> Actually, in the object database, this data copying step is required in
>> order to make the data available to the application programming language for
>> data manipulation. It is not required in an RDBMS because relational
>> databases have their own data manipulation language.
>
>You are just creating a different programming environment out of your
>(theoretical) RDBMS.

Not at all. The DBMS is a data management environment, and it only makes sense that it can manage data directly.

>If all processing occurs in the context of the DBMS,

I have never made any such claim. Are you totally unaware of the conceptual difference between "allow" and "require"? Much of your argumentation style involves equating the two.

>system cannot scale well and the DBMS becomes a bottleneck.

Even though this is a response to a straw man, I must point out that you are imposing additional faulty preconceptions and assumptions in the above statement. Nothing prevents distribution of an RDBMS, nothing prevents parallel processing of data in an RDBMS, nothing prevents massively huge scaling of an RDBMS.

>Unless your RDBMS
>data manipulation language can support all the kinds of algorithms that are
>coded in other languages, this is (at best) a red herring.

Presenting a pattern inherent in a specific application as if it were inherent in the logical data model of the dbms is at most a straw man.

>> >The
>> >database becomes much less of external entity (conceptually) and data is
>> >manipulated (conceptually) directly.
>>
>> This is simply untrue. Conceptually, one must control persistence, and the
>> term persistence, itself, implies a copy of data.
>
>No, persistence merely implies that the data exists outside the scope of
>program execution. How this is accomplished is a *physical*, implementation
>detail.

I'll cede that.

>Persistence without data copy.

Out of curiosity, which object dbmses provide persistence without data copy? How does the object dbms data model support this?

>> >I say conceptually, because obviously as data is moving to and from disk there
>> >is copying going on. However, an object reference allows me to manipulate a
>> >persistent object directly without regard to this copying.
>>
>> One cannot ignore the copying going on. At a conceptual level, the
>> programmer must still specify which object variables get copied into and out
>> of the application programme's memory. At a conceptual level, the programmer
>> must still specify when and how to retrieve values from the database.
>
>One certainly can.

One certainly must not. An application that confuses an altered copy of a database value with the actual database value will not operate correctly.
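A small sketch of the hazard, assuming Python's sqlite3 module and a hypothetical table foo with a column a (the names echo the FOO and A of the example under discussion but are otherwise invented). Once the application holds a copy, changing the copy changes nothing in the database until the application explicitly writes it back.

```python
import sqlite3

# Hypothetical table for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (id INTEGER PRIMARY KEY, a INTEGER NOT NULL)")
conn.execute("INSERT INTO foo VALUES (1, 10)")
conn.commit()

def read_a(conn, row_id):
    """Fetch the value of a as the DBMS currently holds it."""
    return conn.execute("SELECT a FROM foo WHERE id = ?", (row_id,)).fetchone()[0]

local_a = read_a(conn, 1)   # a copy now lives in application memory
local_a += 5                # altering the copy does not touch the database
stale = read_a(conn, 1)     # the DBMS still holds the old value

with conn:                  # the explicit write-back step
    conn.execute("UPDATE foo SET a = ? WHERE id = ?", (local_a, 1))
fresh = read_a(conn, 1)     # only now do copy and database agree
```

Between the increment and the UPDATE, local_a and the stored value differ, and that window is what the application programmer has to keep in mind.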

>It is this point exactly that I was making above. Because
>the ODBMS makes a persistent object reference *look* exactly like any other
>programming language variable (pointer, if you wish), the application
>programmer has no concern for the copying of object variables into/out of
>memory.

You have not addressed the point I raised above that one cannot simply ignore the copying going on. At a conceptual level, the programmer must still specify which object variables get copied into and out of the application programme's memory. This refutes your point, and you have not addressed this counter-argument.

At a conceptual level, the programmer must still specify when and how to retrieve values from the database. This also refutes your point, and you have not addressed this counter-argument.

>Conceptually, the programmer must query the database for objects of interest,
>but this is not a concept unique to persistent data.

It does, however, invalidate your prior argument that the application programmer has "no concern" for such things.

>Conceptually, the programmer must be cognizant of transaction boundaries and
>transaction semantics. I can't think of any way to avoid this unless you give
>up the concept of ACID transactions (including rollback).

Again, it invalidates your prior argument that the application programmer has "no concern" for such things.

>> >(By the way, I consider this whole difference in paradigm with regard to
>> >explicit copying into/out of the database as one of the key
>> >philosophical/architectural differences between object databases and
>> >relational/SQL databases)
>>
>> Paradigm: A set of assumptions, concepts, values, and practices that
>> constitutes a way of viewing reality for the community that shares them,
>> especially in an intellectual discipline.
>
>My dictionary had a slightly different definition (see above). Or, if you
>prefer, try:

Actually, the above is just one of several alternate definitions. If you mean pattern or example, why not say pattern or example? Why the twenty-five cent word?

>3 : a philosophical and theoretical framework of a scientific school or
>discipline within which theories, laws, and generalizations and the experiments
>performed in support of them are formulated

This is actually a synonymous definition to the one I chose and is not at all equivalent to your earlier definition as example or pattern.

>Your point?

Those who use the word are not even clear on what it means or which of several meanings they intend. If those who use the word intended clear communication, they would choose a less ambiguous synonym. I can only conclude that they intend to obfuscate.

>> The object oriented community have false assumptions, nebulous concepts,
>> warped values and arbitrary practices. The relational community have
>> explicit assumptions, precisely defined concepts, principled values and
>> reasoned practices.
>
>My, you are painting with an awfully broad brush tonight.

I have earlier demonstrated all of the above assertions and nobody has yet offered a valid counter to any of them.

>All of this because
>you chose to react to my choice of words instead of the point I was making?

Your point was that copying of data is inherent to the relational model, and I have demonstrated the point's impotence. Your use of the word "instead" above misleads.

I think it is important for people to understand the intellectual bankruptcy of the word "paradigm". Folks often use it to sound intellectual when they have no intention of using any intellect.

Given the number of hypesters and hack writers using the term, people can easily fall into a lazy habit of aping them. One gains a very valuable discipline by expunging the word from one's vocabulary.

>> I don't think physical copying has much to do with the differences in the
>> "paradigms".
>
>Let me try to be more precise for you. Physical copying (or even logical
>copying?) is a fundamental difference between programming with a result-table
>(or cursor) database and an object database.

Copying has nothing to do with the logical data model of the dbms. One can conceive of a day when we raise the level of application programming languages to more closely match the level of relational databases -- in order to obviate "impedance mismatch". No "logical" copying would be required for application programming because such a programming language would have statements appropriate to operate directly on relation variables.

Again, you are assuming that a pattern inherent to the application is inherent to the dbms to build a straw man. Any good programmer will tell you that good abstractions hide inessential physical implementation details and that horrible abstractions attempt to hide essential logical details.

I suggest to you that so-called object dbmses that attempt to hide the copying inherent to an application programme attempt to hide essential logical details.

>When I say "SELECT A from FOO" I
>must bind the returned value(s) for A to application-space variables before I
>can use them.

This is an attribute of your application programming environment.

>Furthermore, if my algorithm ends up changing the value of A, I
>must then issue an explicit "UPDATE FOO values (A = newvalue)" to ensure the
>change is propagated to persistent memory.

Again, this is an attribute of your application programming environment.

>Note that before this update step,
>the changed value of A is available to other processing in application space
>and my application does not have a coherent view of the data space.

How is this any different from the changed value of an object variable prior to committing the change to persistent storage?

>In an ODBMS, the same "SELECT from FOO" will return me object reference(s) to
>FOO objects.

Do you not see how dynamic heap allocation and local copies of the data are inherent to this? Do you not see how this requires and exposes physical details to the user?

>If my algorithm needs the A value, it simply uses it [ print
>obj->a() ].

Which is inherently a local copy of the A value as it exists in the dbms that may no longer match the A value in the dbms.

>If it needs to update the value, it does it directly [ obj->a(
>newvalue ) ].
>The data space is consistent within my transaction (a second
>SELECT statement will automatically see the updated value of A), but not
>propagated to other transactions until the commit boundary.

Again, this is not inherent to the data model. It is a property of the application programming environment, specifically to the middle-ware for lack of a better term.

>Thus, the programmer does not write any code to copy values into or out of
>application space.

Except for the code you omitted that queries the dbms for the value of obj, and the code you omitted that commits the changes to the dbms.

>The PATTERN (paradigm) of table programming is copying data.

You have not demonstrated this. You have demonstrated that the pattern of your application is copying data, and you have demonstrated two different methods that the middleware to an SQL database can accomplish this copying.

>The PATTERN (paradigm) of object programming is not.

You have not demonstrated this, either. You have demonstrated that the object dbms, by limiting the user to only one of the methods above, exposes physical implementation details in its abstraction while attempting to hide logical details in its abstraction.

>> >This whole concept of intrinsic identity is extremely critical in my domain
>> >because often we do NOT know what attribute value could be used to uniquely
>> >identify an object. Sometimes, all we know is that there is an object observed
>> >or inferred through some phenomenology. Over time, we hope to discover more of
>> >the attribute values attributable to that object, but in the mean time it must
>> >be distinct from all other objects under consideration.
>>
>> How do the users of your system identify the distinct instances under
>> consideration?
>
>Different ways in different contexts.

But you want the dbms to use a single way, OID, in all contexts? Does the irony elude you?

>> >Object databases handle this representation of uniqueness with object
>> >references (commonly referred to as OIDs).
>>
>> Using pointers, yes, I know that. We already know what a disaster it is to
>> expose pointers to users. If you do not expose OID to users, how do users
>> identify unique instances?
>
>See, I don't get your point. An OID is not a pointer. In the database system I
>use, an OID has a native representation (4 16-bit numbers) and a stringified
>representation ( #dd-cc-pp-ss ). Neither of these are "pointers" any more than
>a rowID is a pointer. Yet, because of the operator overloading in OO languages,
>they can appear as a pointer to the programmer.

Show either representation to casual database users and ask them whether OIDs are pointers. By the way, a rowID is a pointer.

>Again, though, user interfaces are written to facilitate users doing their
>jobs.

Do you honestly think that users use OIDs to identify their data?

>When a user of an on-line ordering system orders a new printer, he does
>NOT copy the SKU number into an order-entry text field. He clicks on a picture
>of the product. The user is POINTING to the data of interest.

The user communicates to the application programme via the physical location of an image on the screen, because this is inherent to the communication medium. However, the user communicates the identifying SKU to the application programme via this medium and does not communicate the physical location which will change in the very next instant.

>Why can't
>software do the same thing?

Because the user identified the appropriate SKU to the system using a fleeting location and did not mentally identify the data by its location.

Software tried data management using pointers decades ago and it proved impractical. Should we outfit our infantry with catapults and broadswords?

>> >SQL databases can generate synthetic
>> >IDs such as rowID (that are virtually the same as OIDs).
>>
>> Except that they are symmetric and do not require navigation.
>
>Pardon me for not being fully cognizant of your vocabulary, but how are they
>"symmetric" (and how are OIDs not)?

Sorry, I missed the fact that you were referring to rowID. Those are pointers too, and exposing them is just one way SQL databases fail to implement the relational model.

I did not read your sentence with sufficient care and I thought you were referring to a dbms feature supporting sequential numeric identifiers. RowID is instead a feature exposing pointers to rows. Again, I apologize for the confusion.

>While they do not *require* navigation, they are commonly used for precisely
>that by the database.

Actually, they are commonly used for precisely that by *users*. However, users do not always need or want to navigate.

>But I guess I could argue that OIDs do not *require*
>navigation.

Really? How do users manipulate order line items without navigating orders?

>Again, though, they are not of much use without it.

Knowing the average order size or price is useless? Knowing the average shipment size is useless? Really?
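For what it is worth, here is how such a question might look against an SQL DBMS, sketched with Python's sqlite3 module. The order_lines table and its columns are hypothetical names chosen for the illustration. Nothing in the query walks from an order to its line items; it simply states the question.

```python
import sqlite3

# Hypothetical table for illustration only; identified by (order_no, line_no).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_lines (
        order_no   INTEGER NOT NULL,
        line_no    INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        unit_price REAL    NOT NULL,
        PRIMARY KEY (order_no, line_no)
    );
    INSERT INTO order_lines VALUES
        (1, 1, 2, 10.0),
        (1, 2, 1,  5.0),
        (2, 1, 4,  2.5);
""")

# Average number of lines per order and average order value,
# asked declaratively rather than by visiting each order in turn.
avg_lines, avg_value = conn.execute("""
    SELECT AVG(line_count), AVG(order_value)
      FROM (SELECT COUNT(*)                   AS line_count,
                   SUM(quantity * unit_price) AS order_value
              FROM order_lines
             GROUP BY order_no)
""").fetchone()
```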

>> >However, if there are
>> >no attributes that can be used to create a distinct relation, how would a
>> >relational database handle this concept of intrinsic identity?
>>
>> Identity is intrinsic to variables. Relation variables are uniquely
>> identified by name. Tuple variables are uniquely identified by relation name
>> and key value. Object variables are uniquely identified by relation name,
>> key value and column name.
>
>And if there is no unique key value, you manufacture one?

If there is no unique key value, the data modeller has failed to properly identify the entities in the system. How will users identify the entities if the dbms cannot? How will the dbms identify the entities if the user cannot?

>Why? Why is a contact "logically" identified by some number?

Contacts are not always identified by some number. Sometimes users use short-hand alphanumeric codes, instead.

>Do you assign
>numbers to all the people in your address book?

I record sufficient attributes in my address book to uniquely identify all of the people and organizations tracked therein. Since my address book is an information store and not a data store, I use my own ad hoc rules of information management when modelling the data in it.

>Do you refer to them by number?

Sometimes. I probably have the address for 3283291 Canada Limited in there somewhere. While my family tends not to reuse christian names, other families often do, which requires me to resort to numbering grandfathers, fathers and sons.

>This is utter nonsense.

What is utter nonsense? That I uniquely identify the entries in my address book? Or that organizations number some things to keep track of them?

>Arbitrary numbers are implementation artifacts of
>systems that cannot properly represent intrinsic object identity.

Are you honestly suggesting that OIDs will replace driver's license numbers, social security numbers, product codes etc? Are you suggesting that users find them more accessible than the existing artifacts? Are you suggesting that OIDs are neither artifacts nor arbitrary?

>For example, a telephone number is an arbitrary identifier (although more
>closely related to a pointer) for a specific end-point in the telephone
>network.

Have you never used a reverse-lookup feature on the internet? A telephone number is an arbitrary identifier for users of the telephone network. The transportation company I use for travel to the airport identifies its customers by phone number. This has its drawbacks, of course. A video store I used to frequent also used phone numbers to identify customers, and this caused problems too.

The transportation company uses some additional identifier or has some facility to copy customer information easily because their call centre now picks up my address from the call identifier no matter which of my phones I call from.

The video store solved its problems by giving people without phones (or people who shared a phone with others) another number that was not a valid local phone number.

>In the early days of telephones, an operator was required to
>physically connect the incoming call with the end-point by plugging a cable
>into the appropriate hole. Yet, in many rural communities, the end point was
>not identified by a number, but "Bill Jones' house".

Which is appropriate in a small-scale information management system where the operator can keep track of all the people on the local network.

>As human operators were
>completely replaced by machines, machine-readable end-point identifiers were
>required, hence the phone number.

In other words, as the system grew to the point where it required data management, the system required a logical identifier usable by both humans and machines.

>But if I had a way to "gesture" to your entry
>in my "contact database" and pass a direct end-point (pointer) to my telephone
>(or to my e-mail program, or to my envelope printer), then arbitrary, synthetic
>IDs would phase out as archaic relics of an unenlightened past.

I guess that's why we invented IP and DNS.

>> >This synthetic ID is stored in each phone number so
>> >that it can be joined back to the contact.
>>
>> Incorrect, both logically and physically. Logically: An association table
>> might expose the relationship between contact id and phone number.
>> Physically: An RDBMS might store the phone number with the contact fields
>> using juxtaposition to identify the contact, but if it does so, it exposes
>> the association to the user using the contact identifier and phone number.
>
>If you wish to design your database such that all associations are through a
>distinct "association table", that's fine.

The second (ie. physical) example did not do so.

>Object modelling has "link classes"
>that perform the same purpose.

And what advantage do these link classes provide over relations? Simpler interface? More consistent interface? Principled foundation? Psychological advantage? ??

Given that the relational model has proved its advantage over navigational systems, the onus now lies on any proposed new data model to prove its worth.

So-called object dbmses are nothing more than a regression to the arbitrary, ad hoc, navigational databases of yesteryear.

>But that is a heavyweight solution for simple
>associations that are commonly modelled by repeating foreign key information in
>the phone number table.

Heavyweight in what sense? In the sense of the example I gave for how your original statement was incorrect physically?

>All you've done is require two distinct identifiers (and allowed phone number
>to be one) so they can be stored in your association table.

In the example I gave for how your original statement is incorrect physically, the dbms does not store the data in an association table.

>Of course, each of
>these association entries will require a unique identifier...

The combination of contact id and phone number suffices. The combination is familiar, simple and stable.
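A minimal sketch of that composite key, assuming Python's sqlite3 module. The contacts and contact_phones tables, their columns and the add_phone helper are hypothetical names for the illustration. The association carries no identifier of its own; the pair of values identifies it.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (
        contact_id TEXT PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE contact_phones (
        contact_id   TEXT NOT NULL REFERENCES contacts (contact_id),
        phone_number TEXT NOT NULL,
        PRIMARY KEY (contact_id, phone_number)   -- the pair is the identifier
    );
    INSERT INTO contacts VALUES ('C1', 'First contact'), ('C2', 'Second contact');
""")

def add_phone(conn, contact_id, phone_number):
    """Record an association; return False if that exact pair already exists."""
    try:
        with conn:
            conn.execute("INSERT INTO contact_phones VALUES (?, ?)",
                         (contact_id, phone_number))
        return True
    except sqlite3.IntegrityError:
        return False

first = add_phone(conn, "C1", "210 555 1212")      # new pair: accepted
shared = add_phone(conn, "C2", "210 555 1212")     # same number, other contact: accepted
duplicate = add_phone(conn, "C1", "210 555 1212")  # same pair again: rejected
```

Two contacts can share a number, and the DBMS refuses only a repeat of the same pair, with no extra identifier on the association rows.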

>> >> >Perhaps each number would include a "type" tag (home,
>> >> >cell, etc.). In order to associate this phone information with the contact
>> >> >info, either a synthetic ID must be generated or the primary key values must be
>> >> >replicated.
>> >>
>> >> I am not sure I understand your complaint. Are you complaining about
>> >> redundant information in the logical view of the data? Pointers are as
>> >> redundant, if not more so.
>> >
>> >A pointer is a physical implementation of a logical concept.
>>
>> A pointers is a logical exposure of a physical concept (location).
>
>Since the location of a {thing} is a physical concept, I hope we can agree that
>a pointer is a physical thing.

You have refuted your own earlier statement that it is a logical concept.

At a single level of indirection, a pointer is a strictly physical thing. Others have argued that additional levels of indirection render the pointer logical. While I disagree with this position, I see no benefit in arguing for or against it.

In my books, an IP is a physical pointer. It is a physical pointer with a complex decoding algorithm. Likewise, an OID is a physical pointer with a complex decoding algorithm.

Additional levels of indirection allow some flexibility for rearranging physical locations at the cost of a more complex decoding algorithm, and other people would argue that this turns the physical pointer into a logical pointer. I won't argue for or against this point; I will simply observe that the pointer remains a pointer tightly married to a specific implementation with all of the disadvantages that entails.

For instance, we have six billion people on this planet, and we have four billion unique IP addresses. What happens when everyone has several devices directly connected to the internet?

>But you have been (uncharacteristically) sloppy
>at equating object identifiers with pointers.

Seemingly uncharacteristic observations require intensive examination because they always identify an opportunity for learning. Either they fall within the true parameters of statistical error, or they have an unknown or poorly understood cause.

If the observations fall within the statistical error, the observer has an opportunity to better understand the true range of error in the observations.

If the observations involve unknown or poorly understood causes, the observer has an opportunity to better understand causality.

In this case, I disagree that equating object identifiers with pointers requires any sloppiness. They are pointers with multiple levels of indirection and a complex decoding algorithm. The same is true of a memory address in an 80386-based computer using paged virtual memory.
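
To make "multiple levels of indirection and a complex decoding algorithm" concrete for the 80386 case, here is a minimal sketch of two-level page translation with 4 KiB pages (10-bit directory index, 10-bit table index, 12-bit offset). The dictionaries standing in for the page directory and page table, and the frame numbers, are hypothetical; real hardware also checks present and protection bits, which this sketch omits.

```python
PAGE_SHIFT = 12                      # 4 KiB pages: low 12 bits are the offset
INDEX_BITS = 10                      # 10 bits each for directory and table index
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def translate(linear, page_directory):
    """Decode a 32-bit linear address through two levels of indirection."""
    dir_index = (linear >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    table_index = (linear >> PAGE_SHIFT) & INDEX_MASK
    offset = linear & OFFSET_MASK
    page_table = page_directory[dir_index]   # first indirection
    frame = page_table[table_index]          # second indirection
    return (frame << PAGE_SHIFT) | offset


# Hypothetical mapping: directory entry 1 -> a table whose entry 2 -> frame 0x2A
page_directory = {1: {2: 0x2A}}
linear = (1 << 22) | (2 << 12) | 0x345
physical = translate(linear, page_directory)

# The page can be moved to another frame without changing the linear address.
page_directory[1][2] = 0x7F
moved = translate(linear, page_directory)
```

The same linear address resolves to a different frame after the remapping, which is the flexibility the extra indirection buys; the address nevertheless remains a location decoded by one specific mechanism.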

>Since the logical concept I was
>describing is the association between a contact and his phone number(s), using
>a pointer to implement this association is a physical implementation.

A physical implementation that the dbms should not expose to the user.

>I've
>already described the nature of OIDs in the database I use and they are not at
>all dissimilar from rowIDs.

Which are themselves pointers. I apologize that I missed the reference to rowID. That was truly sloppy on my part.

>Yes, the database can use them to hash directly to
>a specific object, but they are no more pointers than a phone number is a
>pointer to your phone.

A phone number is an account number at the phone company. I agree that it is also a physical device identifier, but new technologies are blurring that distinction. When I turn off my cell-phone, the number identifies the answering machine at my home instead.

The phone number has other interesting properties, but I think that is a discussion for a different thread.

The most important observation for now is that the number has the same informational content as it has data content.

>> >"Home phone: 210
>> >555 1212" has no meaning unless it is associated with the person whose phone it
>> >is. I believe that coupling is *logically* very tight and that it is reasonable
>> >to implement it as a pointer rather than creating synthetic fields upon which
>> >to join.
>>
>> If a user needs to answer the question of "How many home phone numbers do we
>> have in our contact database?", the coupling is totally irrelevant.
>
>Let me be more precise. The phone number above has no *semantic* meaning unless
>it is associated with the person whose phone it is.

Or the organization whose fax it is, or the dial-up ISP whose modem-farm it is, or ...

It always has the semantic meaning of an addressable node in the telephone network, and you are correct that by itself it has no further semantic meaning. In a sense, phone numbers are more syntactic than semantic.

Your example presupposes phone numbers, which are assigned by telephone companies to individual nodes. In this sense, they are natural logical identifiers for users of the connected devices.

None of this has any consequence for how we choose to identify the other entities in our database. Ultimately, we establish the semantics by associating the phone number with the other entities in our database.

At a logical level, a relational dbms exposes that association using relations. At a logical level, a navigational dbms exposes that association using physical attributes such as pointers or proximity, thereby confusing two very distinct levels of discourse.
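
A minimal sketch of what exposing the association "using relations" can look like follows. It uses SQLite through Python only as a convenient stand-in for a relational dbms, which it approximates at best; the table names, column names, and the second contact are hypothetical, while the contact ID and phone number are the ones from this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_phone (
        contact_id TEXT NOT NULL REFERENCES contact(contact_id),
        kind       TEXT NOT NULL,          -- e.g. 'home', 'fax'
        phone      TEXT NOT NULL,
        PRIMARY KEY (contact_id, kind, phone)
    );
    INSERT INTO contact VALUES ('A473B', 'Tom Jones'), ('B112C', 'Ann Lee');
    INSERT INTO contact_phone VALUES
        ('A473B', 'home', '210 555 1212'),
        ('A473B', 'fax',  '210 555 1213'),
        ('B112C', 'home', '808 555 0100');
""")

# Phone numbers can be queried as if they were independent of any contact.
home_count = conn.execute(
    "SELECT COUNT(DISTINCT phone) FROM contact_phone WHERE kind = 'home'"
).fetchone()[0]

# The association is followed by matching values, not by dereferencing.
owners = conn.execute(
    """SELECT c.name FROM contact c
       JOIN contact_phone p ON p.contact_id = c.contact_id
       WHERE p.phone = '210 555 1212'"""
).fetchall()
```

Nothing in either query says how the rows are stored or linked physically; the dbms remains free to use pointers, hashing, or clustering underneath.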

>> Since the contact has a logical identifier and the phone number has a
>> logical identifier, it is reasonable to expose the relationship to the user
>> by combining the identifiers.
>
>And you would expose this to the "user" as "Contact ID A473B has (a) phone 210
>555 1212". This is the combination of the (il)logical identifiers.

Assuming a relational dbms with full support for relational views, I would expose this to the user in exactly the manner most conducive to the user's application. It might appear as "Contact ID A473B, named Tom Jones, has a fax machine connected in his home at phone number 210 555 1212".

Of course, the identifying attributes of the proposition might be (A473B, 210 555 1212) but they might be (A473B, fax machine, 210 555 1212) since one can connect multiple devices to the same node, after all.
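
As a hedged illustration of the point about views, the sketch below derives a user-facing sentence from base relations keyed on (contact, device, phone number). SQLite via Python is an assumption of convenience and offers only partial support for relational views; the table, column, and view names are hypothetical, and the wording of the sentence is shortened from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_device (
        contact_id TEXT NOT NULL REFERENCES contact(contact_id),
        device     TEXT NOT NULL,
        phone      TEXT NOT NULL,
        PRIMARY KEY (contact_id, device, phone)
    );
    INSERT INTO contact VALUES ('A473B', 'Tom Jones');
    -- Two devices share one node, so the device is part of the key.
    INSERT INTO contact_device VALUES
        ('A473B', 'fax machine', '210 555 1212'),
        ('A473B', 'telephone',   '210 555 1212');

    -- A view derived from the base relations, shaped for one application.
    CREATE VIEW contact_device_sentence AS
        SELECT 'Contact ID ' || c.contact_id || ', named ' || c.name ||
               ', has a ' || d.device || ' at phone number ' || d.phone
               AS sentence
        FROM contact c JOIN contact_device d USING (contact_id);
""")

sentences = [row[0] for row in conn.execute(
    "SELECT sentence FROM contact_device_sentence ORDER BY sentence")]
```

A different application could be given a different view over the same base relations without any change to how the association is stored.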

>> >> Nothing prevents you from doing that. The relational model only requires
>> >> that you allow the user to query the phone numbers as if they are
>> >> independent of the contact. To the user, the DBMS must expose the
>> >> association between the phone number and the department explicitly using
>> >> values regardless of how the DBMS physically establishes the association.
>> >
>> >The first half I can accommodate. I can query against any object in my object
>> >database. The fact that there may be an association (pointer, if you wish) with
>> >another object is irrelevant. (To be fair, my particular vendor does NOT
>> >support queries across relationships so a query of the form "Find all the
>> >contacts whose home phone is in area code 808" would be difficult to
>> >accomplish).
>>
>> And you complain about the logical interface of the relational model... ?
>
>I (honestly) point out a real short-coming with the (real) commercial product
>with which I program.

It is a real shortcoming of the logical data model used. When you identify a real shortcoming of the relational data model, I will honestly admit it.

>There is no fundamental reason why this should be so, but
>it is so and I refuse to play "what if" games.

As they say, "Denial ain't just a river in Egypt."

>As you are so fond of saying, a
>failure of commercial products is not a failure of the model.

What aspect of the object model has your vendor failed to implement that results in the above shortcoming?

>> >The second part, "the DBMS must *expose* (emphasis mine) the association ...
>> >explicitly using values" I don't understand. If there is no *logical* value
>> >that identifies the association, how should this exposure take place.
>>
>> The phone number must have a logical identifier, possibly the phone number
>> itself. The contact must have a logical identifier or the users won't be
>> able to easily identify contacts.
>
>Synthetic IDs are evil because they carry no semantic content. How often have
>you mis-dialed a phone number?

How many times have you mis-dialed the phone number because you accidentally pointed at the wrong line in the phone book?

I want to observe that I did not choose the logical identifier for this example; it was furnished by the telephone company.

>How many of your credit card or frequent flier
>numbers do you have memorized?

All of them, but perhaps I am not representative of the general population.

>A "logical" model that forces more of these into
>the interface is flawed.

A logical model that pretends they do not exist, or even worse pretends they are not necessary, is even more flawed.

>Information is identified in context.

Information is different from data. I recently saw an article in I.E.E.E. Compute regarding that very issue. It struck a resonant chord with me.

>If I were to go to the Washington, D.C.
>phone book and look up "Bob Badour", I would find 0 or more matches. There is
>no reason for me to assume that any of these people is the person with whom I
>have been having this conversation for these weeks. I identify you "uniquely"
>as the Bob Badour who has been posting in comp.databases.object. I do not
>create a number to represent you.

You have identified one of the weaknesses of the internet and you have nailed the growing problem of identity theft right on the head. You would have tremendous difficulty identifying me if someone else began posting messages under my name.

I suggest to you that obfuscating real identity using OIDs makes the problem worse and not better. For instance, how easily can you identify the source of this email from the message headers -- if you can even find the message headers?

>> >You seem
>> >to be mandating that synthetic IDs be created to be used in a logical join that
>> >are not necessary in either the logical or the physical level.
>>
>> Define synthetic. Unless you advocate a complete lack of logical identity,
>> the user will need to have some means to identify contacts and some means to
>> identify phone numbers. Use those means.
>
>Logical identity is synonymous with what I called an object's intrinsic
>identity.

Unfortunately, humans do not operate on intrinsic identity any more than they operate on OID. Humans identify things by attribute values even when they do so subconsciously.

>Quite often humans disambiguate by pointing.

Before humans can point, they must disambiguate. The user cannot point at the correct location in the catalogue or on an order unless the user knows what he or she wants to identify.

>When you walk through the
>cafeteria line, you point to the lady with the scoop which jello salad you
>want, since it is the most precise "identifier".

In that case, I will choose the jello salad I want based on the values of the colour, flavour and size attributes. Once I have chosen, I will indicate my choice to the lady with the scoop by whatever expedient means I have at my disposal.

>When I call my parents, I hit
>the speed dialer (I don't remember their phone number).

Do you have only one number in your speed dialer? Does your speed dialer just use a shorter number to identify your parents? Or do you scroll through a list and choose the number by some other identifying attribute of your choice?

>Since users point, it
>seems that you are advocating pointers :-)

Users do not point to identify things to themselves; although, it is sometimes expedient to point at attributes for the consumption of others. Usually, others will not identify the entity pointed out by location either, but by some reasonable set of attribute values.

>> >The English language has only a very few concepts: noun, verb, adjective,
>> >adverb, preposition, conjunction (I may have missed one or two). Yet I don't
>> >think anyone would argue that mastering it is simple.
>>
>> You have missed many concepts, and you have ignored the confounding
>> complexity. Much as you ignore the confounding complexity of ODBMS.
>
>Hmmm. If I'm missing all these concepts and ignoring complexity, perhaps it's
>not so complex after all. Otherwise, shouldn't this complexity be causing me
>untold grief?

You ignore it in your statement above. You do not ignore it in your use of the language. The complexity seems easy to you only after you spent many years mastering the language, which is why it seems difficult to those just learning.

To construct a straw man, you ignore many concepts such as irregular spelling, irregular verb, gerund, infinitive, subordinate clause, article, object, subject, predicate, pronoun, relative pronoun, vocative, subjunctive, dative, transitive, intransitive, interrogative, exclamative, assertive, punctuation etc.

>> All the more reason to suggest as simple an interface as possible -- the
>> relational model.
>
>You've missed the point. Why does FedEx assign a tracking number to your
>package?

Above, you argue against logical identifiers. Does the irony escape you?

>Because identifying "the package that Bob Badour sent to Jim Melton on
>Sept 1, 2001" is too complex (although it can easily be represented as a
>relation). People routinely create concepts that may "add complexity to the
>interface" in order to shield themselves from greater complexity.

Both examples above use the same interface; they are both propositions. The relational algebra allows users to derive one from the other. It also allows the DBMS to create multiple views -- one derived from another.
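
A small sketch of that derivation, again assuming SQLite via Python as a rough stand-in for a relational dbms: the same relation answers a lookup by tracking number and a lookup by the descriptive attributes. The tracking numbers, the second shipment, the table and view names are all invented for illustration; only the sender, recipient, and date of the first row come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE shipment (
        tracking_no TEXT PRIMARY KEY,
        sender      TEXT NOT NULL,
        recipient   TEXT NOT NULL,
        sent_on     TEXT NOT NULL
    );
    INSERT INTO shipment VALUES
        ('TRK0001', 'Bob Badour', 'Jim Melton', '2001-09-01'),
        ('TRK0002', 'Jim Melton', 'Bob Badour', '2001-09-02');

    -- One of possibly many views derived from the same base relation.
    CREATE VIEW shipment_by_description AS
        SELECT sender, recipient, sent_on, tracking_no FROM shipment;
""")

# From the short identifier to the descriptive proposition ...
by_number = conn.execute(
    "SELECT sender, recipient, sent_on FROM shipment WHERE tracking_no = ?",
    ("TRK0001",),
).fetchone()

# ... and from the descriptive attributes back to the identifier.
by_description = conn.execute(
    """SELECT tracking_no FROM shipment_by_description
       WHERE sender = ? AND recipient = ? AND sent_on = ?""",
    ("Bob Badour", "Jim Melton", "2001-09-01"),
).fetchone()
```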

>> >In order to deal with more
>> >complex things, we hide complexity behind abstractions.
>>
>> Relations are very simple abstractions.
>
>One could represent all data as sequences of name-value pairs. Such data would
>be extremely simple, but exceedingly complex to work with, because the sequence
>would be devoid of semantic content.

It would also lack any theoretical foundation or guiding principle and would require people to construct special encodings that expose implementation details to users.

Since relations do not require any special encodings for addressability, they have obvious advantages over the above. Since a relational dbms allows one to directly communicate the semantics of the data to the dbms, it obviates a major shortcoming of both name-value pairs and so-called object dbmses.

>> >Object classes have interfaces that reflect the complexity that is
>> >already inherent in the data.
>>
>> Unfortunately, object classes often go beyond this and expose the complexity
>> inherent in the physical representation of the data as well as that inherent
>> in the data itself.
>
>One must question if you understand object technology at all. Since it is
>completely possible to declare a class that is all interface and no
>implementation (no data members), it is ludicrous to assert that object classes
>expose implementation details (physical representation).

And the abstract "order" class exposes no collection, or hash, or bag, or array of references to "order items"? The user can identify all associated instances of "order item" without resorting to an instance of "order"?

>> >Sure, you can argue that a user must understand
>> >some amount of the object model to become productive, but I don't see how that
>> >is any different in any paradigm.
>>
>> There goes that word again. Why do you use it for almost everything? Are you
>> not able to conceive of a meaningful word to use in its place?
>
>Obviously not. Why don't you offer an alternative that won't push your hot
>button.

The word has too many different meanings and people use it with too little understanding of any of them for me to pick a suitable synonym in the above context.

"Example" does not seem to make any sense above. "Pattern" does not seem to make any sense above, either. Since object dbms lack any consistent theoretical framework, that definition makes no sense either.

If you want to demonstrate that the sentence above has any meaning at all, you will have to identify a sensible alternative for me.

>> Users understand relations with very little effort because all relations
>> have an identical interface using identical operations.
>
>Syntax is never particularly interesting.

Relations are semantic and not syntactic.

>Knowing *what* I can do is a far cry
>from knowing *why* I would want to do it (and when I would NOT want to do it).

Hence the integrity function of a relational dbms.
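
As a rough sketch of what the integrity function contributes, the example below declares a few constraints and lets the dbms decide which updates are acceptable. SQLite via Python is assumed purely for convenience, and it enforces foreign keys only when asked; the schema, the allowed values for `kind`, and the helper `try_insert` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE contact (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_phone (
        contact_id TEXT NOT NULL REFERENCES contact(contact_id),
        kind       TEXT NOT NULL CHECK (kind IN ('home', 'work', 'fax')),
        phone      TEXT NOT NULL,
        PRIMARY KEY (contact_id, kind, phone)
    );
    INSERT INTO contact VALUES ('A473B', 'Tom Jones');
""")


def try_insert(row):
    """Attempt one insert; report whether the declared constraints allow it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO contact_phone VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = [
    try_insert(("A473B", "home", "210 555 1212")),   # valid
    try_insert(("ZZZZZ", "home", "210 555 9999")),   # no such contact
    try_insert(("A473B", "pager", "210 555 1212")),  # kind not allowed
    try_insert(("A473B", "home", "210 555 1212")),   # duplicate proposition
]
```

The rules about when an update should not happen are stated once, declaratively, and apply to every application that touches the data.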

>> >If I don't understand the way all the tables
>> >are related and what fields join what tables in what context, how productive
>> >will I be?
>>
>> Very productive. All you need to know is the way the system catalog tables
>> are related.
>
>Nonsense.

What does an object dbms offer that even begins to compare?

I have taught many users about system catalog tables and I have observed many of them repeatedly discover the information they needed by interrogating those same tables.
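
For readers who have not interrogated a catalog before, here is a minimal sketch using SQLite via Python. Catalog layouts differ from product to product, and the `sqlite_master` table and `PRAGMA foreign_key_list` used here are specific to SQLite; the two-table schema is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contact (contact_id TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_phone (
        contact_id TEXT NOT NULL REFERENCES contact(contact_id),
        phone      TEXT NOT NULL,
        PRIMARY KEY (contact_id, phone)
    );
""")

# The catalog is itself queried like any other table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# For each table, ask the catalog which tables it references.
# Column 2 of each foreign_key_list row is the referenced table's name.
references = {
    t: [fk[2] for fk in conn.execute(f"PRAGMA foreign_key_list({t})")]
    for t in tables
}
```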

>We have a diagram that depicts all the tables and relationships between tables
>in a particular database used by our customer. It is incomprehensible.

I don't doubt it. I am not a big advocate of diagrams.

>There is
>NO hiding of complexity

It does not surprise me that the data modellers performed poorly. Very few actually grasp the fundamentals.

> -- it is all out before us with no way to break it down
>into bite-sized pieces. I can tell exactly how each table is related, but I
>can't figure out what the tables MEAN, which means I can't use the data
>productively.

Again, I cannot help that your data modellers performed poorly. I can only point out that you were lucky they did not have a navigational dbms to render useless.

>> >Object classes attempt to model what the user already has to figure
>> >out anyway.
>>
>> I disagree that the user has to figure out a complex object interface for
>> every possible relation, and I must point out that object classes handle the
>> job very poorly.
>
>See above.

See what? I haven't seen anything that refutes my point.

>> >Object databases use objects naturally to manage complex notions (and
>> >relationships).
>>
>> I have yet to meet a casual database user who found objects natural. In
>> fact, I have found many experienced, skillful application programmers who do
>> not find them at all natural.
>
>It all depends in what circles you move, I suppose. Here in
>comp.databases.object I think your findings would be somewhat different.

I have yet to see any evidence of that.

>> >Yes, I understand the concept. I did not ask you to agree with me.
>>
>> You have yet to exhibit any understanding.
>
> ... to your satisfaction.

... or at all.

>One of the difficulties in discussing things with you is that you cannot agree
>to disagree.

I can when I see a reason to.

>You must be right and I must be wrong. I see your point. I do not
>agree with it.

Unfortunately, you do not see my point. I see your point, and I understand the fundamental misconceptions from which you derive it. Even when I point out how flawed those fundamental misconceptions are, you cling to them and actively promote them.

>Statements such as the above exemplify the allegation I made a
>while ago about you being an intellectual snob (or something like that). It is
>quite a condescending remark.

If you espouse and promote the position that creating a unique and arbitrary interface for every relationship among data reduces complexity compared to using a simple set-based abstraction, you do not understand the concept of complexity.

>I will readily admit that I do not have the "official Date & Pascal" vocabulary
>for describing purist relational theory internalized.

Nor do you need to. If you want to call some relations tables, other relations views and other relations queries, fine. If you want to call tuples records or rows, fine. If you want to call domains object classes, fine.

I think that variable and value have the same definition and meaning to database practitioners that they have for programmers.

Even if you do not know all of the mathematical identities that enable relational optimizers, you can still benefit from them. The dbms vendor must understand these identities, but users need not.
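
One such identity, restriction distributing over join, can be sketched with plain Python sets. This is only an illustration of the kind of equivalence an optimizer may exploit, not a description of any particular product; the relations, the second contact, and the helper names `join` and `select_area` are hypothetical.

```python
# Relations as sets of tuples; attribute positions are fixed per relation.
contact = {("A473B", "Tom Jones"), ("B112C", "Ann Lee")}
contact_phone = {("A473B", "210 555 1212"), ("B112C", "808 555 0100")}


def join(r, s):
    """Join two binary relations on the first attribute of each."""
    return {(a, b, d) for (a, b) in r for (c, d) in s if a == c}


def select_area(rel, area, pos):
    """Restriction: keep tuples whose phone at position pos starts with area."""
    return {t for t in rel if t[pos].startswith(area)}


# Restrict after joining ...
late = select_area(join(contact, contact_phone), "808", 2)
# ... or restrict first and join the smaller relation.
early = join(contact, select_area(contact_phone, "808", 1))
```

Both expressions yield the same relation, so the dbms may pick whichever is cheaper while the user writes only the query that says what is wanted.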

If, however, you use the term object at one time to mean a variable, at another time to mean a value, at another time to mean a collection of variables, at another time to mean a reference etc., you are simply using sloppy terminology of your own.

>I may not use your words
>with the precision that you would like.

Unfortunately, we work in a precise field whose primary tasks are tasks of communication. Sometimes the communication involves humans, sometimes the communication involves machines and sometimes the communication involves both.

You do not even use your words with precision, and this is a real impediment to accurate communication.

>But I have used databases that are
>called relational and I have used databases that are called object-oriented.

Unfortunately, the databases you were told are called relational are not relational, and the databases you were told are called object-oriented are nothing more than network model databases with a fresh new scent.

Until you actually know what a relational dbms is, it is irresponsible to make public claims denigrating them.

>I
>have decades of experience in writing software for large, complex systems. And
>IN MY EXPERIENCE, complexity is best managed through the use of objects.

Since your experience failed to even teach you what a relational dbms is, it offers little upon which to base a comparison.

>Once again, I don't ask you to agree with me.

I stand by my earlier statement. I don't expect you to agree.

>> One cannot start with a simple interface and make it more simple by adding
>> features.
>
>Decomposing the works of Shakespeare into its component letters and storing
>each letter with a frequency count would be a simple interface. But I think it
>could be made simpler by adding features...

It would not be an interface at all since it would have destroyed all meaning. This is a problem with "argument by example" or anecdotal evidence. You must first establish that the example has meaning, then you must establish that the meaning has relevance.

Received on Tue Sep 04 2001 - 01:21:16 CEST
