Re: Clean Object Class Design -- What is it?

Newsgroups: comp.databases.theory
From: Bob Badour <bbadour_at_golden.net>
Date: Sat, 1 Sep 2001 22:42:01 -0400
Message-ID: <G0hk7.636$yK.127003715@radon.golden.net>

Jim Melton wrote in message <3B9085E5.A5547CC1_at_Technologist.com>...
>
>Bob Badour wrote:
>
>> >The only
>> >way to model the associations as value-based would be to create synthetic
>> >IDs.
>>
>> Why? Do the entities you manipulate have no logical identity?
>
>One of the fundamental concepts of object technology is that objects have
>intrinsic identity independent of any attribute values they may possess at any
>point in time.

Please use well-defined terms. Object variables (instances) have intrinsic identity as do all variables. However, this does not help users disambiguate similar yet separate variables.

>This notion of intrinsic identity is reinforced in object
>databases by our pointers to objects.

Yes, by pointers to variables. But this does not help users disambiguate similar values stored in separate variables. Are you saying that your entities have no logical identity? That users cannot disambiguate similar entities?

>In a relational database, the paradigm is
>always to copy the data out of the database, perform some manipulations (as
>required), then find the appropriate record(s) again and modify whatever values
>are changed.

There goes that word again. I am convinced that you confuse yourself with pretentious, nebulous terminology. Instead of calling everything a paradigm, try identifying exactly what you want to say. Instead of calling everything an object, try identifying exactly what you want to say.

You have it backward. ODBMSes require the above process, but relational databases do not. One can send a set-oriented command to the RDBMS that manipulates data entirely within the DBMS process.
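
To illustrate the set-oriented point, here is a minimal sketch. It assumes SQLite driven through Python's standard sqlite3 module, and the contact table, its columns, and the function names are hypothetical; none of them come from this thread. The single UPDATE changes every matching row inside the DBMS, and the application never fetches a row in order to modify it.

```python
import sqlite3


def rename_company(conn, old_name, new_name):
    """Issue one set-oriented UPDATE; no rows are fetched into Python."""
    cur = conn.execute(
        "UPDATE contact SET company = ? WHERE company = ?",
        (new_name, old_name),
    )
    # rowcount reports how many rows the DBMS modified.
    return cur.rowcount


def demo():
    # Hypothetical schema and data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact ("
        " contact_id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " company TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?, ?)",
        [(1, "Ann", "Acme"), (2, "Raj", "Acme"), (3, "Lee", "Globex")],
    )
    changed = rename_company(conn, "Acme", "Acme Corp")
    companies = [
        row[0]
        for row in conn.execute("SELECT company FROM contact ORDER BY contact_id")
    ]
    return changed, companies
```

The only values that cross the application boundary here are the two parameters and the count of changed rows.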

>In the object database, this data copying step is eliminated.

Actually, in the object database, this data copying step is required in order to make the data available to the application programming language for data manipulation. It is not required in an RDBMS because relational databases have their own data manipulation language.

>The
>database becomes much less of external entity (conceptually) and data is
>manipulated (conceptually) directly.

This is simply untrue. Conceptually, one must control persistence, and the term persistence, itself, implies a copy of data.

>I say conceptually, because obviously as data is moving to and from disk there
>is copying going on. However, an object reference allows me to manipulate a
>persistent object directly without regard to this copying.

One cannot ignore the copying going on. At a conceptual level, the programmer must still specify which object variables get copied into and out of the application programme's memory. At a conceptual level, the programmer must still specify when and how to retrieve values from the database.

>(By the way, I consider this whole difference in paradigm with regard to
>explicit copying into/out of the database as one of the key
>philosophical/architectural differences between object databases and
>relational/SQL databases)

Paradigm: A set of assumptions, concepts, values, and practices that constitutes a way of viewing reality for the community that shares them, especially in an intellectual discipline.

The object oriented community have false assumptions, nebulous concepts, warped values and arbitrary practices. The relational community have explicit assumptions, precisely defined concepts, principled values and reasoned practices.

I don't think physical copying has much to do with the differences in the "paradigms".

>This whole concept of intrinsic identity is extremely critical in my domain
>because often we do NOT know what attribute value could be used to uniquely
>identify an object. Sometimes, all we know is that there is an object observed
>or inferred through some phenomenology. Over time, we hope to discover more of
>the attribute values attributable to that object, but in the mean time it must
>be distinct from all other objects under consideration.

How do the users of your system identify the distinct instances under consideration?

>Object databases handle this representation of uniqueness with object
>references (commonly referred to as OIDs).

Using pointers, yes, I know that. We already know what a disaster it is to expose pointers to users. If you do not expose OIDs to users, how do users identify unique instances?

>SQL databases can generate synthetic
>IDs such as rowID (that are virtually the same as OIDs).

Except that they are symmetric and do not require navigation.

>However, if there are
>no attributes that can be used to create a distinct relation, how would a
>relational database handle this concept of intrinsic identity?

Identity is intrinsic to variables. Relation variables are uniquely identified by name. Tuple variables are uniquely identified by relation name and key value. Object variables are uniquely identified by relation name, key value and column name.

>> >> >I find attribute joins
>> >> >problematic, especially where they force synthetic IDs into the data
>> >> >model
>> >>
>> >> Do you mean you would prefer not to have any form of logical identifier?
>> >> You will find such a lack much more problematic.
>> >
>> >I find that a normalized model does not usually consist of stand-alone
>> >entities. For example (again), a contact database should have multiple
>> >phone numbers for a contact.
>>
>> And the user should have some method for identifying each of these phone
>> numbers. Home, work, fax, cell, emergency, alternate office on tuesdays and
>> thursdays...
>
>But you are incomplete. You need a field (pardon me for not using the right
>term) to join the phone number with the contact record.
>Since "logically" the
>contact is uniquely identified by the sum of the fields (name, address, title,
>company, etc.), in practice a synthetic ID is created to represent the unique
>identity of the contact.

Logically, a contact is identified by some number. The sum of the other fields need not be unique and the user must have some method to tell them apart. Organizations assign numbers to all kinds of things: employees, customers, license holders, benefit recipients, dependents, departments, accounts, bins, locations. They were doing that long before computers ever came along.

>This synthetic ID is stored in each phone number so
>that it can be joined back to the contact.

Incorrect, both logically and physically. Logically: An association table might expose the relationship between contact id and phone number. Physically: An RDBMS might store the phone number with the contact fields using juxtaposition to identify the contact, but if it does so, it exposes the association to the user using the contact identifier and phone number.
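
As a sketch of the logical half of this answer, the following assumes SQLite via Python's sqlite3 module. The table and column names (contact, contact_phone, contact_id, phone_number, phone_type) are hypothetical and chosen only for illustration. The association is an ordinary relation whose key is the pair (contact identifier, phone number), and how the DBMS lays the rows out on disk is not visible through this interface.

```python
import sqlite3

# Hypothetical schema: the association between a contact and a phone number
# is exposed by value, as a relation keyed on both identifiers.
SCHEMA = """
CREATE TABLE contact (
    contact_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE contact_phone (
    contact_id   INTEGER NOT NULL REFERENCES contact (contact_id),
    phone_number TEXT NOT NULL,
    phone_type   TEXT NOT NULL,
    PRIMARY KEY (contact_id, phone_number)
);
"""


def build():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO contact VALUES (1, 'Ann')")
    conn.executemany(
        "INSERT INTO contact_phone VALUES (?, ?, ?)",
        [(1, "210 555 1212", "home"), (1, "210 555 3434", "work")],
    )
    return conn


def phones_for(conn, contact_id):
    """Return (phone_type, phone_number) pairs for one contact."""
    rows = conn.execute(
        "SELECT phone_type, phone_number FROM contact_phone"
        " WHERE contact_id = ? ORDER BY phone_type",
        (contact_id,),
    )
    return rows.fetchall()
```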

>> >Perhaps each number would include a "type" tag (home,
>> >cell, etc.). In order to associate this phone information with the contact
>> >info, either a synthetic ID must be generated or the primary key values
>> >must be replicated.
>>
>> I am not sure I understand your complaint. Are you complaining about
>> redundant information in the logical view of the data? Pointers are as
>> redundant, if not more so.
>
>A pointer is a physical implementation of a logical concept.

A pointer is a logical exposure of a physical concept (location).

>"Home phone: 210
>555 1212" has no meaning unless it is associated with the person whose phone it
>is. I believe that coupling is *logically* very tight and that it is reasonable
>to implement it as a pointer rather than creating synthetic fields upon which
>to join.

If a user needs to answer the question of "How many home phone numbers do we have in our contact database?", the coupling is totally irrelevant.
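
That question can be answered from the phone relation alone. A minimal sketch, assuming SQLite via Python's sqlite3 module and a hypothetical contact_phone table (the names and sample data are illustrative, not from this thread): the query never navigates from a contact to reach its phone numbers.

```python
import sqlite3


def build_phone_table():
    # Hypothetical association table; contacts 1 and 2 share a home number.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contact_phone ("
        " contact_id INTEGER NOT NULL,"
        " phone_number TEXT NOT NULL,"
        " phone_type TEXT NOT NULL,"
        " PRIMARY KEY (contact_id, phone_number))"
    )
    conn.executemany(
        "INSERT INTO contact_phone VALUES (?, ?, ?)",
        [
            (1, "210 555 1212", "home"),
            (2, "210 555 1212", "home"),
            (3, "808 555 0100", "home"),
            (3, "808 555 0199", "cell"),
        ],
    )
    return conn


def count_home_numbers(conn):
    """Count distinct home phone numbers without touching any contact row."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT phone_number) FROM contact_phone"
        " WHERE phone_type = 'home'"
    ).fetchone()
    return row[0]
```

Counting distinct numbers rather than rows is a design choice in this sketch: two contacts in one household share a number, and the question asks about numbers.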

Since the contact has a logical identifier and the phone number has a logical identifier, it is reasonable to expose the relationship to the user by combining the identifiers.

>> I question whether any synthetic ID is required. The contact identifier and
>> the phone number suffice.
>
>How else would one join the phone information with the contact information? Or
>by contact identifier did you mean a synthetic ID for the contact record?

Natural IDs are nothing more than old synthetic IDs.

>> >I'd rather just store the array of phone numbers with the contact
>> >where they belong.
>>
>> Nothing prevents you from doing that. The relational model only requires
>> that you allow the user to query the phone numbers as if they are
>> independent of the contact. To the user, the DBMS must expose the
>> association between the phone number and the department explicitly using
>> values regardless of how the DBMS physically establishes the association.
>
>The first half I can accommodate. I can query against any object in my object
>database. The fact that there may be an association (pointer, if you wish) with
>another object is irrelevant. (To be fair, my particular vendor does NOT
>support queries across relationships so a query of the form "Find all the
>contacts whose home phone is in area code 808" would be difficult to
>accomplish).

And you complain about the logical interface of the relational model... ?

>The second part, "the DBMS must *expose* (emphasis mine) the association ...
>explicitly using values" I don't understand. If there is no *logical* value
>that identifies the association, how should this exposure take place?

The phone number must have a logical identifier, possibly the phone number itself. The contact must have a logical identifier or the users won't be able to easily identify contacts.

Any relation that includes both of the above values will explicitly associate the two by value.
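
A sketch of association by value, applied to the area code 808 query mentioned earlier. It assumes SQLite via Python's sqlite3 module, hypothetical contact and contact_phone tables, and phone numbers stored as text in the form "AAA EEE NNNN"; all of these are illustrative assumptions. The join matches on the contact identifier that both relations carry, so no pointer is followed.

```python
import sqlite3


def build():
    # Hypothetical schema and sample data, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contact (
            contact_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE contact_phone (
            contact_id   INTEGER NOT NULL REFERENCES contact (contact_id),
            phone_number TEXT NOT NULL,
            phone_type   TEXT NOT NULL,
            PRIMARY KEY (contact_id, phone_number)
        );
        """
    )
    conn.executemany(
        "INSERT INTO contact VALUES (?, ?)",
        [(1, "Ann"), (2, "Raj"), (3, "Lee"), (4, "Kai")],
    )
    conn.executemany(
        "INSERT INTO contact_phone VALUES (?, ?, ?)",
        [
            (1, "808 555 0100", "home"),
            (2, "210 555 1212", "home"),
            (3, "808 555 0199", "work"),
            (4, "808 555 0123", "home"),
        ],
    )
    return conn


def contacts_with_home_area_code(conn, area_code):
    """Names of contacts whose home phone starts with the given area code."""
    rows = conn.execute(
        "SELECT DISTINCT c.name"
        " FROM contact AS c"
        " JOIN contact_phone AS p ON p.contact_id = c.contact_id"
        " WHERE p.phone_type = 'home' AND p.phone_number LIKE ?"
        " ORDER BY c.name",
        (area_code + " %",),
    )
    return [row[0] for row in rows]
```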

>You seem
>to be mandating that synthetic IDs be created to be used in a logical join that
>are not necessary in either the logical or the physical level.

Define synthetic. Unless you advocate a complete lack of logical identity, the user will need to have some means to identify contacts and some means to identify phone numbers. Use those means.

>> >> Purity is not the issue. Simplicity and comprehensibility are. Users will
>> >> not get the answers they want when confronted with a complex,
>> >> incomprehensible representation of the data.
>> >
>> >I find an object database a marvelous tool for managing complexity. To each
>> >his own.
>>
>> Then I must question whether you understand the concept. If every object
>> class has a different, unique interface, the user has a huge amount to learn
>> before the user can become productive. Relations are simple -- as are
>> relational expressions. Objects are complex -- so complex that no such thing
>> as an object expression really exists, or ever will.
>
>The English language has only a very few concepts: noun, verb, adjective,
>adverb, preposition, conjunction (I may have missed one or two). Yet I don't
>think anyone would argue that mastering it is simple.

You have missed many concepts, and you have ignored the confounding complexity. Much as you ignore the confounding complexity of ODBMS.

>Relations may be simple, but that does not mean that their usage may not be
>exceedingly complex.

Relations and the relational algebra are much simpler than the English language. It is true that one can model real-world systems to arbitrary levels of complexity with this simple interface. What I don't understand is any insistence on adding further needless complexity.

>Cognitive modelling has shown that human beings can only
>keep a finite number of concepts in active memory.

All the more reason to suggest as simple an interface as possible -- the relational model.

>In order to deal with more
>complex things, we hide complexity behind abstractions.

Relations are very simple abstractions.

>This is one of the core
>motivations of object technology.

Unfortunately, object technology fails to hide as much complexity as the relational model does regardless of object technology's good intentions.

>Object classes do not manufacture interfaces for the sake of creating
>complexity.

Sure they do. Otherwise, all multivalued objects would be relations.

>Object classes have interfaces that reflect the complexity that is
>already inherent in the data.

Unfortunately, object classes often go beyond this and expose the complexity inherent in the physical representation of the data as well as that inherent in the data itself.

>Sure, you can argue that a user must understand
>some amount of the object model to become productive, but I don't see how that
>is any different in any paradigm.

There goes that word again. Why do you use it for almost everything? Are you not able to conceive of a meaningful word to use in its place?

Users understand relations with very little effort because all relations have an identical interface using identical operations.

>If I don't understand the way all the tables
>are related and what fields join what tables in what context, how productive
>will I be?

Very productive. All you need to know is the way the system catalog tables are related.
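
As one illustration of discovering table relationships from the catalog, here is a sketch assuming SQLite via Python's sqlite3 module. The sqlite_master table and PRAGMA foreign_key_list are SQLite-specific catalog interfaces, and other DBMSs expose their catalogs differently. The contact and contact_phone tables are hypothetical.

```python
import sqlite3


def build():
    # Hypothetical two-table schema with one declared foreign key.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE contact (
            contact_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE contact_phone (
            contact_id   INTEGER NOT NULL REFERENCES contact (contact_id),
            phone_number TEXT NOT NULL,
            phone_type   TEXT NOT NULL,
            PRIMARY KEY (contact_id, phone_number)
        );
        """
    )
    return conn


def list_tables(conn):
    """Read user table names from SQLite's catalog table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master"
        " WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        " ORDER BY name"
    )
    return [row[0] for row in rows]


def foreign_keys(conn):
    """Return (table, column, referenced_table, referenced_column) tuples."""
    links = []
    for table in list_tables(conn):
        # Table names come from the catalog itself and are quoted as identifiers.
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            # Row layout: id, seq, table, from, to, on_update, on_delete, match.
            links.append((table, fk[3], fk[2], fk[4]))
    return links
```

Because the catalog is itself queried with ordinary statements, the same approach applies to any schema without knowing its tables in advance.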

>Object classes attempt to model what the user already has to figure
>out anyway.

I disagree that the user has to figure out a complex object interface for every possible relation, and I must point out that object classes handle the job very poorly.

>Object databases use objects naturally to manage complex notions (and
>relationships).

I have yet to meet a casual database user who found objects natural. In fact, I have found many experienced, skillful application programmers who do not find them at all natural.

>Yes, I understand the concept. I did not ask you to agree with me.

You have yet to exhibit any understanding.

One cannot start with a simple interface and make it more simple by adding features.

Received on Sat Sep 01 2001 - 21:42:01 CDT
