Re: The Fact of relational algebra (was Re: Clean Object Class Design -- What is it?)

From: Adrian Veith <adrian_at_veith-system.de>
Date: Thu, 18 Oct 2001 10:59:32 +0200
Message-ID: <9qm59m$b1o$02$1_at_news.t-online.com>


"Bob Badour" <bbadour_at_golden.net> schrieb im Newsbeitrag news:V3%x7.2891$cA2.94542889_at_radon.golden.net...
> "Adrian Veith" <adrian_at_veith-system.de> wrote in message
> news:9q3fo7$ti5$00$1_at_news.t-online.com...
> > Resent, because my PC had a wrong date.
> >
> > "Bob Badour" <bbadour_at_golden.net> schrieb im Newsbeitrag
> > news:A4xt7.1500$yU1.104969349_at_radon.golden.net...
> > >
> > > "Adrian Veith" <adrian_at_veith-system.de> wrote in message
> > > news:9nl4hf$ndq$1_at_wrath.news.nacamar.de...
> > > > From: "Bob Badour" <bbadour_at_golden.net>
> > > > Newsgroups:
comp.databases,comp.databases.object,comp.databases.theory
> > > > Sent: Monday, September 10, 2001 3:22 AM
> > > > Subject: Re: The Fact of relational algebra (was Re: Clean Object
> Class
> > > > Design -- What is it?)
> > > > > "Adrian Veith" <adrian_at_veith-system.de> wrote in message
> > > > news:<9ngc8f$de9$05$1_at_news.t-online.com>...
> > > > > > "Bob Badour" <bbadour_at_golden.net> schrieb im Newsbeitrag
> > > > > > news:cd3b3cf.0109081138.731c2192_at_posting.google.com...
> > > > > > > "Adrian Veith" <adrian_at_veith-system.de> wrote in message
> > > > > > news:<9na529$6u5$1_at_wrath.news.nacamar.de>...
> > > > > > > > "Bob Badour" <bbadour_at_golden.net> schrieb im Newsbeitrag
> > > > > > > > news:v2Vk7.768$LE3.134318184_at_radon.golden.net...
> > > > > > > > >
> > >
> > > The "enhancements" you desire render the dbms non-relational doing
> > > tremendous harm in the process.
> > >
> >
> > You are totally wrong. The enhancements I desire stand without any
> conflict
> > to a relational design.

>

> I can only suggest you educate yourself regarding the fundamentals. My
> previous statement remains a true fact.
>

As the uneducated fool I am, I cannot agree to your argumentation.

> > Reference:
> >
> > 1. Direction more or less precise to (page etc. of) book etc. where
> > information may be found.
> > 2. Relation, respect, correspondance, to

>

> Yes, a reference is a pointer to a location.
>
> Of course, in a technical discussion, I would suggest we use a more
rigorous
> definition of the term accepted within the scope of the discussion.
>

The definition is precise enough: a reference directs you to information.

> > > > The
> > > > "reference" i am talking about has the following characteristics:
> > > >
> > > > - it identifies the ascotiated object like a primary key, but is
> > generated
> > > > by the database
> > >
> > > It points to an object variable without providing any useful
information
> > for
> > > identification.
> >
> > With the above definition of a reference, it gives you the place where
you
> > can find information.

>

> Yes, it points to a location.
>
>

> > There are many possible implementations of such a
> > reference. A simple pointer would be the most stupid.
>
> Additional levels of indirection or a more complex decoding algorithm
change
> little

You equal: reference to information = reference to location - which is certainly untrue.

>
>

> > > > - but unlike the primary key, it gives you the following
information:
> > > > - what kind (class) of object it identifies.
> > >
> > > While a primary key does not provide type information, the combination
> of
> > > relation name and column name do. Since primary key value, relation
name
> > and
> > > column name uniquely identify an object instance, the relational
> > equivalent
> > > to a pointer provides as much type information.
> >
> > Wrong. In your model you have to know in advance, which relation you
want
> to
> > address with a primary key.
>
> The relational equivalent to your pointer is the combination of a
relation,
> key value and column name. How is anything I wrote wrong?
>

This information is not enough to fully identfy the instance of an object. It still lacks the information to which class the object belongs. In the example below, it is not enough to have the relation "Foo", the column "A" and a unique key value to tell if the object is a "Foo" or a "FooBar". ( Beside the fact, that according to your definition of a reference, this combination of a relation, key value and column name is also a reference and therfore a pointer as well.)

> > With a reference you can address an object of
> > unknown type and get all the information.
>
> This is equally true in the relational model. The declared type specified
by
> a relation and a column name does not have to match the most specific type
> of any object value further identified by a candidate key value in the
> relation. While I question the utility, one could have a universal
"Object"
> supertype and declare the type of a column using this universal type.
>

Where is the information stored, that a specific object is of class "Foo" or of class "FooBar". Is there a reference to this information, or do you cut this
information out the air?

> > And you get even usefull
> > information, if you provide a invalid reference (= the reference is
> > invalid).

>

> Again, the relational model likewise provides useful information regarding
> invalid references. An rdbms will explain that a relation is unknown to
the
> dbms if that is the case. An rdbms will explain that the named column does
> not exist in the relation if that is the case. An rdbms will correctly
state
> whether any provided key (or non-key) exists within the relation.
>

A reference does exactly the same job.

> > Of course. A big problem of all databases is, that they don't have a
> proper
> > handling of concurrency.

>

> This is a remarkable statement. I hope you provide equally remarkable
> support for it.
> > In a multigenerational database, locking hinders
> > concurrency and therfore performance.
>

> The relational model does not require locking in the first place. I wonder
> what point you are trying to make regarding relational vs non-relational
> object dbmses.

At least, we think about this problem. As you declare, the relational model ignores this problem and therefore many RDBMS do as well.

> > In most cases, it is enough to know if
> > an object (or record) has changed since the time the transaction
started.

>

> For those rare instances where users need such information, the relational
> model only requires that the dbms logically expose this information as
> values in the relation. The most commonly used methods are update counters
> and timestamps.
>

I would call this: a not standard workaround for a common problem.

> > If
> > the database provides this information, you can reduce locking to those
> > occasions, where it is absolutely necessary for data integrity.

>

> As I mentioned previously, the concurrency model is orthogonal to the
> logical data model. The relational model allows any valid concurrency
model
> and any implementation of that concurrency model.
>

Other words for ignoring a problem.

> > Example:
> >
> > class Foo
> > int A,
> > double B
> > end
> >
> > class FooBar < Foo
> > int C
> > end
> >
> > Given a reference X to an object of class Foo i can the following:
> >
> > X is Foo -> True
> > X is FooBar -> False
> > print X.A
> > print X.B
> > print X.C -> Throws an exception
> >
> > This is encapsulation !

>

> What is encapsulated and how?
>

Now you are kidding.

> > Therefore you can only embed a copy of
> > an object.

>

> If I record my mother's birthday in a database and I record my mother's
> birthday in a spreadsheet, I have to make a copy of my mother?!? Yikes!

Your mother != Your mothers birthday.
and
Your mothers birthday in a database != Your mothers birthday in a spreadsheet.

Because, if you have to change one (don't ask me why), you have to change the other.

And:

if you want to "put your mother in a database", you have to make copies of the dates you want to put into the database. If you wo be able (technically) to put her personally in, then she would be gone from the real world. This is nothing a good son should do with his mother.

> > In my opinion, this is also a major problem of persistance.

>

> Actually, this is a problem of using pointers for identity. Identity is a
> concept independent of location. The relational model captures this
concept
> logically using values. Your proposal confuses the concept of identity
with
> the concept of location.

The relational model captures nothing here. By duplicating information the problem gets bigger not smaller.

> > Since with
> > persistance you create copies of the objects in the database in the
memory
> > of the application.

>

> The above statement demonstrates the confusion. The identity of the
> described object does not changed when information is recorded in multiple
> locations.
>

The confusion arises, because people (programmers) tend to think (when they work with persitent objects), that the object in the database and the object in
memory is the same. I don't like this idea, because this is not true. If I change some values of the object in memory, there has to be a more or less complicated process to update the values in the database.

Hiding this process is not a solution!

Therefore I prefer the picture of interfacing the objects in the database.

> > This is one of the reasons I prefer to speak about
> > interfacing the objects in the database.

>

> I prefer to speak about concepts separately from logic and separately from
> physical location. I think your understanding will vastly improve if you
do
> likewise.

>

How can a concept be separated from logic ? I speak of a concept, that is separated from specific physical layout. But your dislike of my concept (and my dislike of yours), does not give you the ability to judge my
understanding.

> > Again wrong. A reference knows, when it is dangling. And with a proper
> > memory mangement, this should never happen.

>

> Wrong in what way? Are you now claiming that it is impossible to try to
> access a deleted object through the pointer used to delete it? No error is
> possible? I am not sure what I said that was in error.
>

It is impossible to access a deleted object through the reference, that deleted it!
No error is possible!
The reference should even have the ability to tell, that it has been used to access an object, that has been deleted. Whereas a rdbms is only able to tell you if
something exists or not, but not if it has been existed.

> > Why references and not a copy of the attributes ? Because I want to be
> able
> > to write code like this in my applications:
>
> You can write whatever code you want in your applications without
extending
> the relational model. The relational model handles database management
while
> your application programming language handles applications.
>

Other words for ignoring the fact, that databases are accessed by applications. Ignoring the interaction between both is like being blind on one eye.

> > And now the way how I plan to implement views:
> >
> > In the database:
> >
> > Interface IFooBar
> > FooBar myFooBar,
> > int X
> > end

>

> Requires someone to predefine the type of tuples in the view. Must someone
> predefine a type for every query?

At a certain point you have to define mapping. Either in the database (one time), or in the application (many times). It is not necessary to define this in the database, but it helps in the process of designing.

>> deleted
> Given the examples above, where is the need for references?

From the point of the database, there is no big difference. From the point of
the application, a reference enables you to navigate easily inside the result set, which was delivered by the database. This is a need defined by many programmers!

> > class FooCollect
> > collection of Foo myCollection,
> > int D
> > end
> >
> > create table FooCollectTable based on FooCollect
> >
> > select self
> > from FooCollection self

Oops, syntax error. It should be: from FooCollectTable self which is a shortcut for: from FooCollectTable.FooCollect self which is legal, because: FooCollectTable is based on FooCollect

> > where myCollection.Has(A = 3 or FooBar(C=11))
> >
> > returns a collection of references to all FooCollect Objects where Foo
has
> > Objects with A=3 or it is a FooBar and C = 11.

>

> It also violates encapsulation, requires significant additional complexity
> in the query language and defines implicit operations on explicit types.
>
  1. access to myCollection is legal, because myCollection is an attribute of class FooCollect which is referenced by self. But it would be more correct to write self.myCollection.Has(...)
  2. access to A inside Has(..) is legal, because myCollection references object of class Foo and therefore anything which is inside the brackets.
  3. the construction: FooBar(C=11) is NOT a type cast. It should be read as: this (the implict reference to a Foo provided by Has(..)) IS a FooBar AND this.C = 11

therefore the access to C is legal, because access rights are tested and granted before the access is made.

Therefore, the example does not violate encapsulation ! The operations are well defined!

I would appreciate any definition of a query language, which is suitable for relational and object orientated access, which is less complex, which is quite close to the standard.

If you can give one - please its your turn:

> > select self
> > from FooTable self
> > where Invers(FooCollectTable.FooCollection.Foo).Has(D = 23)

Ooops, again the same syntax error. It should be: Invers(FooCollectTable.FooCollect.Foo) or shorter Invers(FooCollectTable.Foo) since all Objects in FooCollectTable are of class FooCollect.

> > and self.FooBar(C = 11)
> >
> >
> > returns a collection of references to all Foo Objects which are
referenced
> > by FooCollect Objects with D=23 and the object is a FooBar with C = 11.
> >
> > Please explain why a collection hinders to formulate queries ?

>

> It greatly complicates the query language. Please explain why it helps to
> formulate queries.
>
> For instance, you have only demonstrated simple restriction. Yet, doing
so:

Really ? A rough equivalent in SQL (please don't kill me if i made a mistake) would be:

select foo.*, foobar.*
from FooTable foo, FooBarTable foobar, FooCollectTable_FooCollection_Foo fct_foos, FooCollectTable fct,
where fct.D = 23 and fct.ID = fct_foos.HasID and fct_foos.InversID = foo.ID and foo.ID = foobar.ID
and foobar.C = 11

I think that writing queries like this comes close to some S/M practics.

I know you would define a view, to avoid the joins, but is this really less complex ?

> * you had to add the implicit Has predicate.

it is a part of the query language. What is wrong in extending a query language ?

> * you confusingly returned Foo objects when you specified FooTable
objects.

All Objects in table FooTable are of class Foo or derived from this class. Therefore this legal.

> * you had to add the implicit Invers operation.

I am not happy with the Invers operation, but what is better:

  1. aReference.Invers( some Collection ).Has( something )
  2. "some Collection".References( aReference ).Has( something )
  3. ??

> Just imagine the complexity you must add to use FooTable in a Join, Union,
> Intersection, Difference or Projection. (Not to mention when nesting these
> operations.)

Take the SQL equivalent and give me the answer, which of the two solutions is more readable and tends to have less errors?

[Quoted] > > > > The construct of the "reference" brings the task of key-generation
to
> > the
> > > > DBMS.
> > >
> > > Unfortunately, it hides the identity of data from the user at the same
> > time.
> > > A DBMS can generate unique key values without resorting to pointers.
> >
> > ???

>

> !!!

>

??? What is hidden ?

> > select self.A, self.B, self.FooBar.C // returns C if self is a FooBar or
> > null
> > from FooTable.Foo self
> > where A = 1 or self.FooBar(C = 23)

>

> Again, one does not need references for polymorphism or for querying
> specific type. A relational dbms can do the same without extending the
> relational model.

It helps navigation inside the application and still fits into the relational concept.

> > > Fabian's book deals with many other practical issues as well.
> > >
> >
> > You have made me curious about this book. I have ordered it already, but
> it
> > will take some time to come to germany.
>
> I hope it proves as useful to you as it has to me. I think it will clarify
a
> number of issues for you.

I hope it does not fall under any embargo restrictions, because I am waiting almost 14 days now.

[Quoted] > > > > There are some situations, where navigation is useful. If the task
you
> > > have
> > > > to perform needs navigation, why should a DBMS (OO or R) hinder you
?
> > >
> > > No task requires navigation. Why should any dbms require it?
> >
> > How can you be so sure. I am not.

>
> Ponder the following two statements: A pointer (or reference) establishes
a
> unidirectional relationship between two variables. A tuple in a relation
> variable establishes a symmetric relationship among N variables.
>

i do not agree, a reference (even a pointer) is as symmetric as any other relationship that is established by the identity between two ore more values. (a.ID = b.ID is not much different from a.reference_to_b = b.address )
The only problem of a pointer (not a reference) is, that this relationship breaks, because the pointer has to change its value when the associated object is written to disk or read from disk. Therefore it is not very clever to use pointers to reference a non volatile object.
But a reference with a physical independent value is as "relational" as any other ID generated by an rdbms.

> > For me it looks like a religion - navigation is evil. If does not cause
> any
> > harm, why is it forbidden to use it ?
>
> Much empirical evidence and many logical arguments have clearly
demonstrated
> that forced navigation causes harm. Your own examples demonstrated
> considerable needless complexity.

There is no navigation demonstrated in my examples! It seems, that you don't like the language elements, that have to deal with the concept of collections. But without these elements, there is not much difference (classes!) to a standard relational query-language.

But what do you want to say with: Much empirical evidence and many logical arguments have clearly
demonstrated that forced navigation causes harm.

Whom do I force to navigate ? and why ?
What are the logical arguments ?
Where is the harm ?
Who has clearly demonstrated ? and what ?

In a second thought, i think, that i don't want the answers to these questions, because they will lead the discussion to nowhere.

> > > The ISO Standard Vocabulary for Databases, ISO/IEC 2382-17, defines
> > > "Relation" as:
> > >
> > > "A set of entity occurences that have the same attributes, together
with
> > > these attributes. Note -- In a relational database, a relation can be
> > > represented by a table with the rows corresponding to the entities and
> > > columns corresponding to the attributes." ISO/IEC 2382-17.04.01
> > >
> > > Ich habe kein deutsches Normung. Ich habe nur das
> > englisches/franzoesisches
> > > Normung, und mein deutsch-englisches Computerverzeichnis ist in New
> > Jersey.
> > > Ich habe ein deutscher Freund gefragt und er hat gesagt dass die
> logische
> > > oder mathematische Ausdruecke "function" und "relation" sind
"Funktion"
> > und
> > > "Relation" ins Deutsch. Ich glaube dass das Problem nicht eine
> > Uebersetzung
> > > Ausgabe ist.
> > >
> >
> > perfect ;-)
> >
> > > Your english is much, much better than my german.
> > >
> >
> > To be frank, I am positivly surprised.

>

> I am gratified to hear that. (I think)
>

Out of topic: I think, that if everybody would try to understand parts of the language of his counterparts, this would help to improve the understanding of different cultures and thinking. In my experience it is really amazing if you try to explain your thoughts in a different language - the results differ. But I am not a specialist in foreign languages, therefor this is only a personal thought.

> > > > You can have different attributes for the collection. If it has the
> > > > attribute OWNS, then it will delete the associated objects.
> > >
> > > These arbitrary and ad hoc "attributes" are less functional than the
> > general
> > > integrity function of an RDBMS.
> > >
> >
> > That is true, but as I pointed out, that collections are a subset of
> > relations.

>

> That's like saying pedal-cars are a subset of cars. It's true. But is it
> helpful?

For my son yes! And for me as well, because I can save a lot of money until he is 18.

To be serious. The concept of a collection *is not* a replacement of the general concept of a relation, but it can be used as replacement in many occasions.
One benefit is, that a collection helps applications to perform some type of navigation *without* destroying the relational concept of the database.
Another benefit is, that they provide a framework for the programmer, that can be asociated to collections in common programming languages. They are not a revolution and not necessary. If you don't like them, you can ignore them.

> > If a collection references an object that is deleted, than the reference
> to
> > this object must be removed.

>

> Only if one defines the referential integrity constraint with cascaded
> deletion. What you call deletion of the object is nothing more than
removal
> of the object from the extent. In an rdbms, the data modeller has full
> control over what happens when one deletes any given object variable.
>

There is nothing wrong with referential integrity constraints. Your point was (i think), that it is impossible to have them in an OODB. My point is, that I can
see no reason, that this should be true.

> > > > Is it wrong, beeing a vendor ?
> > >
> > > No. It is wrong to spread ignorance and misconception. It's just less
> > > surprizing when done for financial gain.
> > >
> >
> > Everybody is ignorant - sometimes - one less one more.

>

> However, not everyone tries to spread their ignorance or to reinforce the
> ignorance of others. (I am not just splitting hairs here. This is a
vitally
> important distinction.)

>

It is difficult to distiguish real ignorance (if this really exists) from a different point of view.

> > > > Or who do you think should produce the DBMS, you would like to have.
> > >
> > > A knowledgeable, conscientious, educated vendor instead of the all too
> > > common variety I usually encounter.
> >
> > Saint Who ? I doubt, that someone does without financial interest.
>
> Financial interest is orthogonal to knowledgeability, conscientiousness
and
> education. (Or at least, it should be!)

>
>

> > > > If a
> > > > RDMS would allow navigation (beside of description), there could be
an
> > > > "always" instead of an "overall".
> > >
> > > Untrue. Exposing pointers in the logical interface offers no
performance
> > > advantage over using physical pointers hidden from the user.
> > >
> >
> > From my examples, you can see, that the reference is not exposed to the
> > user, but to the interface of the programming language.
>
> The programming language is a user.

But a user, who might want to navigate!

> > > > > > > > Wrong is:
> > > > > > > > 1. That all optimizations of a query are done with
relational
> > > > algebra.
> > > > > > The
> > > > > > > > most common optimization is to use an index.
> > > > > > >
> > > > > > > Since physical storage determines execution speed, all
> > optimizations
> > > > > > > involve physical storage structures. Many different types of
> > indexes
> > > > > > > exist to change performance characteristics. However, indexes
> only
> > > > > > > scratch the surface of performance altering physical
structures.
> > You
> > > > > > > ignore clustering, physical pointers, pointer pools,
> distribution,
> > > > > > > parallelization and everything else the human mind can
imagine.
> > > > > >
> > > > > > This all has nothing to do with relational algebra!
> > > > >
> > > > > It has everything to do with relational algebra. How else will you
> > > > > specify your intent without restricting the possible physical
access
> > > > > methods?
> > > > >
> > > >
> > > > Parallel access to your data is a problem of transactions and their
> > > > interference. This is one of the reasons, why I designed the
> "reference"
> > > > with the abilty to know "when" it is.
> > >
> > > You have not answered my question. You have only further validated the
> > > contention that the introduction of pointers needlessly complicates
all
> > > database functions: integrity, manipulation, definition, concurrency,
> > > security, etc. Incidentally, the idea of combining a pointer with a
> > > transaction identifier is not new -- SQLBase did that years ago.
> >
> > I am not claiming to do something new. I am just doing the things I am
> > convinced of. Maybe the combination is new, but that I don't know.

>

> You have not addressed the more important issues.
>

Sorry but what are the important issues i have not answered ? Again, in my opinion ( out of my experience ) clustering, physical pointers, pointer pools and distribution have nothing to do with relational algebra. These things are only a problem of "time interference" and not a problem of a specific physical structure.

> > > > > > > parallelization

>

> > > > > > > > Relational algebra is used to
> > > > > > > > transform the query to use the index. But the index in
itself
> is
> > > not
> > > > > > defined
> > > > > > > > by relational algebra.
> > > > > > >
> > > > > > > You apparently misunderstand Dr. Codd's most fundamental goals
> > when
> > > he
> > > > > > > proposed the relational data model. The relational model
> > explicitly
> > > > > > > avoids any definition of physical structures for the very
> purpose
> > of
> > > > > > > allowing physical independence. It permits any physical
> structure.
> > > > > > >
> > > > > >
> > > > > > I have understood very well and I disagree. It permits to
retrieve
> > > data,
> > > > > > that fits into the model, from any physical structure. But the
> model
> > > is
> > > > > > *not* optimal for something like an object.
> > > > >
> > > > > Since a relational domain is an object class and since the values
> > > > > exposed to the user in relations are object values, I can see no
> valid
> > > > > support for the above statement. It is just another outrageous
claim
> > > > > by a vendor trying to sell a flawed product.
> > > > >
> > > >
> > > > Can a relational domain give you these answers:
> > > > - i am of type ClassXY and i am derived from ClassX ?
> > >
> > > Yes, if you mean type inheritance when you say "derived".
> >
> > Where is the relational domain defined ?
>

> The concept of domain was introduced in Codd's 1970 paper. If you are
> talking about specific domains in the dbms, the user defines some and the
> dbms defines some as built-in types.
>

Than it should be O.K. that i define the "reference to object" domain.

--
Adrian Veith,
Veith System GmbH.
www.db-gonzales.de
Received on Thu Oct 18 2001 - 10:59:32 CEST

Original text of this message