Re: The Practical Benefits of the Relational Model

From: Peter Koch Larsen <pkl_at_mailme.dk>
Date: Fri, 27 Sep 2002 21:00:49 +0200
Message-ID: <3d94aae2$0$18114$edfadb0f_at_dspool01.news.tele.dk>

There are lots of snips here and there.

"Nathan Allan" <nathan_at_alphora.com> wrote in message news:fedf3d42.0209261806.4bb94365_at_posting.google.com...
> pkl_at_mailme.dk (Peter Koch Larsen) wrote in message news:<61c84197.0209260611.6e61aeb9_at_posting.google.com>...
> > Just to stick with C++, the tradition is to put operators outside both
> > classes.
>
> But to my knowledge, this cannot be done for operator overloads.
> (again, I am not trying to attack C++, just take it as an observation)
It can and normally should be, but this is off-topic.
>
> > Yes. This is the approach taken by D: saying, that anything may happen
> > and that it is up to the implementor (and any future implementor) to
> > assure that semantics are preserved.
>
> Isn't it the case in any language that a rogue type designer can mess
> things up pretty badly (oops, forgot to call base::XXX). Before you
> bring up member hiding, I would point out that it is mostly syntactic
> sugar and not real protection.
Yes - this is indeed the case. But operator overloading in D may affect otherwise perfectly working code - e.g. code working specifically on ELLIPSE values may break when some other user implements the CIRCLE type.
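
To illustrate, here is a minimal sketch in Python (used only for illustration; it is not D syntax). The AREA registry, most_specific_type and the semiaxis parameters a and b are hypothetical names, and the rule "an ellipse with equal semiaxes is a circle" is a simplified stand-in for specialization by constraint. It shows how code that only ever handled ELLIPSE values gets a different result once somebody else registers a CIRCLE implementation.

```python
import math

# Hypothetical registry: type name -> AREA implementation.
AREA = {"ELLIPSE": lambda a, b: math.pi * a * b}

def most_specific_type(a, b):
    """Simplified specialization by constraint (an assumption of this sketch)."""
    if "CIRCLE" in AREA and a == b:
        return "CIRCLE"
    return "ELLIPSE"

def area(a, b):
    """Dispatch AREA on the most specific type of the value."""
    return AREA[most_specific_type(a, b)](a, b)

# Existing code that only knows about ELLIPSE values.
before = area(2.0, 2.0)

# Another user later implements CIRCLE, with a different notion of pi.
AREA["CIRCLE"] = lambda a, b: (22 / 7) * a * a

# The very same call now silently yields a different result.
after = area(2.0, 2.0)
```

Nothing in the caller changed between the two calls; only the set of defined types did.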
>
> > But the semantics can change in D as well. One implementor
> > implementing the AREA of CIRCLE might define pi as 3.14 (or use a
> > system-provided constant), while another - defining AREA for ELLIPSE -
> > might define it as 22/7. Nothing in the D language (apart from the
> > note that "semantics MUST be preserved") prevents this.
> > > In this case, as you guessed, I am referring to languages that have
> > > "properties." There are actually some pretty good reasons to draw a
> > > formal line between properties and get/set functions. (See "possible
> > > representations" in The Third Manifesto book).
> >
> > I see nothing but a thin syntactic layer here.
>
> There is a subtle (but important) distinction. Possible
> representations have exclusive access to the actual (or physical)
> representation. This draws a clean, non-arbitrary line between the
> logical/physical. It also defines a clear role for properties
> (accessors) as opposed to other operators; namely, the properties
> composing a possible representation provide a complete representation
> of the value.
>

Okay. So there's a difference for D4.

> > But D has taken hiding to its extreme in its type system. So I believe
> > you are contradicting yourself?
>
> I never looked at it that way, but more accurately, I would say that
> it has taken hiding _out_ of the logical model. I guess it is a good
> kind of hiding _because_ it takes it to the extreme. I might add that
> it also simplifies the logical model in the process.

Actually, if you define the type system (and a lot of the RM prescriptions) as part of TTM, hiding is part of it, to the extent that TTM prohibits any physical data descriptions in D - these descriptions must (enforced by the model, I would say) even be written in another, hidden language. But perhaps we are only discussing words here.

>
> > The motivating factor behind information hiding is not to enforce
> > integrity but to enforce an appropriate level of abstraction and to
> > allow for changes in internal representation without affecting the
> > users of the type in question.
>
> You are describing implementation independence, which again D takes to
> the extreme.
>
> > And as soon as you turn to concrete implementations
> > such as C++, you can find very concise definitions of a model.
>
> I think this is apples and oranges territory again. Surely you could
> derive some conceptual model from any specific implementation (this
> would be a VERY complex model in the case of C++). Would everyone
> agree that the extracted model is _the_ OO model?! Probably not.

I do believe that part of the C++ ISO standard describes an OO model, and does so very concisely. Again, we're off-topic.
>
> > I regard TTM as a very high-level description, vague in many places.
> > Just search for the words "very loosely" in TTM.
>
> Your assertion that TTM is a "very high-level description" is correct
> and desirable. Its purpose is to lay out a general blueprint
> without restricting implementation possibilities.

This depends on the purpose of the TTM book. If it was to form the basis for a common group of languages, it fails by not providing a "feel" for what e.g. relational assignment should be. If it is a motivation for a new approach to database management systems (and this I believe to be the case), the book is in my opinion far too detailed in the description of its type system, with too little emphasis on what should be its core: relational matters such as view updateability, relational assignment, why nulls should be forbidden, why there should be no tuple-level access, and lots of other stuff in that ballpark.

>
> > One such place is
> > discussed elsewhere in this newsgroup, namely relational assignment. I
> > would have hoped for (and expected) a more thorough discussion of this
> > subject in TTM.
>
> Relational assignment, in its pure form, is a trivial matter. It does
> become sticky when the imperative language "overloads" the concept
> with things such as triggers and cascades, but it is up to such a
> language to clearly specify the semantics. We feel that the
> specification of such matters in TTM would be unnecessarily
> restrictive. IOW, TTM leaves room for art. ;-)

Room for art? Be careful that it does not translate to room for surprises for the end-user. ;-)

>
> > Another vague and very central subject is that of view
> > updateability: hardly discussed at all.
>
> View updatability is clearly spelled out by Date in several places.
> The most recent of these is the 7th edition of An Introduction to
> Database Systems. There, update rules are spelled out for each
> relational operator. The Dataphor DAE fully supports view
> updateability.
[This part was partly written before I noticed that An Introduction to Database Systems is actually referenced in the TTM section on view updates.] I have read An Introduction to Database Systems, but probably not the seventh edition (I do not have my copy at hand, but it was from the 1990s). I do remember reading about view updateability there, but it definitely did not propose that ALL views were updateable in theory. The brief mention of the subject in TTM claims exactly that (with a back door for constraint violations, pg. 151) and follows up with an example of an update through a union in which both underlying relations are updated. While this does work with respect to the view, no argument is given for why it should be so - you could e.g. update just one of the relations. I am sceptical of such an approach, especially when the choice seems so arbitrary and no justification is given. In any case it is a disappointment, on such a central point, not to see references to works by anyone other than the present authors.
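
The arbitrariness can be sketched in a few lines of Python, assuming relations are modelled as plain sets and the view is simply A UNION B. The function names are hypothetical, and insert_both is only my reading of the TTM example, not a quotation of it.

```python
def union_view(a, b):
    """The view V = A UNION B."""
    return a | b

def insert_both(a, b, t):
    """Insert t through the view by adding it to both base relations."""
    return a | {t}, b | {t}

def insert_left_only(a, b, t):
    """Alternative: add t to the first base relation only."""
    return a | {t}, b

a = frozenset({1, 2})
b = frozenset({2, 3})

a1, b1 = insert_both(a, b, 4)
a2, b2 = insert_left_only(a, b, 4)

# Both strategies yield the same view contents ...
same_view = union_view(a1, b1) == union_view(a2, b2)
# ... but leave the base relations in different states.
same_base = (a1, b1) == (a2, b2)
```

Seen through the view alone the two strategies cannot be told apart, so whatever justifies one over the other has to come from somewhere else.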

> The closest system type to a rational in our implementation (D4) is
> Decimal, which is a supertype of several different flavors of integer
> types. So loosely the answer is yes.

I had hoped for some rational, that would be suitable for complex numbers ;-(
>
> In our implementation also, they could be different physical
> representations. I will qualify that our implementation does have a
> few gotchas when a descendant changes the physical representation.

I do not believe you need to have gotchas - just a more complex implementation.
>
> > > > 2) How do you perform integer division? How do you perform RATIONAL
> > > > division?
> > >
> > > Those are semantically distinct operators and they should be defined
> > > as such.
> > Again: How is it handled in the Alphora product?
>
> We have an intrinsic "/" operator that returns a decimal in all cases.
> Integer division is handled through a "div" operator. There is also
> a "mod".
>
> > The important thing
> > is that they must be syntactically different as well -
>
> Agreed
>
> > and that is not to my liking. For one reason because of the ugliness
> > of the resulting expression. You might end up writing k :=
> > INTEGER_DIVIDE(i,j) rather than k := i/j for integers and z :=
> > REAL_DIVIDE(x,y) for rational numbers.
>
> You might, but not in D4. ;-)
>
> k := i div j
> k := i / j

Okay. But this only requires my argument to change. I could create my (mathematically unsound) decimal_with_infinity (or, even simpler, a bounded integer). This type would then be unable to use the "/" or the "div" operator - at least not in a D implementation with S by C implemented. Mathematicians, at least, would be very sorry if they had to change operators for each type used.
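
For readers who do not know the Pascal-style split, here is a small Python sketch of the two semantically distinct operators. Python happens to use "//" where D4, as described above, uses "div"; the function names are hypothetical, and Fraction stands in for an exact rational result (D4 is said above to return a decimal). Note that Python's "//" rounds towards negative infinity, which need not match any particular D implementation.

```python
from fractions import Fraction

def real_divide(i: int, j: int) -> Fraction:
    """Counterpart of "/": the exact quotient, never truncated."""
    return Fraction(i, j)

def integer_divide(i: int, j: int) -> int:
    """Counterpart of "div": the quotient rounded down to an integer."""
    return i // j

def modulo(i: int, j: int) -> int:
    """Counterpart of "mod", consistent with integer_divide."""
    return i % j

k = integer_divide(7, 2)   # integer result
z = real_divide(7, 2)      # exact rational result
```

The two operations have different result types as well as different values, which is the argument for giving them different names.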

>
> This is stolen from Pascal. C's overload of the "/" operator has
> doubtless caused innumerable bugs because it is poorly defined.
On the contrary (being off topic again), the "/" operator is very clearly defined in C and C++.
> Oops, there I go language bashing again. Seriously though, we like to
> learn from each language and take the "best of" and leave the bad ideas
> behind.
This is a sound principle.
>
> > Another reason is that the
> > type-implementors must be aware of any super-types and subtypes.
> > This last example is better considered in the light of the
> > COMPLEX/RATIONAL example. Here, SQRT might already be defined in the
> > RATIONAL case and thus you can not use SQRT for the same operation
> > with a COMPLEX parameter.
>
> What you are describing is a tricky aspect of implementation, not of
> the logical concepts presented in TTM. We have considered this
> problem, and frankly have not yet fully worked out an adequate
> implementation solution. This is not to say that the model is flawed,
> nor do I think this problem is unsolvable. The _logical_ model
> presented in TTM is purposely silent on matters such as this, as they
> have no bearing on the model itself.

The model says that if x is declared as COMPLEX and if COMPLEX is a supertype of REAL (that is if every real number is a complex number) and if the imaginary value of x happens to be zero, then the REAL sqrt routine should be called and not the COMPLEX (I assume here, that both are defined).
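
A sketch of what this rule amounts to, in Python and under the assumptions stated in the paragraph above: sqrt_most_specific is a hypothetical dispatcher, math.sqrt plays the REAL routine and cmath.sqrt the COMPLEX one. The choice is made from the value, not from the declared type.

```python
import cmath
import math

def sqrt_most_specific(x: complex):
    """Pick the sqrt routine from the most specific type of the value x."""
    if x.imag == 0:
        # The value is also a REAL, so the REAL routine is invoked.
        # math.sqrt raises ValueError for negative arguments.
        return math.sqrt(x.real)
    # Otherwise the value is only a COMPLEX.
    return cmath.sqrt(x)

real_case = sqrt_most_specific(complex(4, 0))      # handled by the REAL routine
complex_case = sqrt_most_specific(complex(0, 2))   # handled by the COMPLEX routine
```

With x = -4 the value is a REAL, so in this sketch the REAL routine is chosen and fails, although the COMPLEX routine would have returned 2i. That seems to be exactly the kind of interaction the type implementors have to be aware of.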

>
> > > Sure. Table A references (has a FK to) table T. When I have a query
> > > or view based on "A join B" (B is another table), it is possible to
> > > infer that the query or view implicitly references table T. This is a
> > > very powerful byproduct of logical data independence and forms the
> > > basis of our product's ability to derive complete user interfaces.
> >
> > But the D language JOIN uses the names of the columns to derive a
> > JOIN. Thus foreign keys are not in any way taken into consideration.
>
> I think you misunderstood. The principle I am discussing is the
> inference of "metadata" for derived tables. Such inference has no
> impact on the semantics of the relational operators. Let me give a
> more concrete example: We have a Customer table and a Zipcode table.
> There is a reference (FK) from the Customer to the Zipcode table. Now
> let's say we have a view, ActiveCustomer, defined as "Customer join
> Sale". We would expect for any SQL system to know the columns (with
> associated names and types) for the ActiveCustomer view. This gives
> us a degree of logical data independence in this respect. But what I
> am saying is possible (and is done by Dataphor), is the inference of
> other information. In our example, the system can tell us that the
> ActiveCustomer view references the Zipcode table. This knowledge can
> be used, for example, to provide a "lookup" from the ActiveCustomer
> user interface to the Zipcode table. This is an extremely powerful
> concept that has been previously neglected. I would also mention,
> though it should be obvious, that there are inference semantics in all
> relational operators (not just joins) for all metadata (not just
> references).

I fail to see how this is related to TTM or Dataphor. This inference is available for any SQL system with FK-support as well.
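
The inference in question needs nothing beyond the catalog, as this Python sketch suggests. Everything in it is an assumption for illustration: the column names, the catalog layout and the helper names are made up, and the rule used (a reference survives when its source table is an operand and its FK columns are still present in the result) is a simplification.

```python
# Hypothetical catalog: table name -> set of column names.
COLUMNS = {
    "Customer": {"CustomerID", "Name", "Zip"},
    "Sale": {"SaleID", "CustomerID", "Amount"},
    "Zipcode": {"Zip", "City"},
}

# Hypothetical foreign keys: (source table, FK columns, target table).
REFERENCES = [("Customer", frozenset({"Zip"}), "Zipcode")]

def join_columns(left, right):
    """Columns of a natural join: the union of both operands' columns."""
    return COLUMNS[left] | COLUMNS[right]

def inferred_references(operands, result_columns):
    """Tables that a derived table references via its operands' FKs."""
    return {
        target
        for source, fk_columns, target in REFERENCES
        if source in operands and fk_columns <= result_columns
    }

# ActiveCustomer is defined as Customer join Sale.
active_customer_columns = join_columns("Customer", "Sale")
active_customer_refs = inferred_references(
    {"Customer", "Sale"}, active_customer_columns
)
```

Here active_customer_refs contains only Zipcode, derived purely from declared foreign keys and column names.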

>
> > > > > ... Well defined inheritance is beneficial because the SYSTEM can
> > > > > help us enforce domain constraints and such.
> > ...
> >
> > I am still confused. This should be possible even if another
> > inheritance model is used, should it not? You would just have to
> > declare (in that hypothetical language) that (e.g.) CIRCLE is a
> > subtype of ELLIPSE.
>
> Right, but then enters the work of specifying the specific semantics
> of a CIRCLE. Using a constraint-based inheritance model such as the
> one provided by TTM, we can easily create types (e.g. LARGECIRCLE)
> merely by specifying a declarative constraint (i.e. radius > 1000).
>
Yes. And?

> > > > > -Type inheritance.
> > > > > OO is not a "model" and is certainly not a data management solution.
> >
> > > > What about generic programming.
> > >
> > > What about it? Are you suggesting that "generic programming" is some
> > > kind of formal model or is a data management solution?
> >
> > I meant that generic programming is one of the concepts that have
> > evolved with object oriented system. I did forget LISP here, perhaps
> > this way of programming should not be attributed solely to object
> > oriented systems, but certainly it is C++ that has more than any made
> > generic programming a solution for the masses.
>
> I'm sorry, but I still fail to see its relevance here. 8^)
So do I, perhaps ;-)
>
> > > The Third Manifesto book itself includes a formal definition of the
> > > model and includes the necessary references.
> >
> > I did look for references for "relational assignment" and "view
> > updateability" in particular. For relational assignment I found
> > nothing, for view updateability one that is out-of-date and needs
> > revision.
>
> See pages 79, 165-166. As previously mentioned, Mr. Date has
> thoroughly covered view updateability elsewhere. I wouldn't say it is
> "out-of-date", more, intentionally high-level.
I think you should take another look! For relational assignment I found less than a paragraph describing "very loosely" how this operation works. Pages 165-166 describe how UPDATE (but without key changes), DELETE and INSERT can be implemented as macros.
>
> > > ... It is not the role of a formal model to "consider the
> > > implications that computers are finite." This is the job of an
> > > implementation. Let's be clear: the relational model is a conceptual
> > > model, not some implementation prescription.
> >
> > Would it not be the model that should prescribe what should happen in
> > situations such as the following:
> >
> > R WHERE <cond_a> OR <cond_b>
> >
> > Where there is an overflow in <cond_a> or in <cond_b>? Also, what
> > should happen in this query should an error occur (such as a failing
> > TREAT_DOWN_AS_.... operator)? It might be nitty-gritty for some, but
> > not for the serious user of an actual system. And as a user of the
> > D-language (e.g. as a member of the Alphora development team), these
> > items WOULD interest me.
>
> Absolutely they are of interest to us and we have had to consider
> them. Again, however, I don't think they belong in the TTM. Think of
> it this way, someone could implement a "written D" totally outside the
> realm of computers. Anything (within reason) that would have to be
> ignored by that person, is arguably not part of the logical model. In
> the case of an overflow in "written D", an exception handling system
> would kick in to retrieve another piece of paper. Sorry I couldn't
> resist. ;-)
>
> In our implementation, we have an exception management system to
> handle such errors (try...finally...except).
>
> > Apart from this, there is at least one place in TTM where an infinite
> > memory model is implied - namely RM prescription 5. A hint? Consider a
> > point which has a physical representation with cartesian coordinates.
> > What happens in this (pseudo)code:
> > POINT p (1.0,0.0);
> > THE_ANGLE(p) := 45; (degrees assumed)?
>
> And in a logical model, that is perfectly reasonable. In fact, it
> leaves room for rather nifty implementation possibilities--like an
> implementation that "represents" infinite?! It wouldn't (couldn't)
> "materialize" the value, but would be useful for cases such as the one
> you identify above. Our implementation doesn't do this, but my point
> is that a logical model is a logical model precisely to give
> implementors total leeway.

Ahhh - a logical model is there to be broken? I surely misunderstood you! ;-)
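
The hint about RM prescription 5 can be worked through in Python. This is a sketch only: the Point class and its angle property are hypothetical, floats stand in for the cartesian physical representation, and the setter keeps the distance from the origin fixed. With exact reals the new coordinates would both be sqrt(2)/2, which is irrational, so no finite representation can hold them; Fraction is used to show that the stored value is merely close.

```python
import math
from fractions import Fraction

class Point:
    """A point whose (assumed) physical representation is cartesian."""

    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

    @property
    def angle(self) -> float:
        """Polar angle in degrees, derived from the stored x and y."""
        return math.degrees(math.atan2(self.y, self.x))

    @angle.setter
    def angle(self, degrees: float) -> None:
        """Rotate the point to the given angle, keeping its radius."""
        radius = math.hypot(self.x, self.y)
        theta = math.radians(degrees)
        self.x = radius * math.cos(theta)
        self.y = radius * math.sin(theta)

p = Point(1.0, 0.0)
p.angle = 45

# Exact arithmetic on the stored float: 2 * x^2 would equal 1 only if
# x were exactly sqrt(2)/2, which no rational number can be.
twice_x_squared = 2 * Fraction(p.x) ** 2
```

The angle read back from p is therefore only guaranteed to be near 45, not equal to it, unless the implementation keeps the polar components around as well.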
>
> > - the concept of NULLs is discussed in the context of SQL. While it is
> > easy to agree that the SQL implementation is bad, the concept is
> > simply dismissed. Instead an entirely inadequate (in my current
> > belief) solution is introduced in connection with the discussion of
> > SQL-migration and when discussing outer joins. There are many open
> > questions in this area. (One: if the right-side operator of an outer
> > join contains a boolean field, what value should be substituted for
> > "no value").
>
> This is deep area where I would probably point you to Date's
> "Writings" books (I think everyone should have them all). Rarely is a
> dead horse beaten as badly as with this subject.
>
> I might mention that our product does have "special values". Frankly
> I don't miss nulls. Now, even when we design systems in SQL, we don't
> consider nulls an option, and it has dramatically improved our designs
> (and implementations).
So an outer join will return these "special" values? How are these special values defined - are they outside the normal value set? If so, what is the difference between SQL NULLs and your special values - apart from the mistakes made by the SQL products (mentioned in TTM), and that x = x is presumably always true?

>
> > - Tuple level operations are banned, again with little motivation. Why
> > not allow tuple-level operations as well - there is no reason these
> > operations should follow SQL-guidelines.
>
> There are good reasons for this. A relation constitutes a single
> value. Any attempt to update the relation at a more granular level
> undermines the definition of the relation. This is a non issue if
> more granular update operators are defined as short-hands for
> relational assignment to avoid update anomalies. The explicit
> mentions to SQL are to make it clear that an implementation of D is
> not to fall into some of the same pitfalls that SQL did in this
> regard.

I do not follow you here. A string is a single value, but you can still (I believe) ask for the character at a given position, and probably update that character as well. This does not undermine the definition of the string, does it? What is it, then, that undermines the relational system if we allow tuple-level access? I will emphasize that I do not intend to (re)introduce anything like ordering or bags.
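
To make the question concrete: a tuple-level update can be written as pure shorthand for relational assignment, as in this Python sketch. Relations are modelled as frozensets of tuples, and update_tuple is a hypothetical helper. Nothing here introduces ordering or duplicates.

```python
def update_tuple(relation, old, new):
    """Tuple-level UPDATE as shorthand for R := (R MINUS {old}) UNION {new}."""
    if old not in relation:
        raise KeyError(old)
    return (relation - {old}) | {new}

# A relation value with attributes (name, city).
r = frozenset({("Smith", "London"), ("Jones", "Paris")})

# "Update" one tuple; the variable r is assigned a whole new relation value.
r = update_tuple(r, ("Jones", "Paris"), ("Jones", "Athens"))
```

The relation remains a single value throughout, much as a string does when one character is replaced.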

>
> > There are other places. One is OO VSS 1: Coercion (that is implicit
> > type conversion) is not supported. (Page 234ff in the second edition).
> > The motivation is (quote pg. 235) "we do agree that prohibiting
> > coercion is a good idea anyway for a variety of other reasons -
> > reasons that are widely understood in the programming languages
> > community". There are no references to substantiate that claim. This
> > explanation is very weak anyway, especially as they do allow numerous
> > coercions - e.g. from CIRCLE to ELLIPSE.
>
> CIRCLE to ELLIPSE are not logically coercions. See the same page 286
> as before. The reason (to me) is that the language is both simpler
> and clearer without coercions. No funny business. WYSIWYG.

This must be a question of definition. In my mind the assignment ELLIPSE := CIRCLE is a coercion as ELLIPSE and CIRCLE are different types, that might not even share the same physical representation.

>
> > If you have the time, it would be interesting for us to hear in which
> > areas you did deviate and then return to the "proper way".
>
> This would be a LONG story. I would like to tell that story at some
> point... but we are too busy living the ending. :-) One thing I will
> say, and we have never mentioned this to Date and Darwen, is that when
> Bryn (co-architect) and I first looked at their type inheritance
> model, we looked at each other and said, "are they crazy?! This could
> never be implemented! And why would you want to!" <g> Ahh... it is
> so easy to mock what we don't understand.

I did look at it, but never thought that it would be impossible to implement. My first thought was "why? this is clearly a case for type substitution, not type inheritance", and my second was: "this model has some negative implications for performance".

>
> > I would like to know in which areas you are nonconforming, but this
> > will probably be mentioned in the documentation mentioned below?
>
> Below is an informal document of non-conformance. We have never made
> it public before, but I see no harm in it.

Thanks - very interesting.

>
> > I look forward to reading it!
>
> Me too! :-)

Ahhh.... I will have to wait a while, it seems! ;-)
>
> Very much enjoying this conversation.
So do I.
>
> Regards,
>
> --
> Nathan Allan

> ---------------------------------------
> Documentation of Compliance with The Third Manifesto:
>
[snip]
>
> OO Prescription 2:
> We support type inheritance to define types, and obtain compile time
> substitutability.
> We do not at this time support run time substitutability, nor S or G
> by C.

Ahhhh..... there is one of the major points missing. Did you have a look at the Db language (a reference can be found at www.thethirdmanifesto.com)?

Kind regards
Peter
