Re: The Practical Benefits of the Relational Model

From: Nathan Allan <nathan_at_alphora.com>
Date: 26 Sep 2002 19:06:45 -0700
Message-ID: <fedf3d42.0209261806.4bb94365_at_posting.google.com>


pkl_at_mailme.dk (Peter Koch Larsen) wrote in message news:<61c84197.0209260611.6e61aeb9_at_posting.google.com>...

> The V-table remark indicated that you specifically had C++ (or a
> dialect) in mind.

Clearly some type of V-table underlies most (if not all) OO implementations. In fact, I was first introduced to it when OO was added to Borland's Pascal.

> But you _are_ arguing against existing languages and you do so in
> order to promote your future high-level language.
> ...
> Perhaps this was an apple versus oranges comparison.

I suppose I should be more clear: At this point, our language (D4) is quite targeted, specifically at high-level application development. One of our intents with D4 is to bring "real programmer" features up into the high-level domain, not so much the opposite (at least at this point). Hence my balking at the comparison with C++.

> Just to stick with C++, the tradition is to put operators outside both
> classes.

But to my knowledge, this cannot be done for operator overloads. (Again, I am not trying to attack C++; just take it as an observation.)

> Yes. This is the approach taken by D: saying, that anything may happen
> and that it is up to the implementor (and any future implementor) to
> assure that semantics are preserved.

Isn't it the case in any language that a rogue type designer can mess things up pretty badly (oops, forgot to call base::XXX)? Before you bring up member hiding, I would point out that it is mostly syntactic sugar and not real protection.

> But the semantics can change in D as well. One implementor
> implementing the AREA of CIRCLE might define pi as 3.14 (or use a
> system-provided constant), while another - defining AREA for ELLIPSE
> might define it as 22/7. Nothing in the D language (apart from the
> note that "semantics MUST be preserved") prevents this.

A good point... but I would argue that because such overloads are logically unnecessary, they are not a flaw with the logical system. Allowing overrides of the D nature could therefore be considered the role of a systems developer, and a systems developer (by design) can and should be allowed to REALLY mess things up in efforts to optimize and enhance the system. The point is that the application developer can be kept "safe" from such matters.

> > In this case, as you guessed, I am referring to languages that have
> > "properties." There are actually some pretty good reasons to draw a
> > formal line between properties and get/set functions. (See "possible
> > representations" in The Third Manifesto book).
>
> I see nothing but a thin syntactic layer here.

There is a subtle (but important) distinction. Possible representations have exclusive access to the actual (or physical) representation. This draws a clean, non-arbitrary line between the logical and the physical. It also defines a clear role for properties (accessors) as opposed to other operators; namely, the properties composing a possible representation provide a complete representation of the value.
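To make the distinction concrete, here is a minimal sketch in Python (not D4 or Tutorial D). The `Point` type, its Cartesian physical representation, and the names `cartesian`, `polar`, and `distance` are all assumptions chosen for illustration. Only the selectors and accessors of the two possible representations touch the stored fields; any other operator is written against the possrep components.

```python
import math


class Point:
    """A point value exposed through two possible representations."""

    def __init__(self, x: float, y: float):
        # Hypothetical physical representation: Cartesian coordinates.
        # Only the possrep selectors and accessors below touch these fields.
        self._x = float(x)
        self._y = float(y)

    # Possible representation 1: Cartesian {x, y}.
    @classmethod
    def cartesian(cls, x: float, y: float) -> "Point":
        return cls(x, y)

    @property
    def x(self) -> float:
        return self._x

    @property
    def y(self) -> float:
        return self._y

    # Possible representation 2: polar {r, theta}, theta in radians.
    @classmethod
    def polar(cls, r: float, theta: float) -> "Point":
        return cls(r * math.cos(theta), r * math.sin(theta))

    @property
    def r(self) -> float:
        return math.hypot(self._x, self._y)

    @property
    def theta(self) -> float:
        return math.atan2(self._y, self._x)


def distance(a: Point, b: Point) -> float:
    """An ordinary operator: written purely in terms of possrep components."""
    return math.hypot(a.x - b.x, a.y - b.y)
```

If the stored fields were switched to polar form, only the selectors and accessors would need rewriting; `distance` would be unaffected, which is the line between logical and physical that the paragraph above describes.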

> But D has taken hiding to its extreme in its type system. So I believe
> you are contradicting yourself?

I never looked at it that way, but more accurately, I would say that it has taken hiding _out_ of the logical model. I guess it is a good kind of hiding _because_ it takes it to the extreme. I might add that it also simplifies the logical model in the process.

> The motivating factor behind information hiding is not to enforce
> integrity but to enforce an appropriate level of abstraction and to
> allow for changes in internal representation without affecting the
> users of the type in question.

You are describing implementation independence, which again D takes to the extreme.

> And as soon as you turn to concrete implementations
> such as C++, you can find very concise definitions of a model.

I think this is apples and oranges territory again. Surely you could derive some conceptual model from any specific implementation (this would be a VERY complex model in the case of C++). Would everyone agree that the extracted model is _the_ OO model?! Probably not.

> I regard TTM as a very high-level description, vague in many places.
> Just search for the words "very loosely" in TTM.

Your assertion that TTM is a "very high-level description" is correct and desirable. Its purpose is to lay out a general blueprint without restricting implementation possibilities.

> One such place is
> discussed elsewhere in this newsgroup, namely relational assignment. I
> would have hoped for (and expected) a more thorough discussion of this
> subject in TTM.

Relational assignment, in its pure form, is a trivial matter. It does become sticky when the imperative language "overloads" the concept with things such as triggers and cascades, but it is up to such a language to clearly specify the semantics. We feel that the specification of such matters in TTM would be unnecessarily restrictive. IOW, TTM leaves room for art. ;-)

> Another vague and very central subject is that of view
> updateability: hardly discussed at all.

View updateability is clearly spelled out by Date in several places. The most recent of these is the 7th edition of An Introduction to Database Systems. There, update rules are spelled out for each relational operator. The Dataphor DAE fully supports view updateability.

> > > How well thought of is this model? As a hint let me
> > > just ask you these questions:
> > > 1) Is RATIONAL a SUPERTYPE of INTEGER?
> >
> > Yes.
>
> This and the following questions were more pragmatic than you assumed.
> I was asking specifically about your implementation of the model.

The closest system type to a rational in our implementation (D4) is Decimal, which is a supertype of several different flavors of integer types. So loosely the answer is yes.

> However - if the "Yes" above was meant to be representative for all
> D-languages, your answer differs from the one in the TTM (page 286 in
> the second edition).

It really isn't possible to answer this for all D languages because type inheritance is explicitly an option. The discussion on page 286 concerns coercions. He gives one example of the state of things when INTEGER is not a subtype of RATIONAL for the sake of analyzing the coercion implications. Clearly, if your D supports type inheritance, it is only logical that INTEGER is a subtype of RATIONAL.

> > > If yes, is there then
> > > a difference in physical representation for these two types?
> How is it handled in the Alphora product?

In our implementation also, they could be different physical representations. I will qualify that our implementation does have a few gotchas when a descendant changes the physical representation.

> > > 2) How do you perform integer division? How do you perform RATIONAL
> > > division?
> >
> > Those are semantically distinct operators and they should be defined
> > as such.
> Again: How is it handled in the Alphora product?

We have an intrinsic "/" operator that returns a decimal in all cases.  Integer division is handled through a "div" operator. There is also a "mod".

> The important thing
> is that they must be syntactically different as well -

Agreed

> and that is not to my liking. For one reason because of the ugliness
> of the resulting expression. You might end up writing k :=
> INTEGER_DIVIDE(i,j) rather than k := i/j for integers and z :=
> REAL_DIVIDE(x,y) for rational numbers.

You might, but not in D4. ;-)

k := i div j
k := i / j

This is stolen from Pascal. C's overload of the "/" operator has doubtless caused innumerable bugs because it is poorly defined. Oops, there I go language bashing again. Seriously though, we like to learn from each language and take the "best of" and leave the bad ideas behind.
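For readers who do not know Pascal, the same separation can be sketched in Python. This is only an analogy, not D4: the function names are hypothetical, `Decimal` stands in for D4's Decimal type, and Python's `//` rounds toward negative infinity, which may or may not match what D4's "div" does for negative operands (the post does not say).

```python
from decimal import Decimal


def divide(i: int, j: int) -> Decimal:
    """Analogue of "/" as described above: always yields a decimal result."""
    return Decimal(i) / Decimal(j)


def int_div(i: int, j: int) -> int:
    """Analogue of "div": integer quotient, a distinct operator with its own name."""
    return i // j


def modulo(i: int, j: int) -> int:
    """Analogue of "mod": remainder paired with int_div."""
    return i % j
```

Because the two kinds of division have different spellings, the operand types never silently change which operation is performed.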

> Another reason is that the
> type-implementors must be aware of any super-types and subtypes.
> This last example is better considered in the light of the
> COMPLEX/RATIONAL example. Here, SQRT might already be defined in the
> RATIONAL case and thus you can not use SQRT for the same operation
> with a COMPLEX parameter.

What you are describing is a tricky aspect of implementation, not of the logical concepts presented in TTM. We have considered this problem, and frankly have not yet fully worked out an adequate implementation solution. This is not to say that the model is flawed, nor do I think this problem is unsolvable. The _logical_ model presented in TTM is purposely silent on matters such as this, as they have no bearing on the model itself.

> > > 3) If you declare a type COMPLEX, can you then declare it a SUPERTYPE
> > > of RATIONAL? If no, why? If yes how is the physical representation?
> >
> > Conceptually yes. In fact, you could look at each new type as
> > suddenly becoming yet another direct or indirect supertype of Omega.
> > Physical representation is up to the implementation.
>
> Why did the word "conceptually" creep in? Is it because it might
> actually not be so? In this case, the model has a problem.

"Conceptually" crept in because our implementation has yet to fully address this (as noted before). The model, however, makes no restriction concerning the ability to arbitrarily modify the type graph. The model does _not_ have a problem.

> > Sure. Table A references (has a FK to) table T. When I have a query
> > or view based on "A join B" (B is another table), it is possible to
> > infer that the query or view implicitly references table T. This is a
> > very powerful byproduct of logical data independence and forms the
> > basis of our product's ability to derive complete user interfaces.
>
> But the D language JOIN uses the names of the columns to derive a
> JOIN. Thus foreign keys are not in any way taken into consideration.

I think you misunderstood. The principle I am discussing is the inference of "metadata" for derived tables. Such inference has no impact on the semantics of the relational operators. Let me give a more concrete example: We have a Customer table and a Zipcode table. There is a reference (FK) from the Customer to the Zipcode table. Now let's say we have a view, ActiveCustomer, defined as "Customer join Sale". We would expect any SQL system to know the columns (with associated names and types) for the ActiveCustomer view. This gives us a degree of logical data independence in this respect. But what I am saying is possible (and is done by Dataphor) is the inference of other information. In our example, the system can tell us that the ActiveCustomer view references the Zipcode table. This knowledge can be used, for example, to provide a "lookup" from the ActiveCustomer user interface to the Zipcode table. This is an extremely powerful concept that has been previously neglected. I would also mention, though it should be obvious, that there are inference semantics in all relational operators (not just joins) for all metadata (not just references).
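The Customer/Zipcode example can be sketched as follows. This is an illustrative Python model under stated assumptions, not Dataphor's actual inference rules: the classes `Reference` and `TableVar`, the column names, and the rule "a reference survives if all of its source columns remain in the result" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    name: str
    source_columns: frozenset
    target_table: str


@dataclass(frozen=True)
class TableVar:
    name: str
    columns: frozenset
    references: tuple = ()


def _surviving(references, columns):
    # A reference is inferred for the result only if every one of its
    # source columns is still present in the result heading.
    return tuple(r for r in references if r.source_columns <= columns)


def join(name: str, left: TableVar, right: TableVar) -> TableVar:
    """Natural join: the heading is the union of both headings."""
    columns = left.columns | right.columns
    return TableVar(name, columns, _surviving(left.references + right.references, columns))


def project(name: str, table: TableVar, keep: set) -> TableVar:
    """Projection: references whose source columns are dropped are lost."""
    columns = table.columns & frozenset(keep)
    return TableVar(name, columns, _surviving(table.references, columns))


customer = TableVar(
    "Customer",
    frozenset({"CustomerID", "Name", "Zip"}),
    (Reference("Customer_Zipcode", frozenset({"Zip"}), "Zipcode"),),
)
sale = TableVar("Sale", frozenset({"SaleID", "CustomerID", "Amount"}))

active_customer = join("ActiveCustomer", customer, sale)
names_only = project("CustomerNames", active_customer, {"CustomerID", "Name"})
```

Here `active_customer.references` still carries the reference to Zipcode, so a user interface could offer a lookup, while `names_only` has dropped the Zip column and with it the reference. The join itself is computed from column names alone; the inferred metadata rides along without changing its semantics.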

> > > > ... Well defined inheritance is beneficial because the SYSTEM can
> > > > help us enforce domain constraints and such.
> ...
>
> I am still confused. This should be possible even if another
> inheritance model is used, should it not? You would just have to
> declare (in that hypothetical language) that (e.g.) CIRCLE is a
> subtype of ELLIPSE.

Right, but then enters the work of specifying the specific semantics of a CIRCLE. Using a constraint-based inheritance model such as the one provided by TTM, we can easily create types (e.g. LARGECIRCLE) merely by specifying a declarative constraint (i.e. radius > 1000).
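As a rough illustration of defining a type by constraint alone, consider this Python sketch. It is not TTM or D4 syntax; `ConstrainedType`, `Circle`, and `contains` are hypothetical names, and membership is checked on demand rather than enforced by a compiler.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class ConstrainedType:
    """A subtype defined as: all values of `parent` satisfying `constraint`."""
    name: str
    parent: type
    constraint: Callable[[object], bool]

    def contains(self, value: object) -> bool:
        # A value belongs to the subtype exactly when it is a value of the
        # parent type and the declarative constraint holds for it.
        return isinstance(value, self.parent) and self.constraint(value)


# The whole definition of the new type is the constraint itself.
LARGE_CIRCLE = ConstrainedType("LARGECIRCLE", Circle, lambda c: c.radius > 1000)
```

No operators had to be rewritten for the new type: every `Circle` operator applies to its values, and the system can decide membership from the constraint.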

> > > > -Type inheritance.
> > > > OO is not a "model" and is certainly not a data management solution.
>
> > > What about generic programming.
> >
> > What about it? Are you suggesting that "generic programming" is some
> > kind of formal model or is a data management solution?
>
> I meant that generic programming is one of the concepts that have
> evolved with object oriented system. I did forget LISP here, perhaps
> this way of programming should not be attributed solely to object
> oriented systems, but certainly it is C++ that has more than any made
> generic programming a solution for the masses.

I'm sorry, but I still fail to see its relevance here. 8^)

> > The Third Manifesto book itself includes a formal definition of the
> > model and includes the necessary references.
>
> I did look for references for "relational assignment" and "view
> updateability" in particular. For relational assignment I found
> nothing, for view updateability one that is out-of-date and needs
> revision.

See pages 79, 165-166. As previously mentioned, Mr. Date has thoroughly covered view updateability elsewhere. I wouldn't say it is "out-of-date"; rather, it is intentionally high-level.

> > ... It is not the role of a formal model to "consider the
> > implications that computers are finite." This is the job of an
> > implementation. Let's be clear: the relational model is a conceptual
> > model, not some implementation prescription.
>
> Would it not be the model that should prescribe what should happen in
> situations such as the following:
>
> R WHERE <cond_a> OR <cond_b>
>
> Where there is an overflow in <cond_a> or in <cond_b>? Also, what
> should happen in this query should an error occur (such as a failing
> TREAT_DOWN_AS_.... operator)? It might be nitty-gritty for some, but
> not for the serious user of an actual system. And as a user of the
> D-language (e.g. as a member of the Alphora development team), these
> items WOULD interest me.

Absolutely they are of interest to us and we have had to consider them. Again, however, I don't think they belong in the TTM. Think of it this way: someone could implement a "written D" totally outside the realm of computers. Anything (within reason) that would have to be ignored by that person is arguably not part of the logical model. In the case of an overflow in "written D", an exception handling system would kick in to retrieve another piece of paper. Sorry, I couldn't resist. ;-)

In our implementation, we have an exception management system to handle such errors (try...finally...except).

> Apart from this, there is at least one place in TTM where an infinite
> memory model is implied - namely RM prescription 5. A hint? Consider a
> point which has a physical representation with cartesian coordinates.
> What happens in this (pseudo)code:
> POINT p (1.0,0.0);
> THE_ANGLE(p) := 45; (degrees assumed)?

And in a logical model, that is perfectly reasonable. In fact, it leaves room for rather nifty implementation possibilities--like an implementation that "represents" infinite?! It wouldn't (couldn't) "materialize" the value, but would be useful for cases such as the one you identify above. Our implementation doesn't do this, but my point is that a logical model is a logical model precisely to give implementors total leeway.

> - the concept of NULLs is discussed in the context of SQL. While it is
> easy to agree that the SQL implementation is bad, the concept is
> simply dismissed. Instead an entirely inadequate (in my current
> belief) solution is introduced in connection with the discussion of
> SQL-migration and when discussing outer joins. There are many open
> questions in this area. (One: if the right-side operator of an outer
> join contains a boolean field, what value should be substituted for
> "no value").

This is a deep area where I would probably point you to Date's "Writings" books (I think everyone should have them all). Rarely is a dead horse beaten as badly as with this subject.

I might mention that our product does have "special values". Frankly I don't miss nulls. Now, even when we design systems in SQL, we don't consider nulls an option, and it has dramatically improved our designs (and implementations).

> - Tuple level operations are banned, again with little motivation. Why
> not allow tuple-level operations as well - there is no reason these
> operations should follow SQL-guidelines.

There are good reasons for this. A relation constitutes a single value. Any attempt to update the relation at a more granular level undermines the definition of the relation. This is a non-issue if more granular update operators are defined as shorthands for relational assignment to avoid update anomalies. The explicit mentions of SQL are to make it clear that an implementation of D is not to fall into some of the same pitfalls that SQL did in this regard.
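The idea of granular updates as shorthands for relational assignment can be sketched in Python by treating a relation as an immutable set of rows. This is an illustration under stated assumptions, not D4: `Customer`, `insert`, `delete`, and `update` are hypothetical names, and each operator simply computes a whole new relation value that would then be assigned to the variable.

```python
from collections import namedtuple

Customer = namedtuple("Customer", ["id", "name"])


def insert(relation: frozenset, row) -> frozenset:
    """INSERT as shorthand for: R := R UNION {row}."""
    return relation | {row}


def delete(relation: frozenset, predicate) -> frozenset:
    """DELETE as shorthand for: R := R WHERE NOT predicate."""
    return frozenset(t for t in relation if not predicate(t))


def update(relation: frozenset, predicate, change) -> frozenset:
    """UPDATE as shorthand for: R := (unchanged rows) UNION (changed rows)."""
    return frozenset(change(t) if predicate(t) else t for t in relation)


before = frozenset({Customer(1, "Ann")})
after = insert(before, Customer(2, "Bob"))
after = update(after, lambda t: t.id == 1, lambda t: t._replace(name="Anna"))
```

The value bound to `before` is never modified in place; each "row-level" operation yields a complete new relation, so there is no point at which a partially updated relation exists.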

> There are other places. One is OO VSS 1: Coercion (that is implicit
> type conversion) is not supported. (Page 234ff in the second edition).
> The motivation is (quote pg. 235)"we do agree that prohibiting
> coercion is a good idea anyway for a variety of other reasons -
> reasons that are widely understood in the programming languages
> community". There are no references to substantiate that claim. This
> explanation is very weak anyway, especially as they do allow numerous
> coercions - e.g. from CIRCLE to ELLIPSE.

CIRCLE to ELLIPSE is not logically a coercion. See the same page 286 as before. The reason (to me) is that the language is both simpler and clearer without coercions. No funny business. WYSIWYG.

> If you have the time, it would be interesting for us to hear in which
> areas you did deviate and then return to the "proper way".

This would be a LONG story. I would like to tell that story at some point... but we are too busy living the ending. :-) One thing I will say, and we have never mentioned this to Date and Darwen, is that when Bryn (co-architect) and I first looked at their type inheritance model, we looked at each other and said, "are they crazy?! This could never be implemented! And why would you want to!" <g> Ahh... it is so easy to mock what we don't understand.

> I would like to know in which areas you are nonconforming, but this
> will probably be mentioned in the documentation mentioned below?

Below is an informal document of non-conformance. We have never made it public before, but I see no harm in it.

> I will look forward to read it!

Me too! :-)

Very much enjoying this conversation.

Regards,

--
Nathan Allan

---------------------------------------
Documentation of Compliance with The Third Manifesto:

Items of non-compliance:

RM Prescription 3c:
	No distinction is made between read-only and update operators by the compiler.
	No restriction is made on update operators not returning a result.
	We see no reason to make an arbitrary distinction in this regard, as we can detect when side effects occur, so we are able to prevent the execution of non-functional operators in functional contexts.
	
RM Prescription 6, 7, 9, 10:
	Our tuple and relation type generators and values only allow scalar type attributes.
	
RM Prescription 21, specifically multiple assignments:
	We are still in the analysis phase on this one.
	
RM Prescription 25:
	We do not at this time support updates to the system catalog relvars; DDL statements must be used to effect the changes.
	
RM Prescription 26:
	We do not feel we can make a statement of compliance on this prescription, because we are probably biased.

RM Proscription 6:
	As long as the argument that some of our constructs are merely declarative metadata is accepted, we are in full compliance.
	There is certainly a need for a layer which ties the physical to the logical in some manner; we have opted to represent this layer as metadata attached to logical definitions. Such metadata means nothing in the logical model, and is therefore separate from it.
	
	For example:
	
		create domain ID
		{
			representation ID
			{
				Value : String
					read class "Softwise.Foundation.Dataphor.IDValueReadNode"
					write class "Softwise.Foundation.Dataphor.IDValueWriteNode"
			} class "Softwise.Foundation.Dataphor.IDSelectorNode"
		} tags {StaticByteSize = "10"};
		
		The StaticByteSize tag indicates to the storage layer that up to 10 bytes of storage be colocated with the row.
		The class constructs indicate which host language type is used to implement the selector and accessor operators for the domain.
		
OO Prescription 2:
	We support type inheritance to define types, and obtain compile time substitutability.
	We do not at this time support run time substitutability, nor specialization or generalization by constraint (S or G by C).
		
OO Prescriptions 4 and 5:
	Transaction support in the current version is limited to the supporting storage engine.
	In the next version we will implement full transaction support as prescribed by the Manifesto.
		
RM Very Strong Suggestion 2:
	We support declarative referential integrity in the form of a Reference, which is defined as follows:
	
		create table A {ID : ID, Name : string, key {ID}};
		create table B {ID : ID, A_ID : ID, key {ID}, reference B_A {A_ID} references A {ID}};
		
		The execution of these statements will produce a constraint equivalent to the following database level constraint:
		
			constraint B_A IsSpecial(A_ID) or exists (A where ID = A_ID);
		
		Where IsSpecial is an operator, created by the system during the definition of the type on which A.ID is defined, which returns true if the argument is equal to any special value defined on that domain.
		
		We included this concept because a true foreign key is a subset of this behavior, and we have found this behavior to be useful in the past.
		
RM Very Strong Suggestion 4:
	We do not support transition constraints at this time; however, we have plans to include such support in some future version.
	
RM Very Strong Suggestion 6:
	We have an operator we have called EXPLODE which provides a solution to the classic bill of materials problem; however, we do not include explicit support for the generalized transitive closure operator, nor for generalized concatenate and aggregate operations, although such support could be added by an end user through RM Very Strong Suggestion 7.
	
OO Very Strong Suggestion 1:
	See the remarks for OO Prescription 2 above.
	
OO Very Strong Suggestion 3:
	We do not yet support the ARRAY and SET type generators, although an astute user could provide such support.
Received on Fri Sep 27 2002 - 04:06:45 CEST
