Re: Representing Types in the Catalog (was Re: Distributed foreign keys)

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Fri, 11 Jul 2003 16:08:41 +0100
Message-ID: <bemk2h$160u$1_at_gazette.almaden.ibm.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:5SkPa.581$1m2.74543232_at_mantis.golden.net...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> news:bek9jh$25k0$1_at_gazette.almaden.ibm.com...
> > I'm claiming that a relational expression (that defines a type literal) is
> > sufficient to define all the core aspects of a type.
>
> Because a type comprises both a set of values and a set of operations, a
> relational expression does not suffice.

I'm with D&D on this one. OO Very Strong Suggestion #2 - Types and operators unbundled: "Operator definitions should be logically distinct from the definitions of the types of their parameters and/or results"

To summarise (what is turning into a very long post), here is my current set of catalog objects:

  One (or more*) REs to define the same poss rep literal for each type and subtype.
  Constraints** on poss rep literals such that the set of attributes of each poss rep is a key of the literal. These key constrants also define which attributes are in which poss rep.
  Optionally, tuples in a single catalog relation of 'meta data' to give each poss rep a name.

  Up to one RE per poss rep of a type to equate non-canonical representation values (if any) with thier canonical form.   Constraints on 'non-can' literals such that the set of attributes for the non-canonical values is a key.

  A catalog relation to hold one 'paramerterised' RE value per type generator.

Probably one 'meta-data' relation to distigush type defining REs from plain old REs.
Another 'meta-data' relation to distigush operator defining REs from plain old REs.

  • see later ** all constraints stored in one catalog relation (although I would probably keep constraints on the catalog separate from constraints on the database proper)

> The relation literal at best can
> describe the set of values. Because the catalog must describe not only types
> but type generators, a relation literal suffices only to describe one
> component of one instance of what the catalog must describe.

Agreed, and I started to begin to address this later in the post. I guess you don't always read to the end before starting to reply? (But then I don't go back and xref earlier comments against later ones in the same post...)

[sniping stuff so that we can hopefully move onto describing type generators]
> Since we know the catalog must describe type generators and that the
> description of a type generator must include a way to derive specific
> instantiated types, why not move onto describing type generators? We already
> know that if one can instantiate a relation literal from the description of
> a type generator, one can describe the set of values of the instantiated
> type.

Agreed.

> One can then describe a type as a type generator with no (free)
> parameters.

Indeed. I hope that a type generator will be a 'parameterized RE, so that one with no free parameters is then just a RE that defines a relation literal.

[snip]
> > Humm. So you would be happy to leave the definition of which poss rep
> values
> > are equal to the implementation of the '=' operator for the type?
>
> Of course. The '=' operator already defines it at the logical level.
>
> > For one that would be a pain for user defined types. Any type with > 1
poss
> > rep would require users to move outside of the logical environment to
define
> > how the poss reps equate.
>
> I disagree. At the logical level, users simply apply the '=' operator.
>
[warning, lots of non type generator stuff follows ;-)]

D&D distinguish between "type definers" and "type implementers" in RM prescription #4.

They give an example of implementing using either "highly protected operators not part of D", or genuine D code. The example is the selector for type POINT that take a polar poss rep and returns the equivalent Cartesian poss rep. It is via such selectors that they define (and optionally implement) equality between poss reps. It is unclear (to me) whether they need the implementations to preserve the 1:1 between the pos rep values. Certainly by defining equality via selectors, and so not explicitly enforcing the 1:1, they seem to leave that option wide open. Maybe they did that on purpose or maybe they were just skipping over the issue.

> > I can't see the point, especially as my scheme - with just one type
literal
> > for all poss reps (rather than one per pos rep as you have assumed) -
allows
> > equality to be very easily defined in the logical environment.
>
> I disagree. Such a relation literal would not be easy to define without the
> '=' operator, and describing it as an expression would necessarily expose
> the internal implentation of the '=' operator.

The internal implementation is an optimisation (i.e. physical issue). The issue is, if we allow a logical implementation of '=' (as D&D do allow via selectors written in D), should we do it in such a way as to imply that it must maintain the 1:1 between poss reps? The related issue is whether the DBMS can enforce that it does without actual evaluating every single case (by, for example fully generating the type literal).

I think that if the system cannot guarantee (or take on absolute trust) that two poss reps are 1:1, then the system should not allow them to both be poss reps. I.e. if we cannot guarantee that both Cartesian and Polar are 1:1 then we pick just one as the single poss rep, and relegate the other to plain operators that don't have a 1:1 relationship. Humm, that makes me think that the result type of any function that has an 1:1 relationship with a type (what's the technical term for that), could be used as a poss rep for that type.

So my conclusion is that catalog representation of types does not matter just so long as all pos rep values are (somehow) constrained to be 1:1.

[snip]
> > I see what you are saying, but because I'm not willing to leave the
definition
> > of '=' for the poss reps of a type to the physical arena
>
> The internal implementation of the '=' comparison is internal and an
> implementation issue. Do you propose to similarly expose the internal
> implementation of the SQRT or the LOG operation?

I would certain allow and encourage operations to have a defining algorithm expressed as a relational expression over the components of the poss rep of the operand types, but as I said later on, I would for pragmatic reasons allow these to be optional, and be implemented by trusted operators outside of the algebra if coding relaionally was way too much hassle. Similarly, I would (now) allow the '=' between poss reps to be an external function, but those would have to be very trusted functions. Humm, it does mean I'll need a catalog relation that at least lists the name of all known external functions. I'd still have a RE to define a relation literal for each function however...

> >, having separate type
> > literals for each poss rep is not a valid normalisation of my single type
> > literal. So, for me, it's not a design choose.
>
> I do not believe any single type literal exists. One needs a literal for
> each possrep and another literal for operations at the very minimum.
>

One thing my scheme is lacking is that I only need to define expressions one-way. I.e. if I define Polar points as a RE over Cartesian points, then I have not expressed the reverse operator. Practically the DBMS would need to fully evaluate the literal if a request to return a cartesian point given a polar point. That would not be good.
That rather implies that I would want 2 RE's to define my type literal, or have 2 type literals. Lets see, here with two RE's both defining the *same* literal (and with a KEYS spec as a 'post condition')

RE1

SELECT
    x.i as X,
    y.i AS Y,
    SQRT( x.i ** 2 + y.i ** 2) as R
    y.i / x.i as THETA
FROM INTEGER x, INTEGER y
KEYS( {X,Y}, {R, THETA} ) RE2:

SELECT

    l.l * COS( a.a) as X,
    l.l * SIN( a.a) as Y,
    l.l as R,
    a.a as THETA

FROM LENGTH l, ANGLE a
KEYS( {X,Y}, {R, THETA} ) OK, I quite like that. And I'm still avoiding these selector or "domain value identifier" concepts (or at least keeping them as shorthands for relational expressions)

If a type has 3 poss reps P1, P2 & P3. Then I could possibly see the benifit of 3 literals with typle types of say

(<p1:P1>,<p2:P2>)
(<p1:P1>,<p3:P3>)
(<p2:P2>,<p3:P3>)

That would make adding extra REs that define the same type literals a tad simpler. Not a big win as far as I can see though

> > My design does force the choose of canonical forms for poss
> representations
> > that have more than one value representing the same type value. I see
> this
> > forcing of canonical forms as very much a good thing.
>
> As I said above, such types probably have important and desirable
> properties, but it is a matter of design to choose to have those properties.

I know we disagree, but I say that poss reps must be 1:1. If not, then don't make them poss reps, just make them operators. That, in fact, is proposed my solution to the Point example. I would define it with just one poss representation. Or alternatively have one type with the pos rep of Polar, one type with the poss rep of Cartesian, and one common subtype with two poss reps in the cases where the poss reps are 1:1 (like at the origin).

> > > > > THE_A(var1) := "NewAValue";
> > > > >
> > > > > is a shorthand for
> > > > >
> > > > > var1 := P1( A := "NewAValue", B := THE_B( var1 ) );
> > > >
> > > > Maybe, but I want to see the P1() function above be cast in purely
> > > > relational terms.
> > >
> > > It is already. It is a domain value identifier.
> >
> > Did you just make that term up? TTM would call it a [type] selector
> > [literal] - I think.
>
> What is a [type] selector [literal] if not a way to identify a value of a
> domain?
>
>
> > When I said "in purely relational terms", I meant "as a relation or an
> > operator of the algebra"
>
> In other words, you meant in any relational term except domain. I am happy
> for a domain to remain a domain.

Ah, well I'm having fun trying to eliminate some of the concepts in and around domains.
I do want a domain to be ultimately little more that a set of values. I'm happy to just have domains as a bunch of shorthands over the algebra. BTW, in a post on some other topic, you said something about knowing where to stop is the key to being a good designer. I'm guessing that you might be thinking I'm loosing that ability here? ;-)

> > > > Functions are just shorthand for some <COMPOSE> with a relation
> > > > literal. In this case, the relation literal for the function is the
> poss
> > > > rep literal for the type.
> > >
> > > The relation literal is the <COMPOSE> of each of the component relation
> > > literals restricted by the constraints. Of course, <COMPOSE> degenerates
> to
> > > cartesian product when none of the tuples share attributes so this just
> > > restates an earlier observation. Unlike functions, a selector's
> parameters
> > > are the result so, in a sense, the relation describing the function of
> every
> > > selector is just TABLE_DEE.
> >
> > You lost me there.
>
> See the example in TTM using TWO_AND_TWO and PLUS. Consider a similar
> situation where we replace PLUS with SELECTOR ::= TABLE_DEE and TWO_AND_TWO
> with any relation. Apply the <COMPOSE> operation as it is applied in the
> example and mentally picture the result.

Nope, still not got it. Is it relevant anyway?

> > [snip]
> > > > In my scheme the literal is derived from a relational expression. The
> ER
> > > > defines the type. What is it about types that you think cannot be
> defined
> > > > by a (set of) REs?
> > >
> > > Operations and parameterized types.
> >
> > As per TTM's 'treating operators as relations', I claim operations can be
> > fully represented as relation literals or relational expressions that
> define
> > relation literals.
>
> But that is a different relation literal than the literal that describes a
> possrep. If we have two relation literals, one describing the set of values
> and another describing the set of operations, we have a description of one
> instantiated type.

I refer you again to OO-VSS #2

> We still lack a description for a type generator,

Patience

> and we
> have violated the information principle because the constraints are not
> represented as values in relations.

In my summary at the top, I make sure they are values in relations.

However, I will note that having constraints on literals is strictly redundant. They are all subsumed by the constraint the the value of the literal is just the value of the literal. However, it is useful to declare that a literal has say a key (even if that could be derived by the dbms), because it allows the dbms to check that the user is not contradicting himself. Also, in the case where we are trusting an external protected operator, the constraints define what about the operator that we are trusting I guess.

> > I will admit that we do need computational completeness
> > (to which I now assume means having recursion in the algebra) to be able
> to
> > represent many operators as REs.
> >
> > For example the operator INTEGER_PLUS can be specified quite easily with a
> > recursive relational expression if one of the poss reps of INTEGER is the
> > relation of tuples <index:INTEGER, bit_value:BIT_1> and if we have the
> bit
> > operators AND, OR and NOT defined...
> >
> > Now, I will (under duress) admit that coming up with (often recursive)
> > relational expressions that are equivalent to the huge number of known
> > procedural and functional algorithms that are currently used to define
> (and
> > implement) type operators, would be no small undertaking. And as such, I
> might
> > accept externally defined functions as an pragmatic extension of the core
> > 'relations from the ground up' logical catalog.
>
> To provide a complete logical description of an operation, we need only
> describe the operator (in the original sense of the symbol representing the
> operation), the names and types of the parameters, the type of the result,
> the preconditions and the postconditions.

Yep, and all that can be done with a RE that defines a relation literal, or a RE that references some external highly protected function, but that still defines a relation literal. In both cases though (and particularly the second), I will concede that the constraints (aka pre & post conditions) would be held in a separate catalog relation (to the extent that the dbms cannot derive all 'interesting' constraints directly from the literal).

> In fact, the types of the
> parameters and the type of the result are redundant because those are
> preconditions and postconditions as well.

Strictly that would be a untyped language. We don't what to go there.

> My big question is: Is there one parameterized representation that will
> describe both "RETURNING" and "UPDATING" operations?

Do I want update operators for anything other than the database variable? Not sure. They are not a *core* requirement I think. However, to try an (non parameterized) example:

In my purest way, updating operations are just selections from the type literal.
So the OPERATOR REFLECLT (P POINT) UPDATES P; would be specifed as say

P :=(

    SELECT TL.Point
    FROM POINT as TL
    WHERE TL.X = - P.X AND TL.Y = - P.Y) // where POINT is the type literal (with type say {<Point:POINT>, <X:INTEGER>, <Y:INTEGER>} )
// and '-' is a shorthand for a compostion to the INTEGER_MINUS operator literal
// oh, and '=' is a shorthand for a join to the INTEGER type literal.

P.S. does my mixing of SQL and D syntax distract? I really should learn to get fluent in D...

> > Paramertised types (aka Type Generators - Yes?), will require relational
> > expressions that can generate other relation expressions.
>
> A relational expression is a derived relation and doesn't really generate
> anything. If you meant to say the catalog must represent sufficient data
> about a type generator from which to instantiate a relation literal for each
> possrep of an instantiated type and some representation of the operations on
> the instantiated type, I agree.

Agreed. How about a RE *derives* a Relation? And a parameterised RE when composed with parameter values *derives* a RE that can then derive a Relation literal.

> I assume that the representation of an operation on an instantiated type may
> be parameterized. One operand and the return type of the JOIN operation for
> a specific relation type will remain parameterized.
>
>
> > Hopefully we can get
> > round to tackling this one later.
>
> Why wait?

True.

[snip]
> > As a *very* rough and ready
> > exposition, I suggest we would want something like.
> >
> > INSERT INTO Type_Generators(Type_Gen_Name, Type_Gen_Exp) VALUES
> > ( "Interval"
> > , "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> > FROM [ORDERED_TYPE_NAME] i, [ORDERED_TYPE_NAME] j
> > WHERE i.[ORDERED_TYPE_NAME] <= j.[[ORDERED_TYPE_NAME]]"
> > )
> >
> > CREATE DATABASE VIEW Generated_Types AS
> > // generate type literal relations for each generated interval type
> > EVALUATE_AND_EXPLODE(
> > SELECT SUBSTITUTE(Type_Gen_Exp, "ORDERED_TYPE_NAME", Type_Name)
> > FROM Type_Generators, Types
> > WHERE Types.Is_Ordered_Type = True
> > )
> > UNION // now populate the relation name catalog table for the generated
> > types
> > EVALUATE_AS_RELATION (
> > SELECT "Interval" || Type_Name AS Type_Name,
> > RELATION({ <Bound_Lower:[ORDERED_TYPE_NAME]>,
> > <Bound_Upper:[ORDERED_TYPE_NAME]> }) AS Tuple_Type
> > FROM Types
> > WHERE Types.Is_Ordered_Type = True
> > )
> >
> >
> > !!
>
> The above seems to violate the information principle to me. Why are the
> parameters not represented using values in relations? Why are the
> constraints not represented using values in relations? Why are th possrep
> components not represented using values in relations? etc. Why are all these
> things encoded in a character string?

I did say it was a rough exposition. The strings are just values of the string poss reps of the type PARAM_RE. I should have used selectors. PARAM_RE is the set of all possible paramerterised relational expressions. I.e. the set of valid REs where all combinations of component REs have been replaced with all possiabe parameter names.
The constraints (on the derived relation literals) were ignored, as I assumed the DBMS would derive any interesing ones from the type literal RE. Lets see:

UNION // now populate the constraints catalog for the generated type literals
EVALUATE_AS_RELATION (

     SELECT    TYPE_NAME("Interval" || Type_Name) AS Relation_Name,
        Booleen_Exp("CHECK Bound_Lower <= Bound_Upper") AS Constraint
     FROM    Types
     WHERE    Types.Is_Ordered_Type = True
    UNION
     SELECT    TYPE_NAME("Interval" || Type_Name) AS Relation_Name,
        Booleen_Exp("KEY( {Bound_Lower, Bound_Upper}) ") AS Constraint
     FROM    Types
     WHERE    Types.Is_Ordered_Type = True
)

BTW I think I'm hitting a syntax barrier..... help! At the least I need a form for parameter replacement (to help me answer your first question). Can you expand on what you said just before "why wait"?

> > > > > The catalog representation must suffice to derive the appropriate
> > > relation
> > > > > literal, of course.
> > > >
> > > > Agreed. Including the rule that each poss rep for a type has the same
> > > number
> > > > of values, and has one value per type value.
> > >
> > > I disagree. Whether there is a 1:1 correspondence between values of
> > > implemented types is a choice of the type designer.
> >
> > My model requires such a correspondence and so, I note, does the
> Manefesto. .
>
> I've got the first edition handy: "However, we see this problem as an
> implementation problem merely; it is analogous to problems that have always
> been with us regarding such matters as the largest and smallest numbers that
> can be repesented in different floating point formats." (Pg. 115 Para. 2)
>
> Did they change their minds for the second edition?

I'll need to look that up. I doubt they changed their minds, but I not sure that they are consistent. I.e. I'm sure they require a 1:1 correspondence between poss rep values, but they then keep giving the POINT example without noting that the number of values in each rep must be the same. When they look at type constraints, they note the constraints on different poss reps must logically be the same., but no mention of the values of the reps themselves.

[snip]
> I guess one thing that I am aiming for is to make D logically unnecessary!!
> > (at some useful level of abstraction). I.e. that every D statement is but
> > a short hand for a statement of the A algebra.
> >
> > I understand that A does not support say 'assignment', so more correctly
> > I'm looking at:
> > D is_a_subset_of A + the database variable + database assignment +
???
>
> TTM has an entire list of "issues not covered" that one would have to add to
> the list above.

Fair enough. Of course one of the things we are trying to do here is to cover one of those issues. Namely, the logical form of the catalog.

On the nature of recursion and computational completeness in relational algebra, this is one subject where we would need to leave D&D behind and venture into the Alice book (which, I'm beginning to have more respect for...). http://www.amazon.com/exec/obidos/ASIN/0201537710/

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Fri Jul 11 2003 - 17:08:41 CEST

Original text of this message