Re: Representing Types in the Catalog (was Re: Distributed foreign keys)

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Thu, 10 Jul 2003 18:57:44 +0100
Message-ID: <bek9jh$25k0$1_at_gazette.almaden.ibm.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:YY0Pa.509$5o.70409340_at_mantis.golden.net...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> news:bef3uj$1vlg$1_at_gazette.almaden.ibm.com...
> > OK, although I think we are suggesting roughly equivalent representations.
>
> I don't think we are. I am suggesting a representation from which we can
> derive the relational literal, but we cannot necessarily derive the
> representation from the relational literal.

I'm claiming that we can, so ideally I'd like a counter example.

I'm claiming that a relational expression (that defines a type literal) is sufficient to define all the core aspects of a type. This includes enumerating all the values of a type - specifying all the different possible representation values.
I do admit that separate expressions would be need to cater for representations with non-canonical forms, and I'm sure that there is 'other stuff' (esp when we come to type generators) that will need more/different relational expressions forms..

> The relational literal only
> describes the set of values in one possible representation. A type is more
> than a set of values.
>
>
> > I guess I am bundling the components and the constraints into one
> relational
> > expression. You suggest holding the components separately from the
> > constraints, then if we want a literal, building a single RE by cross
> > producing the components and restricting the result with the constraint.
> > Sounds OK for simple types, but how would you define how two poss reps for
> > a single type equate?
>
> I am not sure I would--the equality operator already defines it at the
> logical level. If we have two possible representations, P1 & P2, with
> corresponding relational literals P1_LIT and P2_LIT, we can determine the
> equal values in P2 for a P1 literal by:
>
> P2_LIT WHERE P2(/*...components from P2_LIT...*/) = P1(/*...literal
> components...*/)
>

Humm. So you would be happy to leave the definition of which poss rep values are equal to the implementation of the '=' operator for the type? For one that would be a pain for user defined types. Any type with > 1 poss rep would require users to move outside of the logical environment to define how the poss reps equate.
I can't see the point, especially as my scheme - with just one type literal for all poss reps (rather than one per pos rep as you have assumed) - allows equality to be very easily defined in the logical environment.

[snip]
> Since we are talking about a type and not a variable, the reference
> statement should declare a type:
>
> TYPE Boolean
> POSSREP ( STRING_5 ) /* possrep and component name default to Boolean */
> POSSREP Bit ( BIT_1 ) /* component name defaults to Bit */
> CONSTRAINT ( MATCHES(Boolean, RegExp('True|False') ) ) /* Boolean refers
> to component of the Boolean possrep */
> ;

OK. That's the TTM way - I agree. It's just not the way I think it should be done (or maybe the TTM way could be a shorthand for my way - I'ld certainly be looking for shorthand suggestions in my scheme).

> > If a type has more than one poss rep, then the values of those poss
> > reps need to be explicitly linked (in the current scheme by sharing the
same
> > tuple in the type literal)
>
> You assume that each possrep is a candidate key of the equals relation, and
> I do not. I assume only that the equals relation is 1NF. While types whose
> possreps are all candidate keys of the equals relation probably have
> important and desirable properties just as relations in 5NF have important
> and desirable properties, I think it is a matter of design to choose those
> properties just as it is a matter of design to choose 5NF.

I see what you are saying, but because I'm not willing to leave the definition of '=' for the poss reps of a type to the physical arena, having separate type literals for each poss rep is not a valid normalisation of my single type literal. So, for me, it's not a design choose.

My design does force the choose of canonical forms for poss representations that have more than one value representing the same type value. I see this forcing of canonical forms as very much a good thing.

> > > THE_A(var1) := "NewAValue";
> > >
> > > is a shorthand for
> > >
> > > var1 := P1( A := "NewAValue", B := THE_B( var1 ) );
> >
> > Maybe, but I want to see the P1() function above be cast in purely
> > relational terms.
>
> It is already. It is a domain value identifier.

Did you just make that term up? TTM would call it a [type] selector [literal] - I think.

When I said "in purely relational terms", I meant "as a relation or an operator of the algebra"

> > Functions are just shorthand for some <COMPOSE> with a relation
> > literal. In this case, the relation literal for the function is the poss
> > rep literal for the type.
>
> The relation literal is the <COMPOSE> of each of the component relation
> literals restricted by the constraints. Of course, <COMPOSE> degenerates to
> cartesian product when none of the tuples share attributes so this just
> restates an earlier observation. Unlike functions, a selector's parameters
> are the result so, in a sense, the relation describing the function of every
> selector is just TABLE_DEE.

You lost me there.

[snip]
> > In my scheme the literal is derived from a relational expression. The ER
> > defines the type. What is it about types that you think cannot be defined
> > by a (set of) REs?
>
> Operations and parameterized types.

As per TTM's 'treating operators as relations', I claim operations can be fully represented as relation literals or relational expressions that define relation literals. I will admit that we do need computational completeness (to which I now assume means having recursion in the algebra) to be able to represent many operators as REs.

For example the operator INTEGER_PLUS can be specified quite easily with a recursive relational expression if one of the poss reps of INTEGER is the relation of tuples <index:INTEGER, bit_value:BIT_1> and if we have the bit operators AND, OR and NOT defined...

Now, I will (under duress) admit that coming up with (often recursive) relational expressions that are equivalent to the huge number of known procedural and functional algorithms that are currently used to define (and implement) type operators, would be no small undertaking. And as such, I might accept externally defined functions as an pragmatic extension of the core 'relations from the ground up' logical catalog.

Paramertised types (aka Type Generators - Yes?), will require relational expressions that can generate other relation expressions. Hopefully we can get round to tackling this one later.

> > > It does not seem feasible to parameterize catalog literals, but it is
> easy
> > > enough to parameterize other representations of a type. Using the
> > > parameterized representation, the dbms can instantiate a literal when
> > > provided arguments for the parameters.
> >
> > Agreed and I suggest "parameterized" REs as such an representation. (Well,
> OK,
> > I've not yet defined what the basic RE type looks like, never mind
> > parameterized REs, but that's to-do next).
> >
> >
> > > > All in all, it looks to me like a type system that does have some
> useful
> > > > properties, and solves some of the problems that I am (and I think
> that
> > > > databases are) interested it.
> > >
> > > I am less interested in some of the properties you want, but I agree a
> > > relation literal can represent the set of values of a possible
> > > representation easily enough. However, I do not think a relation literal
> is
> > > the appropriate way to represent a type in the catalog.
> >
> > Again, the literal is (most times) derived. From REs such as:
> >
> > "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> > FROM INTEGERS_0-2^32 i, INTEGERS_0-2^32 j
> > WHERE i.i <= j.i"
>
> I want to add another level of operational indirection to support type
> generators. The literal is derived from an expression is instantiated
> operationally from some parameterized representation and a set of parameter
> arguments.

I'm not fond of the term "instantiated operationally" - i.e. I'm not 100% sure what it means, but I agree with the sentiment. As a *very* rough and ready exposition, I suggest we would want something like.

INSERT INTO Type_Generators(Type_Gen_Name, Type_Gen_Exp) VALUES ( "Interval"
, "SELECT i.i as Bound_Lower, j.i as Bound_Upper

    FROM [ORDERED_TYPE_NAME] i, [ORDERED_TYPE_NAME] j     WHERE i.[ORDERED_TYPE_NAME] <= j.[[ORDERED_TYPE_NAME]]" )

CREATE DATABASE VIEW Generated_Types AS
  // generate type literal relations for each generated interval type EVALUATE_AND_EXPLODE(
    SELECT SUBSTITUTE(Type_Gen_Exp, "ORDERED_TYPE_NAME", Type_Name)     FROM Type_Generators, Types
    WHERE Types.Is_Ordered_Type = True )
UNION // now populate the relation name catalog table for the generated types
EVALUATE_AS_RELATION (

    SELECT     "Interval" || Type_Name AS Type_Name,
       RELATION({ <Bound_Lower:[ORDERED_TYPE_NAME]>,
        <Bound_Upper:[ORDERED_TYPE_NAME]> }) AS Tuple_Type
    FROM Types
    WHERE Types.Is_Ordered_Type = True )

!!

> > > The catalog representation must suffice to derive the appropriate
> relation
> > > literal, of course.
> >
> > Agreed. Including the rule that each poss rep for a type has the same
> number
> > of values, and has one value per type value.
>
> I disagree. Whether there is a 1:1 correspondence between values of
> implemented types is a choice of the type designer.

My model requires such a correspondence and so, I note, does the Manefesto. . What is the benifit of allowing anything other? Why would it be good it a type with 2 poss reps such that

    not all type values had a poss rep value for both reps     some type values have no a poss rep value!!     etc

[snip]
> > > > Agreed. I wonder if <TCLOSE> should be replaced with generalised
> > > recursion?
> > >
> > > TTM has recursion as a relational prescription.
> >
> > True. Which must therefore be an extension of their algebra (assuming that
> > <TCLOSE> cannot be used to do generalised recursion), unless they say that
D
> > does the recursion rather than the algebra.
>
> "Recursion shall be permitted in relational expressions"
>
> That sounds like it is in the D algebra even if it is not in the A algebra.
>
>
> > Even more reason to replace <TCLOSE> with generalised recursion.
>
> Tutorial D includes both.
>
>

I guess one thing that I am aiming for is to make D logically unnecessary!! (at some useful level of abstraction). I.e. that every D statement is but a short hand for a statement of the A algebra.

I understand that A does not support say 'assignment', so more correctly I'm looking at:

   D is_a_subset_of A + the database variable + database assignment + ???

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Thu Jul 10 2003 - 19:57:44 CEST

Original text of this message