Re: Representing Types in the Catalog (was Re: Distributed foreign keys)

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 10 Jul 2003 16:23:47 -0400
Message-ID: <5SkPa.581$1m2.74543232_at_mantis.golden.net>


"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:bek9jh$25k0$1_at_gazette.almaden.ibm.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:YY0Pa.509$5o.70409340_at_mantis.golden.net...
> > "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> > news:bef3uj$1vlg$1_at_gazette.almaden.ibm.com...
> > > OK, although I think we are suggesting roughly equivalent
representations.
> >
> > I don't think we are. I am suggesting a representation from which we can
> > derive the relational literal, but we cannot necessarily derive the
> > representation from the relational literal.
>
> I'm claiming that we can, so ideally I'd like a counter example.

How do you parameterize a relation literal? If it is a literal, it is not parameterized. So for instance, what is the relation literal that describes "interval type generator" or "relation type generator" ?

> I'm claiming that a relational expression (that defines a type literal) is
> sufficient to define all the core aspects of a type.

Because a type comprises both a set of values and a set of operations, a relational expression does not suffice. The relation literal at best can describe the set of values. Because the catalog must describe not only types but type generators, a relation literal suffices only to describe one component of one instance of what the catalog must describe.

> This includes enumerating
> all the values of a type - specifying all the different possible
> representation values.

A relation literal can describe a set of values; I already agree on that point. A set of values does not describe a type and comes nowhere near describing a type generator.

> I do admit that separate expressions would be need to cater for
> representations with non-canonical forms, and I'm sure that there is
'other
> stuff' (esp when we come to type generators) that will need more/different
> relational expressions forms..

Since we know the catalog must describe type generators and that the description of a type generator must include a way to derive specific instantiated types, why not move onto describing type generators? We already know that if one can instantiate a relation literal from the description of a type generator, one can describe the set of values of the instantiated type. One can then describe a type as a type generator with no (free) parameters.

> > The relational literal only
> > describes the set of values in one possible representation. A type is
more
> > than a set of values.
> >
> >
> > > I guess I am bundling the components and the constraints into one
> > relational
> > > expression. You suggest holding the components separately from the
> > > constraints, then if we want a literal, building a single RE by cross
> > > producing the components and restricting the result with the
constraint.
> > > Sounds OK for simple types, but how would you define how two poss reps
for
> > > a single type equate?
> >
> > I am not sure I would--the equality operator already defines it at the
> > logical level. If we have two possible representations, P1 & P2, with
> > corresponding relational literals P1_LIT and P2_LIT, we can determine
the
> > equal values in P2 for a P1 literal by:
> >
> > P2_LIT WHERE P2(/*...components from P2_LIT...*/) = P1(/*...literal
> > components...*/)
> >
>
> Humm. So you would be happy to leave the definition of which poss rep
values
> are equal to the implementation of the '=' operator for the type?

Of course. The '=' operator already defines it at the logical level.

> For one that would be a pain for user defined types. Any type with > 1
poss
> rep would require users to move outside of the logical environment to
define
> how the poss reps equate.

I disagree. At the logical level, users simply apply the '=' operator.

> I can't see the point, especially as my scheme - with just one type
literal
> for all poss reps (rather than one per pos rep as you have assumed) -
allows
> equality to be very easily defined in the logical environment.

I disagree. Such a relation literal would not be easy to define without the '=' operator, and describing it as an expression would necessarily expose the internal implentation of the '=' operator.

> [snip]
> > Since we are talking about a type and not a variable, the reference
> > statement should declare a type:
> >
> > TYPE Boolean
> > POSSREP ( STRING_5 ) /* possrep and component name default to Boolean
*/
> > POSSREP Bit ( BIT_1 ) /* component name defaults to Bit */
> > CONSTRAINT ( MATCHES(Boolean, RegExp('True|False') ) ) /* Boolean
refers
> > to component of the Boolean possrep */
> > ;
>
> OK. That's the TTM way - I agree. It's just not the way I think it should
be
> done (or maybe the TTM way could be a shorthand for my way - I'ld
certainly be
> looking for shorthand suggestions in my scheme).

While a type has a set of values, a type is more than just a set of values.

> > > If a type has more than one poss rep, then the values of those poss
> > > reps need to be explicitly linked (in the current scheme by sharing
the
> same
> > > tuple in the type literal)
> >
> > You assume that each possrep is a candidate key of the equals relation,
and
> > I do not. I assume only that the equals relation is 1NF. While types
whose
> > possreps are all candidate keys of the equals relation probably have
> > important and desirable properties just as relations in 5NF have
important
> > and desirable properties, I think it is a matter of design to choose
those
> > properties just as it is a matter of design to choose 5NF.
>
> I see what you are saying, but because I'm not willing to leave the
definition
> of '=' for the poss reps of a type to the physical arena

The internal implementation of the '=' comparison is internal and an implementation issue. Do you propose to similarly expose the internal implementation of the SQRT or the LOG operation?

>, having separate type
> literals for each poss rep is not a valid normalisation of my single type
> literal. So, for me, it's not a design choose.

I do not believe any single type literal exists. One needs a literal for each possrep and another literal for operations at the very minimum.

> My design does force the choose of canonical forms for poss
representations
> that have more than one value representing the same type value. I see
this
> forcing of canonical forms as very much a good thing.

As I said above, such types probably have important and desirable properties, but it is a matter of design to choose to have those properties.

> > > > THE_A(var1) := "NewAValue";
> > > >
> > > > is a shorthand for
> > > >
> > > > var1 := P1( A := "NewAValue", B := THE_B( var1 ) );
> > >
> > > Maybe, but I want to see the P1() function above be cast in purely
> > > relational terms.
> >
> > It is already. It is a domain value identifier.
>
> Did you just make that term up? TTM would call it a [type] selector
> [literal] - I think.

What is a [type] selector [literal] if not a way to identify a value of a domain?

> When I said "in purely relational terms", I meant "as a relation or an
> operator of the algebra"

In other words, you meant in any relational term except domain. I am happy for a domain to remain a domain.

> > > Functions are just shorthand for some <COMPOSE> with a relation
> > > literal. In this case, the relation literal for the function is the
poss
> > > rep literal for the type.
> >
> > The relation literal is the <COMPOSE> of each of the component relation
> > literals restricted by the constraints. Of course, <COMPOSE> degenerates
to
> > cartesian product when none of the tuples share attributes so this just
> > restates an earlier observation. Unlike functions, a selector's
parameters
> > are the result so, in a sense, the relation describing the function of
every
> > selector is just TABLE_DEE.
>
> You lost me there.

See the example in TTM using TWO_AND_TWO and PLUS. Consider a similar situation where we replace PLUS with SELECTOR ::= TABLE_DEE and TWO_AND_TWO with any relation. Apply the <COMPOSE> operation as it is applied in the example and mentally picture the result.

> [snip]
> > > In my scheme the literal is derived from a relational expression. The
ER
> > > defines the type. What is it about types that you think cannot be
defined
> > > by a (set of) REs?
> >
> > Operations and parameterized types.
>
> As per TTM's 'treating operators as relations', I claim operations can be
> fully represented as relation literals or relational expressions that
define
> relation literals.

But that is a different relation literal than the literal that describes a possrep. If we have two relation literals, one describing the set of values and another describing the set of operations, we have a description of one instantiated type. We still lack a description for a type generator, and we have violated the information principle because the constraints are not represented as values in relations.

> I will admit that we do need computational completeness
> (to which I now assume means having recursion in the algebra) to be able
to
> represent many operators as REs.
>
> For example the operator INTEGER_PLUS can be specified quite easily with a
> recursive relational expression if one of the poss reps of INTEGER is the
> relation of tuples <index:INTEGER, bit_value:BIT_1> and if we have the
bit
> operators AND, OR and NOT defined...
>
> Now, I will (under duress) admit that coming up with (often recursive)
> relational expressions that are equivalent to the huge number of known
> procedural and functional algorithms that are currently used to define
(and
> implement) type operators, would be no small undertaking. And as such, I
might
> accept externally defined functions as an pragmatic extension of the core
> 'relations from the ground up' logical catalog.

To provide a complete logical description of an operation, we need only describe the operator (in the original sense of the symbol representing the operation), the names and types of the parameters, the type of the result, the preconditions and the postconditions. In fact, the types of the parameters and the type of the result are redundant because those are preconditions and postconditions as well.

My big question is: Is there one parameterized representation that will describe both "RETURNING" and "UPDATING" operations?

> Paramertised types (aka Type Generators - Yes?), will require relational
> expressions that can generate other relation expressions.

A relational expression is a derived relation and doesn't really generate anything. If you meant to say the catalog must represent sufficient data about a type generator from which to instantiate a relation literal for each possrep of an instantiated type and some representation of the operations on the instantiated type, I agree.

I assume that the representation of an operation on an instantiated type may be parameterized. One operand and the return type of the JOIN operation for a specific relation type will remain parameterized.

> Hopefully we can get
> round to tackling this one later.

Why wait?

> > > > It does not seem feasible to parameterize catalog literals, but it
is
> > easy
> > > > enough to parameterize other representations of a type. Using the
> > > > parameterized representation, the dbms can instantiate a literal
when
> > > > provided arguments for the parameters.
> > >
> > > Agreed and I suggest "parameterized" REs as such an representation.
(Well,
> > OK,
> > > I've not yet defined what the basic RE type looks like, never mind
> > > parameterized REs, but that's to-do next).
> > >
> > >
> > > > > All in all, it looks to me like a type system that does have some
> > useful
> > > > > properties, and solves some of the problems that I am (and I think
> > that
> > > > > databases are) interested it.
> > > >
> > > > I am less interested in some of the properties you want, but I agree
a
> > > > relation literal can represent the set of values of a possible
> > > > representation easily enough. However, I do not think a relation
literal
> > is
> > > > the appropriate way to represent a type in the catalog.
> > >
> > > Again, the literal is (most times) derived. From REs such as:
> > >
> > > "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> > > FROM INTEGERS_0-2^32 i, INTEGERS_0-2^32 j
> > > WHERE i.i <= j.i"
> >
> > I want to add another level of operational indirection to support type
> > generators. The literal is derived from an expression is instantiated
> > operationally from some parameterized representation and a set of
parameter
> > arguments.
>
> I'm not fond of the term "instantiated operationally" - i.e. I'm not 100%
sure
> what it means, but I agree with the sentiment. As a *very* rough and ready
> exposition, I suggest we would want something like.
>
> INSERT INTO Type_Generators(Type_Gen_Name, Type_Gen_Exp) VALUES
> ( "Interval"
> , "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> FROM [ORDERED_TYPE_NAME] i, [ORDERED_TYPE_NAME] j
> WHERE i.[ORDERED_TYPE_NAME] <= j.[[ORDERED_TYPE_NAME]]"
> )
>
> CREATE DATABASE VIEW Generated_Types AS
> // generate type literal relations for each generated interval type
> EVALUATE_AND_EXPLODE(
> SELECT SUBSTITUTE(Type_Gen_Exp, "ORDERED_TYPE_NAME", Type_Name)
> FROM Type_Generators, Types
> WHERE Types.Is_Ordered_Type = True
> )
> UNION // now populate the relation name catalog table for the generated
> types
> EVALUATE_AS_RELATION (
> SELECT "Interval" || Type_Name AS Type_Name,
> RELATION({ <Bound_Lower:[ORDERED_TYPE_NAME]>,
> <Bound_Upper:[ORDERED_TYPE_NAME]> }) AS Tuple_Type
> FROM Types
> WHERE Types.Is_Ordered_Type = True
> )
>
>
> !!

The above seems to violate the information principle to me. Why are the parameters not represented using values in relations? Why are the constraints not represented using values in relations? Why are th possrep components not represented using values in relations? etc. Why are all these things encoded in a character string?

> > > > The catalog representation must suffice to derive the appropriate
> > relation
> > > > literal, of course.
> > >
> > > Agreed. Including the rule that each poss rep for a type has the same
> > number
> > > of values, and has one value per type value.
> >
> > I disagree. Whether there is a 1:1 correspondence between values of
> > implemented types is a choice of the type designer.
>
> My model requires such a correspondence and so, I note, does the
Manefesto. .

I've got the first edition handy: "However, we see this problem as an implementation problem merely; it is analogous to problems that have always been with us regarding such matters as the largest and smallest numbers that can be repesented in different floating point formats." (Pg. 115 Para. 2)

Did they change their minds for the second edition?

> What is the benifit of allowing anything other?

Often, computability and representability. Rational numbers only approximate real numbers, but we use them anyhow.

> Why would it be good it a type with 2 poss reps such that
> not all type values had a poss rep value for both reps
> some type values have no a poss rep value!!
> etc

Utility, performance, accessibility--basically any pragmatic reason one can imagine.

> [snip]
> > > > > Agreed. I wonder if <TCLOSE> should be replaced with generalised
> > > > recursion?
> > > >
> > > > TTM has recursion as a relational prescription.
> > >
> > > True. Which must therefore be an extension of their algebra (assuming
that
> > > <TCLOSE> cannot be used to do generalised recursion), unless they say
that
> D
> > > does the recursion rather than the algebra.
> >
> > "Recursion shall be permitted in relational expressions"
> >
> > That sounds like it is in the D algebra even if it is not in the A
algebra.
> >
> >
> > > Even more reason to replace <TCLOSE> with generalised recursion.
> >
> > Tutorial D includes both.
> >
> >
>
> I guess one thing that I am aiming for is to make D logically
unnecessary!!
> (at some useful level of abstraction). I.e. that every D statement is but
a
> short hand for a statement of the A algebra.
>
> I understand that A does not support say 'assignment', so more correctly
I'm
> looking at:
> D is_a_subset_of A + the database variable + database assignment +
???

TTM has an entire list of "issues not covered" that one would have to add to the list above. Received on Thu Jul 10 2003 - 22:23:47 CEST

Original text of this message