Re: Distributed foreign keys (was Re: Category Types)

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Mon, 7 Jul 2003 20:12:34 +0100
Message-ID: <becgrp$2h7m$1_at_gazette.almaden.ibm.com>


"Bob Badour" <bbadour_at_golden.net> wrote in message news:sF5Oa.379$rr1.53875807_at_mantis.golden.net...
> "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> news:beaa1d$1atm$1_at_gazette.almaden.ibm.com...
> > Directly as a relation literal, or indirectly as a relational expression
> over
> > type literals used in the type's possible representations.
>
> Since a literal is an expression, we can simplify the above to just a
> relational expression.

True, and as they are constants (until we consider schema evolution) they might as well all be view constants in the database catalog.

> I am not satisfied with that, though, because how do
> we deal with a 64 or 128 or 256 bit integer? A literal would be large. Do we
> represent it as the cartesian product of 64 or 128 or 256 relation literals
> each with a range over boolean?

Not sure, but I do absolutely want to ultimately ground all type possible reps in explicit BIT representations.

> Perhaps it makes sense to have a boolean type and a set of type generators?

Does SINGLE_BIT == boolean ?
I suggest not. That would, for me, make a BIT type generator (probably generating bit stings of both fixed and variable length) the single type from which all others would be built.

> > > The cardinality of values in the full range of an interval type or
> > > of a tuple type or of a relation type could be huger than huge. I don't
> > > think that makes much sense.
> >
> > Most types would be represented as a relational expression. E.g. the type
> > consisting of all intervals of positive integers less than 2^32 could be
> > declared to the DMBS by inserting the following relational expression and
> > relational expression name into the DBMS catalog.
> >
> > Assuming the type INTEGERS_0-2^32 is already know to the DBMS (say as a
> > relational expression over some inbuilt BIT type?), then
> > INTEGER_0-2^32_INTERVALS could be defined by
> >
> > INSERT INTO RE_TYPES ( Type_Name, Type_RE_Defn)
> > VALUES ( "INTEGER_0-2^32_INTERVALS"
> > , "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> > FROM INTEGERS_0-2^32 i, INTEGERS_0-2^32 j
> > WHERE i.i >= j.i"
> > )
>
> Doesn't seem to match my concept of interval.

Why not? Obviously I've only defined the set of values for the type and none of it's operators (or other "properties"), but are those not the values you would expect in such an integer interval type?

> That's more like a contiguous interval type.
>
>
> > If a user then does something like say
> > SELECT AVG(Bound_Lower)
> > FROM INTEGER_0-2^32_INTERVALS
> >
> > and the DBMS decides it will need to fully generate the type relation to
> > compute the result, then I say the DBMS "safety system" (i.e. the time
> > estimate from the query compiler) should restrict such a query from
> actually
> > running (unless the user has the time spare to wait...)
>
> That assumes the cost of the "safety system" is low and tunable to prohibit
> costly errors while allowing most costly critical queries.

Agreed. I suggest existing DBMSes already have reasonably low in cost query cost estimators, and that future advances will allow cost estimates to be revised during the run of a query to make up for initial bad estimates.

>A system that
> allows an error to run for a long time before giving up is not particularly
> desirable, and neither is a system that allows a critical costly query to
> run almost until it is finished. Of course, the safety system might end up
> being some dba killing a process manually.

> While I like the idea of having relation literals for the ranges of some
> types, I don't necessarily want them for all types, and I don't want to
> automatically reference one through the name of the type. I would like to
> have a type Employee and a named relation Employee without any prior
> assumptions about the relation.

So that would just be a name then.?!

> > The same "safety system" argument applies, for me, to the <OR> and <NOT>
> > operators of D&D's "A" algebra. I would like to see them supported in the
> > database language and leave the computational difficulties to the safety
> > system and the query estimator.
>
> While I find A interesting in terms of formalizing relational operations as
> shorthands using minimal operators, I have not thought of any pressing need
> for A in an actual dbms. Why would you like to see them implemented?

Well I do want type literals one way or another. If I had <NOT> then simply <NOT>ing an empty relation will give be it's type literal. Here is an argument for Integer enumeration, I just want it for all types. http://www.dbazine.com/tropashko2.html

I guess the classic example would so that we could take the DATE dimension table, that most every data warehouse schema has, and do it properly (so to speak).

> > > The system catalog must represent the name of the type, the subtypes,
> the
> > > supertypes, the possible representations and the constraints. I assume
> we
> > > can treat the operations separately.
> > >
> > > If the type is parameterized, how does the above change? If the type has
> an
> > > arbitrary number of parameters, how does the above change? For instance,
> how
> > > does one describe types like tuples and relations?
> >
> >
> > The tuple and relation type generators are surly defined by the DBMS
> outside
> > of the database.
>
> You do not think the dbms should self-describe its builtin types?

No, I do agree with that. Just not sure of a reasonable way to represent type generators yet.

> Does that
> not limit the catalog as a vehicle of discovery for users?
>
>
> > My scheme might define particular, named tuple and relation
> > types using e.g.
> >
> > The binary tuple type
> > TUPLE_X:INTEGER-Y:INTEGER
> > could be defined as
> > SELECT i.i as X, j.i as Y FROM INTEGERS i, INTEGERS j
> >
> > The relation type
> > RELATION_TUPLE_X:INTEGER-Y:INTEGER
> > could be defined using a relation valued attribute (if you will excuse the
> > crap syntax) as
> > TABLE(POWER_SET(
> > SELECT i.i as X, j.i as Y FROM INEGERS i, INTEGERS j
> > ))
> >
> > ?!
>
> If the dbms has type generators, I think the catalog must describe
> them--especially, if the dbms is extensible by adding user-defined type
> generators.

Suggestions are welcome.

[snip]
>
> > I'm not saying that there are no logical differences (at a certain level
> of
> > abstraction), I'm just saying that they can be all represented as
> relations,
> > so maybe we should.
>
> Hmmm, that's interesting. The preconditions of an operation are constraints
> on on the parameter attributes and the post conditions of an operation are
> constraints on the result attribute. I'll think about that some more. What
> do we do with the parts of operations requiring a turing-equivalent
> language?

Allow an extension of the algebra to make it Turing equivalent I (naively) guess. How far to being Turing equivalent does adding <TCLOSE> make "A" ?

> I am still having problems with types though. It just doesn't seem to me
> that types are relations. In fact, it seems to me that types are not
> anything like relations.

At the least would you agree that some types are rather like relations? All those enumerated, 'code' types that we create for, and in our databases. They are often little more than a relation are they not?

[snip]
> > > Drop the names of relations? How does one refer to them without names?
> >
> > By their tuple type. Obviously that does mean enforcing POOD however.
>
> That seems like a complex way to refer to a relation compared to using a
> name.

Wouldn't deny that. Relation names work OK as shorthands for tuple types, it's just that not all tuple tpyes should need them and some might have more than one...

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Mon Jul 07 2003 - 21:12:34 CEST

Original text of this message