Re: Distributed foreign keys (was Re: Category Types)

From: Bob Badour <bbadour_at_golden.net>
Date: Mon, 7 Jul 2003 19:42:23 -0400
Message-ID: <guoOa.415$7M3.59788339_at_mantis.golden.net>

"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:becgrp$2h7m$1_at_gazette.almaden.ibm.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:sF5Oa.379$rr1.53875807_at_mantis.golden.net...
> > "Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message
> > news:beaa1d$1atm$1_at_gazette.almaden.ibm.com...
> > > Directly as a relation literal, or indirectly as a relational
expression
> > over
> > > type literals used in the type's possible representations.
> >
> > Since a literal is an expression, we can simplify the above to just a
> > relational expression.
>
> True, and as they are constants (until we consider schema evolution) they
> might as well all be view constants in the database catalog.

Okay, a light went on. The set of values of a type is a literal (constant) which one specifies with an expression. Because a function of a literal is itself a literal, the expression describing the literal could be as simple as a bunch of values or as complex as a deeply nested relational expression.

Since the expression is always a cartesian product, wouldn't it suffice to just have a set of components of a possible representation? One could derive the expression easily enough just by taking the product of each component.

> > I am not satisfied with that, though, because how do
> > we deal with a 64 or 128 or 256 bit integer? A literal would be large.
Do we
> > represent it as the cartesian product of 64 or 128 or 256 relation
literals
> > each with a range over boolean?
>
> Not sure, but I do absolutely want to ultimately ground all type possible
reps
> in explicit BIT representations.
>
> > Perhaps it makes sense to have a boolean type and a set of type
generators?
>
> Does SINGLE_BIT == boolean ?
> I suggest not. That would, for me, make a BIT type generator (probably
> generating bit stings of both fixed and variable length) the single type
from
> which all others would be built.

Are you suggesting that the boolean type is built as a BIT(1) ?

> > > > The cardinality of values in the full range of an interval type or
> > > > of a tuple type or of a relation type could be huger than huge. I
don't
> > > > think that makes much sense.
> > >
> > > Most types would be represented as a relational expression. E.g. the
type
> > > consisting of all intervals of positive integers less than 2^32 could
be
> > > declared to the DMBS by inserting the following relational expression
and
> > > relational expression name into the DBMS catalog.
> > >
> > > Assuming the type INTEGERS_0-2^32 is already know to the DBMS (say as
a
> > > relational expression over some inbuilt BIT type?), then
> > > INTEGER_0-2^32_INTERVALS could be defined by
> > >
> > > INSERT INTO RE_TYPES ( Type_Name, Type_RE_Defn)
> > > VALUES ( "INTEGER_0-2^32_INTERVALS"
> > > , "SELECT i.i as Bound_Lower, j.i as Bound_Upper
> > > FROM INTEGERS_0-2^32 i, INTEGERS_0-2^32 j
> > > WHERE i.i >= j.i"
> > > )
> >
> > Doesn't seem to match my concept of interval.
>
> Why not?

My bad. For some reason, I keep thinking of a set of intervals as an interval.

> Obviously I've only defined the set of values for the type and none
> of it's operators (or other "properties"), but are those not the values
you
> would expect in such an integer interval type?

I was actually thinking of a set of intervals. Sorry.

> > That's more like a contiguous interval type.
> >
> >
> > > If a user then does something like say
> > > SELECT AVG(Bound_Lower)
> > > FROM INTEGER_0-2^32_INTERVALS
> > >
> > > and the DBMS decides it will need to fully generate the type relation
to
> > > compute the result, then I say the DBMS "safety system" (i.e. the time
> > > estimate from the query compiler) should restrict such a query from
> > actually
> > > running (unless the user has the time spare to wait...)
> >
> > That assumes the cost of the "safety system" is low and tunable to
prohibit
> > costly errors while allowing most costly critical queries.
>
> Agreed. I suggest existing DBMSes already have reasonably low in cost
query
> cost estimators, and that future advances will allow cost estimates to be
> revised during the run of a query to make up for initial bad estimates.

But are they sufficiently accurate? The revision during the run sounds worrisome to me. I would hate a sixteen hour query to abort after six hours--especially if someone needs the results first thing in the morning.

> >A system that
> > allows an error to run for a long time before giving up is not
particularly
> > desirable, and neither is a system that allows a critical costly query
to
> > run almost until it is finished. Of course, the safety system might end
up
> > being some dba killing a process manually.
>
>
> > While I like the idea of having relation literals for the ranges of some
> > types, I don't necessarily want them for all types, and I don't want to
> > automatically reference one through the name of the type. I would like
to
> > have a type Employee and a named relation Employee without any prior
> > assumptions about the relation.
>
> So that would just be a name then.?!

Yes, a name is a name.

> > > The same "safety system" argument applies, for me, to the <OR> and
<NOT>
> > > operators of D&D's "A" algebra. I would like to see them supported in
the
> > > database language and leave the computational difficulties to the
safety
> > > system and the query estimator.
> >
> > While I find A interesting in terms of formalizing relational operations
as
> > shorthands using minimal operators, I have not thought of any pressing
need
> > for A in an actual dbms. Why would you like to see them implemented?
>
> Well I do want type literals one way or another. If I had <NOT> then
simply
> <NOT>ing an empty relation will give be it's type literal.
> Here is an argument for Integer enumeration, I just want it for all types.
> http://www.dbazine.com/tropashko2.html

If you have one (either range literals or <not>), you don't really need the other because you can derive it easily enough.

> I guess the classic example would so that we could take the DATE dimension
> table, that most every data warehouse schema has, and do it properly (so
to
> speak).
>
> > > > The system catalog must represent the name of the type, the
subtypes,
> > the
> > > > supertypes, the possible representations and the constraints. I
assume
> > we
> > > > can treat the operations separately.
> > > >
> > > > If the type is parameterized, how does the above change? If the type
has
> > an
> > > > arbitrary number of parameters, how does the above change? For
instance,
> > how
> > > > does one describe types like tuples and relations?
> > >
> > >
> > > The tuple and relation type generators are surly defined by the DBMS
> > outside
> > > of the database.
> >
> > You do not think the dbms should self-describe its builtin types?
>
> No, I do agree with that. Just not sure of a reasonable way to represent
type
> generators yet.

Me neither.

> > Does that
> > not limit the catalog as a vehicle of discovery for users?
> >
> >
> > > My scheme might define particular, named tuple and relation
> > > types using e.g.
> > >
> > > The binary tuple type
> > > TUPLE_X:INTEGER-Y:INTEGER
> > > could be defined as
> > > SELECT i.i as X, j.i as Y FROM INTEGERS i, INTEGERS j
> > >
> > > The relation type
> > > RELATION_TUPLE_X:INTEGER-Y:INTEGER
> > > could be defined using a relation valued attribute (if you will excuse
the
> > > crap syntax) as
> > > TABLE(POWER_SET(
> > > SELECT i.i as X, j.i as Y FROM INEGERS i, INTEGERS j
> > > ))
> > >
> > > ?!
> >
> > If the dbms has type generators, I think the catalog must describe
> > them--especially, if the dbms is extensible by adding user-defined type
> > generators.
>
> Suggestions are welcome.

I suggest changing the focus from using expressions to define the set of values as a relational literal to using expressions to describe the possible representations etc. From a description of the possible representations, one can derive or generate an expression to define the set of values. A parameterized type would use a parameterized expression for the possible representation etc.

> [snip]
> >
> > > I'm not saying that there are no logical differences (at a certain
level
> > of
> > > abstraction), I'm just saying that they can be all represented as
> > relations,
> > > so maybe we should.
> >
> > Hmmm, that's interesting. The preconditions of an operation are
constraints
> > on on the parameter attributes and the post conditions of an operation
are
> > constraints on the result attribute. I'll think about that some more.
What
> > do we do with the parts of operations requiring a turing-equivalent
> > language?
>
> Allow an extension of the algebra to make it Turing equivalent I (naively)
> guess. How far to being Turing equivalent does adding <TCLOSE> make "A" ?

I don't know. I strongly suspect full recursion support would get it there, though.

> > I am still having problems with types though. It just doesn't seem to me
> > that types are relations. In fact, it seems to me that types are not
> > anything like relations.
>
> At the least would you agree that some types are rather like relations?

A type is a set of values and a set of operations. If one derives tuples from a possible representation, the set of tuples corresponding to the set of values in any single possible representation is rather like a relation literal. But this relation literal describes only the set of values and only in one possible representation. It does not describe operations etc.

> All
> those enumerated, 'code' types that we create for, and in our databases.
They
> are often little more than a relation are they not?

That's certainly one possible representation for them.

> [snip]
> > > > Drop the names of relations? How does one refer to them without
names?
> > >
> > > By their tuple type. Obviously that does mean enforcing POOD however.
> >
> > That seems like a complex way to refer to a relation compared to using a
> > name.
>
> Wouldn't deny that. Relation names work OK as shorthands for tuple types,
it's
> just that not all tuple tpyes should need them and some might have more
than
> one...

The names I am thinking of are names of relation variables or relation valued database attributes, if you prefer, and not shorthands for types. A relation of employee number and employee's name has the same type as a relation of employee number and employee's manager's name, but they are different things. Received on Tue Jul 08 2003 - 01:42:23 CEST

This message: [ Message body ]
Next message: andrewst: "Re: modeling question"
Previous message: Paul Vernon: "Re: Distributed foreign keys (was Re: Category Types)"
Maybe in reply to: Paul Vernon: "Re: Distributed foreign keys (was Re: Category Types)"
In reply to Paul Vernon: "Re: Distributed foreign keys (was Re: Category Types)"
Next in thread: Paul Vernon: "Re: Distributed foreign keys (was Re: Category Types)"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message