Re: Distributed foreign keys (was Re: Category Types)

From: Bob Badour <bbadour_at_golden.net>
Date: Fri, 4 Jul 2003 19:31:45 -0400
Message-ID: <o1pNa.287$_T1.36633719_at_mantis.golden.net>


"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:be4aan$14u6$1_at_gazette.almaden.ibm.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:S74Na.244$6t7.30924927_at_mantis.golden.net...
> > I think a 'standard catalog' straw man project would be very
informative.
> > You suggested it. When are you going to kick it off? <g>
>
> Well the postings here are some kind of start. Although I'm not sure that
this
> is the best place to work on such matters. I'm thinking that some Wiki
pages
> could work better. It would allow the building up of such a straw man (in
> contrast to the tearing down that newsgroups are good at :-). Any ideas?
>
> The second thing is that when I look into matters of the catalog, things
start
> to get err, interesting.
> Not least because, for me at least, it gets tied up with the idea of a
> database algebra (amongst other things), which is a problem as the TTM
does
> not cover such a thing, and frankly would need not insignificant revisions
to
> do so.

I apologize for the length and meandering nature of the following. It represents a train of thought that has to back off a couple of dead-end sidings.

I am not sure how a database algebra would affect the catalog. Representing types, type generators and operations seems like the tricky part to me; although, I have a few ideas germinating. Type generators seem particularly tricky because they could instantiate a type that changes the known type of some value already in the database.

What are the operations on database values? If the type of a database value defines a universe of discourse and the tuples in the database value define the known facts in the universe, it would seem to make sense to have intersections, unions and differences.

Database union would be a little different from relational union. If one takes a set of named, typed relations as a possible representation of a database, the comparable set derived by taking the union of two database operands would have a union of the names and types from the operands. But the values of the named relations would contain new relation values obtained by taking the union of identically named relations. The catalog would have to allow the union operation such that the predicate for the derived database value equates to the logical OR of the predicates of the operands.

Similarly, database intersection would be a little different from relational intersection. The derived database value would have an intersection of the named, typed relations, but the values of the named relations would contain new relation values obtained by taking the intersection of identically named relations. The catalog would have to allow the intersection operation such that the predicate for the derived database value equates to the logical AND of the predicates of the operands.

If each operand contained a relation with the same name but different types, database union would fail due to an integrity violation but database intersect would succeed by eliminating the relation from the result. Does this match the logical AND requirement though?

I am not satisfied with the definition of database intersect above. Perhaps it would be better to define database intersect by taking the intersect of the relation names and the intersect of the predicates, but I am not sure how to describe the resulting relation values. One would have to project each operand relation onto the attributes that remain and then join or intersect corresponding projected values. Seems complicated too.

If we define intersect this way, we have no equivalent to join or to cartesian product. Instead of defining database union and database intersect, we should probably define database union and database join (or maybe all three?). I propose that database join would derive a database value from its operands by taking the union of the set of relation names where the type of each relation is derived by taking the ? logical AND ? of the predicates for the corresponding relation from each operand. The values of each named relation would contain the join of each operand's corresponding value or of TABLE_DEE for any operand lacking the correspondingly named relation.

Database join would correspond to logical AND or to product, and database union would correspond to logical OR.

Join (and either definition of intersect) would succeed when operands contain identically named relations with different types while database union would still result in an integrity violation. Is there a way to define union or database OR such that identically named operand relations can have different types?

Okay, I see some of the problems with database algebra and the catalog. The JOIN of the catalog must result in the logical AND of the predicates. It almost seems like the catalog relation describing the predicates must have an empty candidate key and an attribute for each predicate, but then the predicates must match for each union operand. If there is a single catalog relation for the database predicate, union compatible databases would have to have identical sets of named relation headings. If there is a separate predicate relation for each relation to allow the union of databases with non-overlapping relations, not only does the catalog violate POOD, but how does the catalog describe the predicate of these catalog relations?

I vote for an operational catalog--at least partially operational. Thus a PREDICATE operation would return a value describing the predicate of its operand where the operand is a database or a relation.

We need to find one or more possible representations for database values by which to define the database algebra. Would a set of named (possibly parameterized) types, a set of named relations and a set of signatured operations suffice? (The operation signature would include the name of the operation and the named, typed operands.)

I never even got around to considering database difference.

It strikes me that, if the type of a database value defines a universe of discourse and if the tuples represent known facts within that universe, one problem is we have similar concepts both for universes and for facts. For instance, one can conceive of the union of facts in an intersection of universes and an intersection of facts in a union of universes. Does that make any sense?

Logical OR would be a union of facts in a union (or some combination) of universes. Logical AND would be a join of facts in a join (or some combination) of universes. Difference would be a difference of facts, but in what universe? If A and B are database values, what is the universe of A MINUS B? Is it the universe of A? Does it combine the universes of A and B somehow?

Project and Extend likewise operate on both universes and facts.

Hmmm, what happens if we define the database operations correspondingly in terms of the A algebra using <AND>, <OR> etc. ? I'll have to look into that. Have you gone down that path already?

It seems like there are a lot of different paths to explore. Received on Sat Jul 05 2003 - 01:31:45 CEST

Original text of this message