Re: Types and "join compatibility"

From: André Nęss <andre.naess_at_gmail.com>
Date: 4 Aug 2005 11:58:39 -0700
Message-ID: <1123181919.455069.151790_at_g49g2000cwa.googlegroups.com>


Between overtime at work, and my girlfriend actually wanting to see me occasionally I finally had the time to sit down and follow up on this discussion.

Marshall Spight wrote:
> André Nęss wrote:
> >

> > As I started considering how to add a simple type system to the
> > implementation I stumbled upon something that my (possibly lacking)
> > googling skills can't find an answer to, so I hope someone here
> > might shed some light on it. The question is: given a relation A,
> > with a single attribute a of type TypeA, and a relation B with a
> > single attribute b of type TypeB, where TypeA is a subtype of
> > TypeB, what (if anything) does it mean to perform the natural join
> > of A and B (the same goes for union I guess...) Of course I here
> > assume that you either rename A.a to b or B.b to a before
> > performing the operation. >

> What an excellent question!
>
> As to the rename, yes of course, otherwise it wouldn't be a *natural*
> join. In fact, I'm going to rename your whole example for clarity.
>
> (read ":" as "has type"; P for parent, C for child)
> Relation PR : (a : P)
> Relation CR : (a : C)
>
> What is the type of PR join CR?
>
> Date&Darwen's TTM says it's a relation of type (a : P).
> This has never made sense to me; the result can only contain
> values that correspond to C, so what's the point of having
> it be P?

At first, it was obvious to me that it has to be P, because only P can contain the values of both PR and CR. But then your point dawned on me. Clearly only values that can exist in CR will be allowed into the resulting relation, so I agree with you, the TTM answer seems strange to me too.

Now if instead we did PR union CR, we should have a relation of type (a : P), right? Because the result can clearly contain values from both PR and CR, which requires us to "relax" the constraints defined by the type C.

But as I thought about this I realized that it would be much simpler to consider the basic operators. We should really just have to consider union, difference, restrict, project and product. Of these, the only one that requires us to think about types is clearly union. How to handle the case of union seems so obvious that I can't see any room for disagreement.

So what about the join? Well, the join is a restricted product. But the final operation you need to perform on this restricted product is to project out one of the two columns you are comparing in the restriction. The decision about which column to project out is not something the relational model has anything to say about. It's up to the type system. I find that reassuring, it seems to indicate that you can couple an implementation of the relation model with the type system of your choice, which is a good thing.

> Furthermore, if you want to have updatable views, the result
> attribute type *has* to be C, because otherwise the type
> would permit inserting values of type (a : P), which would
> violate the type of CR. [...]

I'd say that you can have updatable views regardless of what choice you make here, but the choice will clearly affect which views will be updatable.

> My take: subtyping adds a huge amount of complexity to these
> issues and adds little of value. Subtyping is hugely valuable
> in OOP languages but I'm not convinced it adds much to relational
> languages. It's possible I'm wrong about the value added, but
> I'm certainly right about the complexity; it's worse than it
> looks.

Subtypes seem so natural to include that I've not stopped to consider whether I need them. I guess that would be a good idea to do. For now I'm better of ignoring them. One thing that troubles me is the relationship between types and constraints. To me it seems that types and constraints overlap to a large degree. So for example you can define a type Email, which only accepts well-formed email addresses. Then you might have a business rule that says that you don't accept hotmail, yahoo and gmail as valid addresses, so you can either create an extra constraint for this, or create a new subtype of Email. Of course you could also implement the Email type as a constraint on a string attribute.

The fact that the choice seems quite arbitrary troubles me a little. It seems to indicate that something can be improved. I've so far only found one thread that touches this (if ever so lightly) here: http://groups.google.com/group/comp.databases.theory/browse_frm/thread/7899118d48145eaa/29bb770fce9ca264#29bb770fce9ca264

But I think it's an interesting question. My first thought is that it's a choice you have to make when you design the schema. You have to consider if the constraint you're adding is general enough to warrant it's own type, (which can be easily reused) or whether you're better off just using a constraint. A lot of constraints just constrain the possible values of some basic type, such as string or integer.

Anyway, material that discusses these things would be much appreciated!

> > If someone can provide straight answers that would be great, but I'm
> > also very interested in pointers to material I might study to
> > understand this in more detail. If something similar has been discussed
> > here before (I'm sure it has) I would be very interesting in reading
> > those threads. As I mentioned, except for an article about using a
> > union type as the attribute type I've not been able to find any threads
> > that cover this.
>
> In general, the type theory people and the relational theory people
> don't mix much. You will find a few people here and there who are
> interested in both. (I'm interested in both but I'm no expert.)
>
> If you want to read more about type theory, you can't do any better
> than "Types and Programming Languages" by Benjamin Pierce.
>
> http://www.amazon.com/exec/obidos/tg/detail/-/0262162091/qid=1122761096/sr=8-1/ref=pd_bbs_1/002-4661063-2538456?v=glance&s=books&n=507846
>
> It's not an easy read by any means.

Amazon just sent me the dispatch confirmation, so I hope to find it in my mailbox on Monday. I am actually one of the people interested in both the relational model and types. My knowledge about the former is meager, but nevertheless easily eclipses my knowledge about the latter. The book had actually been on my shortlist for quite a while, after I'd seen it recommended over at Lambda the Ultimate, and I look forward to some much needed enlightenment in the area of type theory. Received on Thu Aug 04 2005 - 20:58:39 CEST

Original text of this message