Re: Discovering new relationships

From: paul c <toledobythesea_at_oohay.ac>
Date: Sat, 03 Mar 2007 15:00:10 GMT
Message-ID: <_3gGh.1207878$5R2.459887_at_pd7urf3no>


Walt wrote:
> ...
> Thanks for the translation. Once I looked at it in mathematical notation,
> it reminded me that a function is just a special case of a relation. From
> there, I realized that I could have presented the example I gave without
> having to use SQL "functions" at all.
>
> Let's say that we have two relations S(a) and R(b) with no "natural join"
> between them.
> All we need to do to relate the data is discover a relation T(a,b). Now
> there's a natural join between S and T, and one between T and R. In the
> abstract, such a relation always exists.
>
> The question, in data management terms is this: Is such a relation
> "meaningful" in our universe of discourse? Stated in another way, is the
> proposition that T represents one that says anything useful in the U of D?
> This brings it back to data analysis.
>
> So I'm glad to have learned something, at the cost of having launched an
> unnecessary discussion.

Not to disagree, but just to offer a few notes from somebody who's also timid about some of the math:

From what I know of RT (relational theory), I wouldn't call the discussion unnecessary; I think it is quite useful. Also, given S and R, I think T certainly exists. I just wouldn't think of it as abstract, but rather as something we haven't made concrete until somebody expresses the join. I believe that a significant number of people, including Darwen & Date, think of natural join as the join or match on equal values of equal sets of attributes, rather than matching on attributes with the same names.

(In case you haven't seen D&D's basic approach, which is an algebra, it's at http://www.dcs.warwick.ac.uk/~hugh/TTM/APPXA.pdf )

Their "natural join" is based on their <AND> operator. In that approach, the empty set of attributes is common to S and R, so <AND> simply pairs every tuple of S with every tuple of R. (What I like about it is that it allows relations to be manipulated as if they were classical boolean variables, along with the <OR> and <NOT> operators. Not to tout SQL, but I believe it does have a cross-product operator (CROSS JOIN, which D&D call TIMES), which is just a specialization of <AND>. Also, SQL is a hybrid or mixture of calculus and algebra, and it's easy to get confused by the names of its keywords, e.g., SELECT is really project, WHERE is really restrict, FROM is really join, etc.)
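To make the "empty set of common attributes" point concrete, here is a minimal sketch in Python. It is only my own illustration, not D&D's definition: the helper names rel and natural_join are made up, and a relation is assumed to be a set of tuples, each tuple a frozenset of (attribute, value) pairs.

```python
def rel(*rows):
    """Build a relation from dicts: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def natural_join(r1, r2):
    """Sketch of an <AND>-style join: keep every pair of tuples that agree
    on all attributes the two tuples have in common."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            common = d1.keys() & d2.keys()
            # With no common attributes, all() over an empty set is True,
            # so every pair of tuples is kept.
            if all(d1[k] == d2[k] for k in common):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical sample data: S has only attribute a, R has only attribute b.
S = rel({"a": 1}, {"a": 2})
R = rel({"b": "x"}, {"b": "y"}, {"b": "z"})

product = natural_join(S, R)
print(len(product))  # one result tuple per (S tuple, R tuple) pair
```

Since S and R share no attribute names, the join condition is vacuously true and the result has len(S) * len(R) tuples, which is the TIMES case.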

Secondly, I think the D&D link above offers a maybe simpler way to express a "theta" function, namely as relations. Not to criticize Marshall's bind, but in this approach it isn't needed. If one wants to avoid renaming attributes (or "bind"), the functions f and g could be relations of two attributes. E.g., if a and b were of domain zipcode and we wanted to know when a pair of a and b values were in the same state, we might write something like S{a} <AND> R{b} <AND> f{a,sn} <AND> g{b,sn}, where sn is a state name.
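A rough Python sketch of that zipcode expression, under my own assumptions: relations are sets of frozensets of (attribute, value) pairs, the helpers rel and natural_join are invented for illustration, and the zipcode and state rows are just sample data. The relations f and g play the role of the zipcode-to-state function, once per attribute name, so no renaming is needed.

```python
from functools import reduce

def rel(*rows):
    """Build a relation from dicts: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def natural_join(r1, r2):
    """Sketch of an <AND>-style join on all commonly named attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[k] == d2[k] for k in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Sample rows only; S{a} and R{b} hold zipcodes.
S = rel({"a": "10001"}, {"a": "94105"})
R = rel({"b": "12207"}, {"b": "90001"})

# f{a,sn} and g{b,sn}: the "state of a zipcode" function written as relations.
f = rel({"a": "10001", "sn": "NY"}, {"a": "94105", "sn": "CA"})
g = rel({"b": "12207", "sn": "NY"}, {"b": "90001", "sn": "CA"})

# S{a} <AND> R{b} <AND> f{a,sn} <AND> g{b,sn}
same_state = reduce(natural_join, [S, R, f, g])

# Project onto {a, b} to list the zipcode pairs that share a state.
pairs = {(dict(t)["a"], dict(t)["b"]) for t in same_state}
print(sorted(pairs))
```

The shared attribute sn is what does the "theta" work here: joining f and g on sn keeps only those (a, b) pairs whose zipcodes map to the same state name.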

(Another way to say why I like this approach is that conceivably I could quantify such an application solely in terms of relations, even if some of those relations happened to be code that implemented functions, i.e., stop thinking, in my old-fashioned way, about lines of code.)

p

Received on Sat Mar 03 2007 - 16:00:10 CET
