Re: Thinking about MINUS

From: Bob Badour <bbadour_at_pei.sympatico.ca>
Date: Tue, 09 Jan 2007 16:58:02 GMT
Message-ID: <uQPoh.42727$cz.625517_at_ursa-nb00s0.nbnet.nb.ca>


Marshall wrote:

> On Jan 8, 8:22 pm, Bob Badour <bbad..._at_pei.sympatico.ca> wrote:
>

>>Marshall wrote:
>>
>>>Back to the original idea, which was what happens when you
>>>join two relations with an attribute name in common but different
>>>types for that attribute. Bob suggested using a union type, which
>>>is sound, however I prefer calling it a type error.
>>
>>It is not an either/or thing. The resulting value is as I described, and
>>if that value violates a constraint like "There are no union types",
>>then it raises an error. However, how does one express the constraint if
>>one does not recognize the value in the first place?

>
> This perspective is very foreign to me, and I am not sure I understand
> it. Let me recast this in a different context, and see if I follow:
>
> Suppose we attempt to evaluate the following program fragment:
>
> 5 + "hello"
>
> in a formal language with two data types, int and string, and the
> operation + takes two integer operands. Does the above comment
> apply? Your comment seems to be saying we need to have
> "express[ed]" constraints to have constraints, but it is possible
> to have constraints on a language simply by the *lack* of an
> appropriate evaluation rule. Or was the question rhetorical?
> Or did I completely misapprehend you?

Unless we make one up, we have no + operation that operates on numbers and strings. Similarly, we have no < comparison that operates on numbers and strings. However, the equality comparison operates on any two values (unless we go out of our way to redefine it otherwise.)

Semantically, it makes no sense to say "This orange is less than that apple"; however, it makes perfect sense to say "This orange is not that apple". If someone asks "Is this orange less than that apple?", we have a question we can parse but neither interpret nor answer. If someone asks "Is this orange that apple?", we have a question we can parse, interpret and answer correctly: "No, it is not."
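A small sketch of the distinction, in Python (chosen here only for illustration; the thread names no particular language). The values 5 and "hello" come from the quoted example; the helper try_op is a hypothetical name introduced for this sketch. In this language, + and < have no rule for an int and a str, while == is defined for any pair of values:

```python
# Stand-ins for "this orange" and "that apple": values of two unrelated types.
orange, apple = 5, "hello"

def try_op(op):
    """Return the result of op(), or the name of the TypeError it raises."""
    try:
        return op()
    except TypeError as exc:
        return type(exc).__name__

plus_result = try_op(lambda: orange + apple)    # no + rule for int and str
less_result = try_op(lambda: orange < apple)    # no < rule for int and str
equal_result = try_op(lambda: orange == apple)  # equality answers: "No, it is not."

print(plus_result, less_result, equal_result)
```

The first two questions can be parsed but not answered; the third gets a plain False.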

From a usability standpoint, there are many contexts in which humans benefit from an answer more along the lines of: "Of course, it is not. Are you aware you are comparing apples and oranges?" However, as I have said all along, that depends on context (and human cognition.)

Thus, if joining attributes with similar names and different types is an error, it is an error unrelated to the equality comparison. If there is an error, it must relate to the data type of the resulting joined attribute, which raises the question: "What is the data type of the resulting joined attribute?" It is obviously the most specific supertype of the types of the joined attributes.

If the dbms lacks any facility to represent or to evaluate the resulting supertype, then you have your desired error. However, that would be a limitation forcing errors even in contexts where we desire no error.

One can devise a dbms to arbitrarily leave some equality comparisons undefined, and that would give an error too. However, that would be a kludge or hack forcing errors in even more contexts where we desire no error.

>>>(Also note that if we use the union type solution, the result will
>>>always be empty, which is a signal that the operation doesn't
>>>really do anything interesting.)
>>
>>Do I understand correctly that your position is: "Contradictions lack
>>interest" ?

>
> Heh.
>
> Another example: consider the following function of two parameters:
>
> int f(int a, int b) { return 0; }
>
> Nothing wrong with it per se, no type errors or anything, but the
> fact that its return value is independent of the value of its arguments
> should at least arouse some suspicion as to its moral character.

Agreed. In fact, if I recall correctly, someone once determined that the lint warning for unused local variables correlated more with bugs than any other warning or error message. Of course, that might be urban legend. (I am relying solely on a ~20 year old memory, and I doubt I read the original work.)

However, at this point what we are discussing is more applied psychology than applied mathematics (as important as psychology is when dealing with humans.)

>>>If we use a name in two different scopes, and use it differently
>>>in those two scopes, and we intend to merge the two scopes
>>>together, we need to resolve that.
>>
>>Please note that the above provides additional context to explain the
>>constraint. Whether it is an error depends on context.

>
> I would agree; I would also claim that the particular formal language
> one is using is part of that context.

Of course.

>>>One of the things join does is merge two namespaces,
>>>namely the attribute namespace
>>>of each relation.
>>
>>I disagree. The join operation necessarily operates within a single
>>namespace. Different invocations of the join operation may operate
>>within separate namespaces, however.

>
> Mmmm, not sure whether you understood me.
>
> To my way of thinking, the database is a namespace that contains
> named relvars, or in SQL we say "tables." *Each* table is itself
> a separate namespace for attributes.

When it comes to relations, I disagree, and I see no reason to perpetuate the sins of SQL.

If you have not read it yet, I highly recommend this recent essay by Hugh Darwen:
http://www.dcs.warwick.ac.uk/~hugh/TTM/HAVING-A-Blunderful-Time.html

(Pay close attention to what he says about names.)

> The demonstration of which
> is that we can reuse an attribute name in many different tables.
> So there are two different levels of namespaces, an "upper" one
> for relvar names and a "lower" one for attribute names.

You and I apparently disagree on what constitutes a namespace. In Tutorial D, the WITH statement identifies a namespace but a relation does not.

Imagine a mathematical proof that makes some statements about x and y and other statements about y and z. How would you react to a proof where y refers to a different dimension when used with x than it does when used with z ?

> When one does a join, each of the two operands brings with
> it a lower namespace, and the two namespaces have to be
> unified for the operation to succeed. The result of the join
> will be a new relation, with a new attribute namespace built
> (somehow) from the attribute namespaces of the two operands.

Ah, I see where part of the difference is, and it is more to-may-to/to-mah-to than anything. In my way of thinking, the join operation operates on a single namespace. By invoking join, we first have to agree on what that namespace is. In your way of thinking, the join creates the contract to which we agree to agree.

  1. Consider the following system of equations:
    a) R1(x,y): y^2 + y = x^2*y
    b) R2(y,z): y^2 - 1 = y*z - z
  2. We can transform those equations under certain given constraints into:
    a) R3(x,y): y = x^2 - 1, y != 0
    b) R4(y,z): y = z - 1, y != 1
  3. We can relate x to z by joining the above and projecting away y:

    R5(x,z): z-1=x^2-1, x != +/-1, z != 2

To my way of thinking, before one can even think about applying the join operation, we have to agree on what y means just as we have to agree that y != 0 before we divide both sides of the equation by y. In 2 a), y means y in some domain where y != 0. In 2 b), it means y in some domain where y != 1. In order to apply the join operation, we must agree that y means the intersection of the two sets so that y is neither 1 nor 0. In your way of thinking, the join operation creates the domain by intersecting the two sets.
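The join-and-project step can be sketched with finite relations in Python. This is only an illustration under a loud assumption: the integers -4..9 (the name DOMAIN is hypothetical) stand in for the unbounded domains of x, y and z, so the result is a small sample of R5 rather than the whole relation.

```python
# Assumed finite stand-in for the domains of x, y and z.
DOMAIN = range(-4, 10)

# R3(x, y): y = x^2 - 1, y != 0
R3 = {(x, y) for x in DOMAIN for y in DOMAIN if y == x * x - 1 and y != 0}
# R4(y, z): y = z - 1, y != 1
R4 = {(y, z) for y in DOMAIN for z in DOMAIN if y == z - 1 and y != 1}

# Natural join on y: tuples pair up only where y satisfies both constraints,
# that is, where y lies in the intersection of the two domains.
joined = {(x, y, z) for (x, y) in R3 for (y2, z) in R4 if y == y2}

# Project away y to obtain R5(x, z).
R5 = {(x, z) for (x, y, z) in joined}
print(sorted(R5))
```

Every surviving tuple satisfies z - 1 = x^2 - 1, and none has x = +/-1 or z = 2, matching the constraints carried along in step 3.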

One could imagine a different step 1 than above leading to a slightly different step 2 above, which more accurately reflects the issue of disjoint types:

  1. R3(x,y): y = x^2 -1, y > 0
  2. R4(y,z): y = z - 1, y < 0

Proceeding as above would yield:

R5(x,z): z-1=x^2-1, false

You are saying that the ", false" necessarily indicates an error, and in many contexts I agree human users would benefit from a warning at the minimum. I am not yet prepared to say it is necessarily an error or that it even requires a warning in every context.

I ask: What if the ", false" were part of an intermediate result? What if some later operation requires a union of domains rather than an intersection so that the end result has no ", false"? Can I imagine a context where no warning is appropriate? I can at least imagine one, even if I have not yet identified one.

I say that in some contexts we can simplify "R5(x,z): z-1=x^2-1,false" even further into:

R5(x,z): false
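The disjoint case can be sketched the same way in Python. Again the finite range DOMAIN is an assumption standing in for the real domains, and OTHER is a hypothetical relation invented only to show a later union. The join yields the empty relation, which is still an ordinary value that later operations can consume:

```python
# Assumed finite stand-in for the domains of x, y and z.
DOMAIN = range(-4, 10)

# R3(x, y): y = x^2 - 1, y > 0
R3 = {(x, y) for x in DOMAIN for y in DOMAIN if y == x * x - 1 and y > 0}
# R4(y, z): y = z - 1, y < 0
R4 = {(y, z) for y in DOMAIN for z in DOMAIN if y == z - 1 and y < 0}

# Join on y and project it away. No y is both > 0 and < 0, so nothing matches.
R5 = {(x, z) for (x, y) in R3 for (y2, z) in R4 if y == y2}

# Hypothetical later step: a union absorbs the empty intermediate result.
OTHER = {(0, 0)}
combined = R5 | OTHER

print(R5, combined)
```

Neither operand is empty, yet R5 is; whether that deserves a warning is the contextual question raised above.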

> There are many approaches a formal system could take
> for how to handle what happens when one has a name used
> with two different types in the two namespaces:
>
> 1) create a union type
> 2) fail
> 3) automatically rename both of the conflicting names
> 4) don't require attribute names to be unique
>
> (3 and 4 suck but illustrate that many approaches are possible.)

I suppose the above depends on what one means by formal system. A formalism is a notation suitable for symbolic manipulation. Assuming a formal system includes the formalism and the rules for symbolic manipulation, I assume the above relates to those rules. I don't think namespaces are relevant to the current discussion and I disagree with your analysis.

We have a formalism for expressing predicates as constraints. Data types are merely constraints in those predicates. The manipulation rules spell out what to do with those constraints in various situations, and the rules differ from one situation (i.e. operation) to another.

I think I have convinced myself that my original suggestion was wrong, but for completely different reasons. I originally thought the resulting attribute would have the data type of the most specific supertype and a constraint such that there are no values. However, a constraint identifies a subtype and the manipulation rules for join differ from the above.

To join the attributes, we must agree to use the intersection of their domains. This suggests the data type must be a subtype of both attributes. Joining reals with integers, one necessarily ends up with integers. In fact, it should be the most general common subtype or least specific common subtype of the declared types.

Thus for completely disjoint declared types, the data type of the resulting attribute is the universal subtype not the universal supertype. The universal subtype, of course, has an empty set of values and the union of all operations.
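One way to picture the rule is to model each declared type as a set of values, so that the most general common subtype is simply the intersection. The sketch below is Python, with tiny finite sets standing in for real types; REAL, INTEGER, STRING and joined_type are hypothetical names for this illustration, not anything from Tutorial D or a real dbms.

```python
# Hypothetical finite stand-ins for declared types, modeled as sets of values.
REAL = frozenset({-1.5, 0, 1, 2, 2.5})
INTEGER = frozenset({0, 1, 2})        # every integer here is also a real
STRING = frozenset({"a", "b"})        # shares no values with the numeric types

def joined_type(left, right):
    """Type of a joined attribute: the values common to both declared types."""
    return left & right

# Joining reals with integers necessarily ends up with integers.
real_int = joined_type(REAL, INTEGER)

# Disjoint declared types meet at the universal subtype, which has no values.
int_str = joined_type(INTEGER, STRING)

print(real_int == INTEGER, int_str == frozenset())
```

The empty result is directly observable as a type, which is the sense in which the dbms need not treat it as an error. The sketch models only the value sets, not the "union of all operations" that the universal subtype also carries.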

I suggest this is not an error at all from the dbms' perspective as one can directly observe the data type. It is then up to the application presenting the data to a user to identify the data type through some appropriate representation, which might be an error message in the case of the universal subtype.

>>Can you imagine a single proof or lemma in mathematics where the symbol
>>x exists within multiple namespaces? In fact, one could look at a lemma
>>as primarily introducing a separate namespace. As soon as one writes
>>"Let x represent...", one defines the namespace of x.

>
>
> Just so.
>
>
>
>>>Well, I'm not explaining it very well. Hope to get more sleep tonight
>>>than last night.
>>
>>I hope I get more sleep tonight than last night too--hopefully at least
>>as much sleep as I got this afternoon! :P

>
> Lately I have found that, while a large amount of alcohol makes me
> sleep poorly, a small amount of alcohol can help me
> sleep quite well. I'm exploring dessert wines in tiny glasses
> just before bedtime. Sweet.

I used to make ice wine. At only $30 - $40 per liter, it was quite a bargain. However, I find I get ear infections after drinking it because the sugar and viscosity create an ideal medium for bacteria to grow in my eustachian tubes.

Alcohol is a stimulant. It is probably the sugar helping you sleep more than the alcohol by creating an insulin response that then moves tryptophan across the blood-brain barrier in preference to taurine. Combining the carbs with a little protein would make it even more effective.

A bit of meat, cheese and a cracker might prove more effective without the alcohol. A bit of fruit and a slice of ham should do the trick too.

Received on Tue Jan 09 2007 - 17:58:02 CET
