Re: Does Codd's view of a relational database differ from that ofDate&Darwin?[M.Gittens]

From: vc <boston103_at_hotmail.com>
Date: 7 Jul 2005 08:24:16 -0700
Message-ID: <1120749856.318972.293250_at_f14g2000cwb.googlegroups.com>


Jon Heggland wrote:
> In article <1120666414.650757.194510_at_g49g2000cwa.googlegroups.com>,
> boston103_at_hotmail.com says...
> > > Let me speculate: Your union type is the union of the int, float and
> > > char domains (domains being sets of values). Let's make the reasonable
> > > assumption that they are disjoint, even though a case could be made that
> > > (e.g.) 2 == 2.0 (and that C consider characters and integers pretty
> > > interchangeable). In that case, a value of the my_data type could be
> > > *either* an int, a char or a float, but not more than one at a time.
> > > Trying to treat it as a char if it is a float, leads to a (run-time?)
> > > error.
> >
> > The C example is not a very good one due to C's unsatisfactory type
> > system. I gave it because you said you were unfamiliar with languages
> > having a sound type system(ML, Haskell, etc).
>
> Very well. Is my speculation above valid if we apply it, mutatis
> mutandis (to use Date's slightly pompous phrase:), to the ML example? If
> not, why not?

Your speculation is not correct. The value of the my_data type cannot be, say, an int (which is shorthand for saying the value would be of the int type) because its type is a new user-defined type "my_data". You cannot apply any function dealing with ints to the value of the my_data type, you need to define functions capable of handling values of the type you've just defined.

E.g. you'd need to define conversion functions (my_data->int, my_data->char, etc) and any additional functions you'd want to have.

>
> > My point was that the component data types are hidden and you have to
> > operate with the resulting type via operators/functions you need to
> > define.
>
> Sure. I guess you could view possreps as little more than
> operators/functions on the domain. They are important in order to be
> able to specify literals, though. In my example, you cannot say
> Temperature t = 24.5, because 24.5 is a double, and Temperature is not.
> You'd have to say Temperature t = Celsius(24.5) (not valid Java syntax,
> of course), where Celsius(24.5) is an expressing denoting the (one and
> only) Temperature that is 24.5°C (and 76,1°F and 297,65 K). (That
> expression is called a selector invocation, by the way.)

See below.

>
> > > In contrast, a TTM domain with possreps has in internal representation
> > > that we don't care about. It has one or more possible representations
> > > that are always valid, no matter what the value is. This probably
> > > necessitates a 1:1 mapping between the internal representation and each
> > > possrep, though I can't recall having seen this spelled out explicitly
> > > in TTM. In any case, the domain is *not* a subtype of its internal
> > > representation---Temperature is not a "narrowing" of double. It is
> > > encapsulated. We can change the internal representation without
> > > impacting the rest of the world. We can add possreps without breaking
> > > existing code. Your union type can't do that. It's the magic of OO. :)
> >
> > When I started talking about union types I had in mind *multiple*
> > possreps as in the cartesian point vs. polar. You do not need u.t.
> > with a single possrep, of course, just multiple operators/functions
> > to produce the desired result. Sorry for the confusion.
>
> My Temperature has three possreps; four after my KelvinString
> introduction. I can make a point, if you will. :)

Let's use the Tutorial D example instead in order to avoid Java type system pecularities. There, TEMPERATURE is defined as having one possible representation:

TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL ) ; In ML we would say (trying to be close to the Tutorial D example):

datatype celsius = {c:rational}
datatype temperature = Celsius of celsius

Then we can decribe a value of the temperature type as Celsius {c=10} (analogous to the T.D. selector)and define a function, say, the_c mapping a temperature value to a rational value (temperature->rational). The T.D generates the access operator automatically.

Since with one possible representation the Celsius word does not do much (if anything), we can simplify the ML type dfinition to just:

datatype temperature = {c:rational} and describe a value of the type temperature as just {c=10}.

We do not need any additional terminology (possrep) and just operate in terms of types. The ML type itself can be considered as defining a set of scalars very much in the same way as the Tutorial D type.

>
> class Point {
> private double x, y;
> public double getX() { return X; } // And getY, setX and setY likewise
> public double getR() { return Math.sqrt(x * x + y * y); }
> public double getTheta() { return Math.atan(y / x); }
> // I need to handle x==0 and so on, but you get the picture.
> // setters are also trivial
> }
>
> Still Java, so selectors (which more plainly show the possreps) don't
> show up. They correspond almost to constructors, but Java constructors
> create a new variable (object); selectors merely designate a value (all
> of which exist a priori, just like all ints exist without any need to
> "create" them). In Tutorial D, you would just specify the two possreps
> (x, y) and (r, theta), and the "getters/setters" would be implicit.

Now, to the POINT example. The T.D. example:

TYPE POINT

     POSSREP CARTESIAN( X RATIONAL, Y RATIONAL )
     POSSREP POLAR ( R RATIONAL, THETA RATIONAL ) ;

.. can be expressed in ML as:

datatype cartesian = {x:rational, y:rational} datatype polar = {r:rational, theta:rational} datatype point = Cartesian of cartesian|Polar of polar

The last datatype (point) is called a union type because it's a union of two types, cartesian and polar. To designate a value, one would say:

Cartesian {x=1, y=2} or Polar {r=3, theta=4}. naturally, both values would be of the point type.

Again, in order to access the x, y, r, theta, one would need to create the respective functions manually as opposed to the T.D. automatically generating those.

>
> > The POINT example from the TTM. In one of your messages you've
> > referred to the Point definition although I believe you did not mention
> > multiple possreps. Anyway, I was objecting to the term possrep vs.
> > type (or, in the case of multiple possreps, vs. union types) since
> > there is already established terminology for this sort of things.
>
> No, there isn't---or at least, it isn't union types. Of course, every
> type/domain is a set of values, and every set can be expressed as the
> union of other sets. "int" is a union type, it is the union of odd and
> even numbers, no? But that is trivial.

The "union type" terminology has been used for a long time in programming languages, both imperative and functional, like C, Pascal, ML, Haskell, etc. Please search in Google, for example, for words "Luca Cardelli" (OO type theorist) and "union type". Example in Pascal:

type

   country = (canada, usa);
   zipcode =

       record
           case where: country of
              canada: (czip: string);
              usa : (azip: number)

end;

It defines the "country" union type with two tags "canada" and "usa", in a manner similar to ML or the Tutorial D possible representations. I cannot see how T.D.'s multiple representation types are different from the union type except for minor syntactical pecularities, of course.

>
> > > > The ability to say i=14 or i=0xE has got nothing to do with union data
> > > > types.
> > >
> > > My point exactly! But it has very much to do with possreps. You can
> > > represent an integer in decimal, or hex, or oct, or binary. Different
> > > ways of denoting the very same value.
> >
> > Hold on. 14 and 0xE are ways to represent the same values of the
> > integer type so that the compiler could understand it. It's got nothing
> > to do with the type system, possreps and such. It's like using Arabic
> > vs. Roman numerals. Let's not dwell on it -- it's irrelevant.
>
> No, THAT IS WHAT POSSREPS ARE! 0°C and 273,15 K are ways to represent
> the same value (not values) of the Temperature type so that the compiler
> (and the code writer/reader, I might add) could understand it.

The temperature example (from the T.D.) has only one possible representation defined (see above). But, no matter, let's assume that the temperature is defined as:

TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL )

                  POSSREP KELVIN  ( K RATIONAL);

How "POSSREP CELSIUS ( C RATIONAL )" is different from "type celcius = {c:rational}" in ML, or any other language ?

>
> > > > How is it even relevant to the issue of types?
> > >
> > > It is a demonstration that possrep is a useful concept, that dates from
> > > before Date. So why not apply it to *all* types? A temperature is a
> > > temperature, be it measured in F, K or C. A point is a point, be it
> > > polar or cartesian. A distance is a distance, be it measured in
> > > lightyears, feet, meters or ångstrøm. An int is an int. 14 == 0xE && 0°C
> > > == 273.15 K.
> >
> > Cool. So one uses a user-defined type TEMP, what's the point of the
> > word possible representation ?
>
> Every value must have a representation so that the user can denote it,
> and the compiler can understand it. This representation could be the
> same as the representation the computer uses internally, but it
> shouldn't *have* to be---we might want to change the internal
> representation later (for performance reasons, perhaps), and existing
> code should not break because of it. Therefore, the representation used
> externally is called a possible representation,

I think you are wrong here (equating the external representation with possible representation). If by external representation you mean something the user can see, or type, then that would be a string of characters only (forgetting about GUI for a moment). The user cannot "see" an integer, for example, the integer has to be converted to a string of caharacters which can be shown to the the user on the display screen(the same of course applies to input). Yet, we can say possrep abc (x integer). I do not think that substituting the nebulous "possible representation" term for the well established and understood by many (hopefully) "type" is very productive. I can understand Date's motivation in trying to emphasize the fact that a user defined type should be treated as a scalar and avoiding the expression "component types"( for example) but the price is too high -- confusion.

> Thinking of possreps as (groups of) accessor functions (getters/setters)
> might help your understanding, though this ignores selectors, which are
> important.

It's much easier and more productive to think about accessor functions as mapping the user defined type value to component type values. I do not see how trying to digest the expression "possible representation" helps here.

>
> > As I understand, its sole purpose is to
> > introduce *multiple* possible representations.
>
> No. You don't need more than one.

What I meant here was that a possible epresentation is no different from the type and multiple representations are just union types. See the arguments above.

>
> > If so, again, why not
> > just call the thing a union type ?
>
> Because it isn't. Temperature is not the union of double, double and
> double, or even of fahrenheit, celsius and kelvin. Point is not the
> union of cartesian points and polar points. It does not even make sense
> to speak of such a union, because the set of cartesian points and the
> set of polar points is the same set---the set of 2D points. (0,4) and
> (4,90°) is the same point,

It's an incorrect( simplistic point) of view. When we define a union type in a p.l., we provide a set of labels:

The following is possible:

In ML,
  datatype dt = Label1 of int | Label2 of int

In the T.D.
  TYPE DT POSSREP LABEL1( C INTEGER)
           POSSREP LABEL2( C INTEGER); Therefore, even though the unlabelled sets are the same set, the labelled sets are different. Union types corespond to disjoint (discriminated) sets in math:

let {Xi: i of I} be a family of sets indexed by I. Then, the disjoint union will be Union( (x, i): x of Xi ), in other words the elements of the disjoint union are ordered pairs (x, i). So, if you have a set {1,2,3} and an index {i1, i2}, the disjoint union will be {(1,i1), ..., (3,i2)}. If we have only one set X, the disjoint union is just the Cartesian product of X and I.

> just like 14 and 0xE is the same int.

No, see above.

> --
> Jon
Received on Thu Jul 07 2005 - 17:24:16 CEST

Original text of this message