Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M.Gittens]

From: vc <>
Date: 8 Jul 2005 07:36:38 -0700
Message-ID: <>

Jon Heggland wrote:
> In article <>,
> says...
> > Your speculation is not correct. The value of the my_data type cannot
> > be, say, an int (which is shorthand for saying the value would be of
> > the int type) because its type is a new user-defined type "my_data".
> In that case, I am confused by the use of the word "union". What does it
> mean, if it is not the union of the set of integers, the set of
> characters and the set of floats?

It's a union of labelled/indexed/tagged sets. Henceforth, I'll use the word "tag" since that's what you used further on in your message. E.g., let A = {1,2} and Tag = {L1, L2}. The definition

datatype T = L1 of A | L2 of A

will describe a set T = {(1,L1), (2,L1), (1,L2), (2,L2)}
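
For readers who don't know ML, here is a rough Python sketch of the same construction. The variable names are made up for the illustration, and Python sets of pairs are only a stand-in for the ML datatype. It also shows why the tags matter: a plain union of A with itself collapses back to A, while the tagged union keeps both copies apart.

```python
# The base set and the two tags from the example above.
A = {1, 2}

# Tagged (disjoint) union: pair every element with the tag of the
# alternative it came from.
T = {(a, "L1") for a in A} | {(a, "L2") for a in A}

# An ordinary set union of A with itself loses the distinction.
plain = A | A

print(sorted(T))
print(sorted(plain))
```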

> > You cannot apply any function dealing with ints to the value of the
> > my_data type, you need to define functions capable of handling values
> > of the type you've just defined.
> Then what use is it? What can you do with it except assign values to
> variables of that type? There seems to be at least some operators
> associated with it by default, since you can say x.c='f' and y.f=3.14.
> What is the type of the expression x.c? Is it char or my_data? If it is
> my_data, there at least is an operator to convert char to my_data. Why
> isn't there one to do the opposite? Can you compare a my_data value
> to a char, int or float?

In order to do anything with any user-defined type, one has to define operations/functions that can handle elements of the newly defined type. What can you do with the T.D. TYPE without defining functions over the type?

As to comparing apples with oranges, can you compare TEMPERATURE with RATIONAL?

> > E.g. you'd need to define conversion functions (my_data->int,
> > my_data->char, etc) and any additional functions you'd want to have.
> How would the definition of such a function look?

Given datatype t = L1 of int | L2 of real;   (* real is SML's floating-point type *)

fun get_int (L1 x) = x
  | get_int _ = raise Match     (* not an L1 value *)
fun get_float (L2 x) = x
  | get_float _ = raise Match   (* not an L2 value *)
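
The same accessors can be sketched in Python, assuming a tagged value is modelled as a (tag, payload) pair. That encoding, and the choice of TypeError for the error case, are assumptions of this sketch, not anything ML or the T.D. prescribes.

```python
def get_int(value):
    """Return the payload of an L1-tagged value, or fail."""
    tag, payload = value
    if tag != "L1":
        raise TypeError("not an L1 value")
    return payload


def get_float(value):
    """Return the payload of an L2-tagged value, or fail."""
    tag, payload = value
    if tag != "L2":
        raise TypeError("not an L2 value")
    return payload


an_int = ("L1", 7)
a_float = ("L2", 3.14)
print(get_int(an_int), get_float(a_float))
```

Nothing that works on plain ints applies to a tagged value directly; one has to go through an accessor first, which is the point made above.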

> > Let's use the Tutorial D example instead in order to avoid Java type
> > system peculiarities. There, TEMPERATURE is defined as having one
> > possible representation:
> >
> ...and possreps FAHRENHEIT ( F RATIONAL ) and KELVIN ( K RATIONAL ).

There was no 'and possreps FAHRENHEIT ( F RATIONAL ) and KELVIN ( K RATIONAL )' in my discussion.

> > In ML we would say (trying to be close to the Tutorial D example):
> >
> > datatype celsius = {c:rational}
> > datatype temperature = Celsius of celsius
> >
> > Then we can describe a value of the temperature type as Celsius {c=10}
> > (analogous to the T.D. selector) and define a function, say, the_c
> > mapping a temperature value to a rational value
> > (temperature->rational). The T.D generates the access operator
> > automatically.
> Great. Is your temperature a union type?

Certainly not. The original example from which I derived mine has only one component datatype (one "possrep").

> Where is the union? Is
> temperature the union of celsius? Why are they not the same, in that
> case?

The questions do not make sense because with a single component type there is obviously no union.

> (By the way, I find it counter-intuitive to treat celsius as a datatype.
> Celsius is a representation of temperature.)

I'll ignore the remark for now since we already have more than enough "representations" to talk about.

> > Since with one possible representation the Celsius word does not do
> > much (if anything), we can simplify the ML type definition to just:
> >
> > datatype temperature = {c:rational} and describe a value of the type
> > temperature as just {c=10}.
> I guess I really should learn ML, but.... Can I now define a datatype
> coloumb {c:rational}? How can celsiuses and coloumbs then be
> distinguished? If I can't, what is the use of defining the datatypes as
> opposed to just using rational?

Of course, you can. They'll be distinguished by their tags {celsius, coloumb}.
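
A rough Python sketch of the same point, with classes playing the role of the tags. The class names Celsius and Coulomb are made up for the sketch, and float stands in for rational.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Celsius:
    c: float


@dataclass(frozen=True)
class Coulomb:
    c: float


t = Celsius(c=10.0)
q = Coulomb(c=10.0)

# The payloads are the same number...
same_payload = t.c == q.c
# ...but the values differ, because their tags (classes) differ.
same_value = t == q

print(same_payload, same_value)
```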

> > Now, to the POINT example. The T.D. example:
> >
> >
> > .. can be expressed in ML as:
> >
> > datatype cartesian = {x:rational, y:rational}
> > datatype polar = {r:rational, theta:rational}
> > datatype point = Cartesian of cartesian|Polar of polar
> >
> > The last datatype (point) is called a union type because it's a union
> > of two types, cartesian and polar. To designate a value, one would
> > say:
> >
> > Cartesian {x=1, y=2} or Polar {r=3, theta=4}. naturally, both values
> > would be of the point type.
> But you have three types instead of just one. And I guess comparing {x=
> 3,y=0} and {r=3,theta=0} would be a type mismatch error, while comparing
> Cartesian {x=3,y=0} and Polar {r=3,theta=0} would yield true. Or would
> it?

Of course there are three types (actually more, but that's beside the point): two component types and one resulting type, as, in my opinion, there are in the T.D. POINT example. What do you mean by comparing? What specific comparison function do you have in mind?

> > The "union type" terminology has been used for a long time in
> > programming languages, both imperative and functional, like C,
> > Pascal, ML, Haskell, etc. Please search in Google, for example, for
> > words "Luca Cardelli" (OO type theorist) and "union type".
> I looked in Wikipedia. It says union types are incompatible with type
> safety, unless you use tagged unions, or you only use operations
> belonging to a common supertype of the types involved in the union
> (which I don't see the sense of, if you already have sub/supertypes).
> Tagged unions (also known as disjoint unions) are safe, but any value
> belongs to just *one* of the types in the union.

I've been talking about tagged unions all the time. The fault is mine entirely in not making that clear.

> In contrast, any value
> in the possrep system has a representation in *all* of the possreps of
> its type.

That's odd. Are you saying that a hypothetical implementation would store all the "possible representations" for a given value? What would be the point of that?

But assuming the unlikely mental model is indeed correct, we can look at a type defined by multiple possreps as just a record type consisting of other record types (or "structures" using the C vocabulary):

datatype cartesian = {x:rational, y:rational};
datatype polar = {r:rational, theta:rational};
datatype point = (cartesian, polar);

That is, instead of "Cartesian of cartesian | Polar of polar", just a record with two component types.
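
To make the two readings concrete, here is a hedged Python sketch. PointUnion and PointRecord are names invented for the illustration, and float again stands in for rational. Under the union reading a point value holds exactly one component; under the record reading it holds both at once.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    r: float
    theta: float


# Union reading: a point is either a Cartesian or a Polar value,
# and the class acts as the tag.
PointUnion = Union[Cartesian, Polar]


# Record reading: a point carries both components at the same time.
@dataclass(frozen=True)
class PointRecord:
    cartesian: Cartesian
    polar: Polar


u: PointUnion = Polar(r=3.0, theta=0.0)
p = PointRecord(Cartesian(x=3.0, y=0.0), Polar(r=3.0, theta=0.0))
```

Note that nothing in the record version keeps the two components in agreement with each other; that would have to be enforced by whatever function constructs the value.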

> > Example in Pascal:
> >
> > type
> > country = (canada, usa);
> > zipcode =
> > record
> > case where: country of
> > canada: (czip: string);
> > usa : (azip: number)
> > end;
> >
> > It defines the "country" union type with two tags "canada" and "usa",
> > in a manner similar to ML or the Tutorial D possible representations.
> I don't follow you here. I haven't used Pascal since high school, but
> isn't the country type an enum? Is it a union type? A union of what? Of
> canada and usa? Are those types?

Sorry, I mistyped. It should read: 'It defines the "zipcode" union type with two tags, "canada" and "usa"'.

An ML translation would be: datatype zipcode = Canada of string | Usa of number;

There are two sets, string and number, tagged with the country set {canada, usa}:

Union(string*canada, number*usa).

> > I cannot see how T.D.'s multiple representation types are different
> > from the union type except for minor syntactical peculiarities, of
> > course.
> Your zipcode is either a string or a number. A value with (say) two
> possreps has two representations, not just the one xor the other.

My zipcode is either a tagged string or a tagged number. If a value with two possreps is thought of as having two "representations" at the same time, then it's a record/structure (see above). Btw, does the T.D. say that the definition implies that two possreps *exist* at the same time, or is that your own interpretation?

> > > > > > The ability to say i=14 or i=0xE has got nothing to do with union data
> > > > > > types.
> > > > >
> > > > > My point exactly! But it has very much to do with possreps. You can
> > > > > represent an integer in decimal, or hex, or oct, or binary. Different
> > > > > ways of denoting the very same value.
> > > >
> > > > Hold on. 14 and 0xE are ways to represent the same values of the
> > > > integer type so that the compiler could understand it. It's got nothing
> > > > to do with the type system, possreps and such. It's like using Arabic
> > > > vs. Roman numerals. Let's not dwell on it -- it's irrelevant.
> > >
> > > No, THAT IS WHAT POSSREPS ARE! 0C and 273,15 K are ways to represent
> > > the same value (not values) of the Temperature type so that the compiler
> > > (and the code writer/reader, I might add) could understand it.
> >
> > The temperature example (from the T.D.) has only one possible
> > representation defined (see above). But, no matter, let's assume that
> > the temperature is defined as:
> >
> >
> > How "POSSREP CELSIUS ( C RATIONAL )" is different from "type celcius =
> > {c:rational}" in ML, or any other language ?
> CELSIUS is not a type, it is a representation. But it seems we both are
> just repeating ourselves here.

OK, let's approach the problem from another side. A type is a set. When we say TYPE T POSSREP P(X INTEGER), how is T related to P? Obviously T is a set, which means that something defined by "POSSREP P(X INTEGER)" must be a set too. What kind of set is it? How would you describe its elements?

> You said, "14 and 0xE are ways to represent the same values of the
> integer type".

"14" and "0xE" are strings that are mapped to the same integer by the compiler in a specific language implementation. A human who wrote the function told the computer to do so. Per se, they do not have any specific meaning beyond just being strings of characters.
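
As an illustration only, this is what that mapping looks like in Python, where the built-in int() with base 0 applies the usual prefix rules. The helper name read_int_literal is made up for the sketch.

```python
def read_int_literal(text: str) -> int:
    # base=0 tells int() to honour prefixes such as 0x, 0o and 0b,
    # and to read unprefixed digits as decimal.
    return int(text, 0)


a = read_int_literal("14")
b = read_int_literal("0xE")

# Two different strings, one integer.
print("14" == "0xE", a == b)
```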

But let's postpone the "14" vs "0xE" discussion until we figure out what exactly you mean by "possible representation" (a narrower question).

> > Yet, we can say possrep
> > abc (x integer). I do not think that substituting the nebulous
> > "possible representation" term for the well established and understood
> > by many (hopefully) "type" is very productive.
> "Possrep" is definitely not a substitution for "type", and I have never
> said so! What are 14 and 0xE in your union type world? Values of
> different type?

"14" and "0xE" are strings and as such are of the same type, but see above.

> > It's much easier and more productive to think about accessor functions
> > as mapping the user defined type value to component type values. I do
> > not see how trying to digest the expression "possible representation"
> > helps here.
> That is a value judgment. I think it's much easier to understand that a
> given value can have multiple representations (think of the letter 'a'
> in different typefaces)---especially given the extremely simple example
> of 14 and 0xE---than to understand how union types are supposed to work.

See above. "14" and "0xE", being members of the string type, have got nothing to do with union types.

> > > > As I understand, its sole purpose is to
> > > > introduce *multiple* possible representations.
> > >
> > > No. You don't need more than one.
> >
> > What I meant here was that a possible representation is no different
> > from the type and multiple representations are just union types. See
> > the arguments above.
> No. With a union type, a value is *one* of the types involved in the
> union. With a multiple possrep type, a value has *all* of the possible
> representations.

See above (my question on whether the T.D. supports your interpretation and the "record type").

> > > just like 14 and 0xE is the same int.
> >
> > No, see above.
> I don't get it. If they are not the same, what is the difference? What
> are their types, if not int? Why does comparing them yield true if they
> are not the same?

By "No" I meant that "14" and "0xE" were not integers at all but strings *and* they did not have anything to do with the union types discussion.

> --
> Jon
Received on Fri Jul 08 2005 - 16:36:38 CEST
