Re: Does Codd's view of a relational database differ from that of Date & Darwin? [M. Gittens]

From: Jon Heggland <heggland_at_idi.ntnu.no>
Date: Fri, 8 Jul 2005 11:15:29 +0200
Message-ID: <MPG.1d3859cbb8b84619896ec_at_news.ntnu.no>


In article <1120749856.318972.293250_at_f14g2000cwb.googlegroups.com>, boston103_at_hotmail.com says...
> Your speculation is not correct. The value of the my_data type cannot
> be, say, an int (which is shorthand for saying the value would be of
> the int type) because its type is a new user-defined type "my_data".

In that case, I am confused by the use of the word "union". What does it mean, if it is not the union of the set of integers, the set of characters and the set of floats?

> You cannot apply any function dealing with ints to the value of the
> my_data type, you need to define functions capable of handling values
> of the type you've just defined.

Then what use is it? What can you do with it except assign values to variables of that type? There seem to be at least some operators associated with it by default, since you can say x.c='f' and y.f=3.14. What is the type of the expression x.c? Is it char or my_data? If it is my_data, there is at least an operator to convert char to my_data. Why isn't there one to do the opposite? Can you compare a my_data value to a char, int or float?

> E.g. you'd need to define conversion functions (my_data->int,
> my_data->char, etc) and any additional functions you'd want to have.

How would the definition of such a function look?

> Let's use the Tutorial D example instead in order to avoid Java type
> system peculiarities. There, TEMPERATURE is defined as having one
> possible representation:
>
> TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL ) ;
...and possreps FAHRENHEIT ( F RATIONAL ) and KELVIN ( K RATIONAL ).

> In ML we would say (trying to be close to the Tutorial D example):
>
> datatype celsius = {c:rational}
> datatype temperature = Celsius of celsius
>
> Then we can describe a value of the temperature type as Celsius {c=10}
> (analogous to the T.D. selector) and define a function, say, the_c
> mapping a temperature value to a rational value
> (temperature->rational). The T.D generates the access operator
> automatically.

Great. Is your temperature a union type? Where is the union? Is temperature the union of celsius? Why are they not the same, in that case?

(By the way, I find it counter-intuitive to treat celsius as a datatype. Celsius is a representation of temperature.)

> Since with one possible representation the Celsius word does not do
> much (if anything), we can simplify the ML type definition to just:
>
> datatype temperature = {c:rational} and describe a value of the type
> temperature as just {c=10}.

I guess I really should learn ML, but.... Can I now define a datatype coulomb {c:rational}? How can celsiuses and coulombs then be distinguished? If I can't, what is the use of defining the datatypes as opposed to just using rational?
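To make the question concrete, here is a minimal sketch of the distinction I am asking about, written in Python rather than ML (so it says nothing about what ML actually does). The class names Celsius and Coulomb are hypothetical, and Fraction merely stands in for RATIONAL. Both types wrap a single rational component named c, yet the language keeps them apart:

```python
from dataclasses import dataclass
from fractions import Fraction


# Hypothetical illustration types; Fraction stands in for RATIONAL.
@dataclass(frozen=True)
class Celsius:
    c: Fraction


@dataclass(frozen=True)
class Coulomb:
    c: Fraction


ten_degrees = Celsius(Fraction(10))
ten_coulombs = Coulomb(Fraction(10))

# The generated __eq__ only compares instances of the same class,
# so identical components do not make the two values equal.
same_type_equal = ten_degrees == Celsius(Fraction(10))
cross_type_equal = ten_degrees == ten_coulombs
```

If the ML definitions behave like this, the two types are distinguishable and worth defining. If they instead collapse to "a record with one rational field called c", I do not see what they buy over plain rational.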

> Now, to the POINT example. The T.D. example:
>
> TYPE POINT
> POSSREP CARTESIAN( X RATIONAL, Y RATIONAL )
> POSSREP POLAR ( R RATIONAL, THETA RATIONAL ) ;
>
> .. can be expressed in ML as:
>
> datatype cartesian = {x:rational, y:rational}
> datatype polar = {r:rational, theta:rational}
> datatype point = Cartesian of cartesian|Polar of polar
>
> The last datatype (point) is called a union type because it's a union
> of two types, cartesian and polar. To designate a value, one would
> say:
>
> Cartesian {x=1, y=2} or Polar {r=3, theta=4}. Naturally, both values
> would be of the point type.

But you have three types instead of just one. And I guess comparing {x=3,y=0} and {r=3,theta=0} would be a type mismatch error, while comparing Cartesian {x=3,y=0} and Polar {r=3,theta=0} would yield true. Or would it?

> The "union type" terminology has been used for a long time in
> programming languages, both imperative and functional, like C,
> Pascal, ML, Haskell, etc. Please search in Google, for example, for
> words "Luca Cardelli" (OO type theorist) and "union type".

I looked in Wikipedia. It says union types are incompatible with type safety, unless you use tagged unions, or you only use operations belonging to a common supertype of the types involved in the union (which I don't see the sense of, if you already have sub/supertypes). Tagged unions (also known as disjoint unions) are safe, but any value belongs to just *one* of the types in the union. In contrast, any value in the possrep system has a representation in *all* of the possreps of its type.
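Here is a rough sketch of that contrast, in Python and under my own assumptions: the class names, the classmethod "selectors" and the tolerance-based equality are all mine, not anything taken from Tutorial D or ML. The first pair of classes models a tagged union, where a value carries exactly one tag. The Point class models a possrep-style type, where every value can be selected and read through both representations:

```python
import math
from dataclasses import dataclass


# Tagged (disjoint) union: a value is built with exactly one tag.
@dataclass(frozen=True)
class Cartesian:
    x: float
    y: float


@dataclass(frozen=True)
class Polar:
    r: float
    theta: float


# Possrep-style type: one set of values, two ways to denote and inspect them.
class Point:
    def __init__(self, x, y):
        # Hidden internal representation; it could be changed to polar
        # without affecting code that uses the selectors and accessors.
        self._x, self._y = float(x), float(y)

    @classmethod
    def cartesian(cls, x, y):  # selector for the CARTESIAN possrep
        return cls(x, y)

    @classmethod
    def polar(cls, r, theta):  # selector for the POLAR possrep
        return cls(r * math.cos(theta), r * math.sin(theta))

    # Accessors (THE_ operators) exist for both possreps on every value.
    @property
    def x(self):
        return self._x

    @property
    def y(self):
        return self._y

    @property
    def r(self):
        return math.hypot(self._x, self._y)

    @property
    def theta(self):
        return math.atan2(self._y, self._x)

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented
        # Tolerance is a floating-point shortcut, not part of the idea.
        return (math.isclose(self._x, other._x, abs_tol=1e-12)
                and math.isclose(self._y, other._y, abs_tol=1e-12))


tagged_equal = Cartesian(3.0, 0.0) == Polar(3.0, 0.0)
possrep_equal = Point.cartesian(3, 0) == Point.polar(3, 0)
```

In this sketch the two tagged values compare unequal, because they belong to different members of the union, while the two Point expressions denote the very same value. Making the tagged version compare equal would take a hand-written conversion, which is exactly the work the possrep type already implies.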

> Example in Pascal:
>
> type
> country = (canada, usa);
> zipcode =
> record
> case where: country of
> canada: (czip: string);
> usa : (azip: number)
> end;
>
> It defines the "country" union type with two tags "canada" and "usa",
> in a manner similar to ML or the Tutorial D possible representations.

I don't follow you here. I haven't used Pascal since high school, but isn't the country type an enum? Is it a union type? A union of what? Of canada and usa? Are those types?

> I cannot see how T.D.'s multiple representation types are different
> from the union type except for minor syntactical peculiarities, of
> course.

Your zipcode is either a string or a number. A value with (say) two possreps has two representations, not just the one xor the other.

> > > > > The ability to say i=14 or i=0xE has got nothing to do with union data
> > > > > types.
> > > >
> > > > My point exactly! But it has very much to do with possreps. You can
> > > > represent an integer in decimal, or hex, or oct, or binary. Different
> > > > ways of denoting the very same value.
> > >
> > > Hold on. 14 and 0xE are ways to represent the same values of the
> > > integer type so that the compiler could understand it. It's got nothing
> > > to do with the type system, possreps and such. It's like using Arabic
> > > vs. Roman numerals. Let's not dwell on it -- it's irrelevant.
> >
> > No, THAT IS WHAT POSSREPS ARE! 0°C and 273,15 K are ways to represent
> > the same value (not values) of the Temperature type so that the compiler
> > (and the code writer/reader, I might add) could understand it.
>
> The temperature example (from the T.D.) has only one possible
> representation defined (see above). But, no matter, let's assume that
> the temperature is defined as:
>
> TYPE TEMPERATURE POSSREP CELSIUS ( C RATIONAL )
> POSSREP KELVIN ( K RATIONAL);
>
> How "POSSREP CELSIUS ( C RATIONAL )" is different from "type celsius =
> {c:rational}" in ML, or any other language ?

CELSIUS is not a type, it is a representation. But it seems we both are just repeating ourselves here.

You said, "14 and 0xE are ways to represent the same values of the integer type". Isn't that useful? Why not extend such functionality to other types? That is what possreps do, and I don't think it makes things clearer to define int as the union of decimal and hexadecimal numbers.
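As an illustration of what I mean by extending it, here is a small Python sketch of a temperature type with two possreps. Everything in it is my own assumption for the sake of the example: the class and method names, the choice of kelvin as the hidden internal representation, and Fraction standing in for RATIONAL.

```python
from fractions import Fraction


class Temperature:
    _OFFSET = Fraction(27315, 100)  # 273.15, kept exact as a rational

    def __init__(self, kelvin):
        # Internal representation (an arbitrary choice for this sketch).
        self._k = Fraction(kelvin)

    @classmethod
    def celsius(cls, c):  # selector for the CELSIUS possrep
        return cls(Fraction(c) + cls._OFFSET)

    @classmethod
    def kelvin(cls, k):  # selector for the KELVIN possrep
        return cls(Fraction(k))

    @property
    def c(self):  # accessor for the CELSIUS component
        return self._k - self._OFFSET

    @property
    def k(self):  # accessor for the KELVIN component
        return self._k

    def __eq__(self, other):
        if not isinstance(other, Temperature):
            return NotImplemented
        return self._k == other._k

    def __hash__(self):
        return hash(self._k)


freezing_c = Temperature.celsius(0)
freezing_k = Temperature.kelvin(Fraction("273.15"))
same_value = freezing_c == freezing_k
```

Here celsius(0) and kelvin(273.15) are two ways of writing one Temperature value, just as 14 and 0xE are two ways of writing one int. There is one type, and neither Celsius nor Kelvin appears as a type of its own.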

> > Every value must have a representation so that the user can denote it,
> > and the compiler can understand it. This representation could be the
> > same as the representation the computer uses internally, but it
> > shouldn't *have* to be---we might want to change the internal
> > representation later (for performance reasons, perhaps), and existing
> > code should not break because of it. Therefore, the representation used
> > externally is called a possible representation,
>
> I think you are wrong here (equating the external representation with
> possible representation). If by external representation you mean
> something the user can see, or type, then that would be a string of
> characters only (forgetting about GUI for a moment). The user cannot
> "see" an integer, for example, the integer has to be converted to a
> string of characters which can be shown to the user on the display
> screen (the same of course applies to input).

A character is no more concrete (or abstract) than an integer. Characters are represented by integers (ASCII, EBCDIC); in many programming languages you can denote a character by an integer. The user cannot "see" a character, for example, the character has to be converted to a pattern of pixels which can be shown to the user on the display screen (the same of course applies to input).
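Python happens to make this easy to show (the variable names below are just mine): the built-in ord and chr functions map between a character and its integer code, and the ASCII encoding of a one-character string is a single byte holding that same integer.

```python
# The letter 'a' and the integer 97 are tied together by the character code.
code_of_a = ord("a")
char_from_decimal = chr(97)
char_from_hex = chr(0x61)  # the same integer, written in hexadecimal

# Encoding the character as ASCII yields one byte with that integer value.
ascii_bytes = "a".encode("ascii")
first_byte = ascii_bytes[0]
```

So a character is denoted here by an integer, which in turn can be denoted in decimal or hex. None of these is any more "visible" to the user than the others until something turns it into pixels.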

> Yet, we can say possrep
> abc (x integer). I do not think that substituting the nebulous
> "possible representation" term for the well established and understood
> by many (hopefully) "type" is very productive.

"Possrep" is definitely not a substitution for "type", and I have never said so! What are 14 and 0xE in your union type world? Values of different type?

> It's much easier and more productive to think about accessor functions
> as mapping the user defined type value to component type values. I do
> not see how trying to digest the expression "possible representation"
> helps here.

That is a value judgment. I think it's much easier to understand that a given value can have multiple representations (think of the letter 'a' in different typefaces)---especially given the extremely simple example of 14 and 0xE---than to understand how union types are supposed to work.

> > > As I understand, its sole purpose is to
> > > introduce *multiple* possible representations.
> >
> > No. You don't need more than one.
>
> What I meant here was that a possible representation is no different
> from the type and multiple representations are just union types. See
> the arguments above.

No. With a union type, a value is *one* of the types involved in the union. With a multiple possrep type, a value has *all* of the possible representations.

> > just like 14 and 0xE is the same int.
>
> No, see above.

I don't get it. If they are not the same, what is the difference? What are their types, if not int? Why does comparing them yield true if they are not the same?
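For what it is worth, this is how the example plays out in Python, which I use here only because it is easy to check (other languages with hex literals behave the same way as far as I know). The variable names are mine:

```python
# Four notations for one and the same integer value.
decimal_literal = 14
hex_literal = 0xE
octal_literal = 0o16
binary_literal = 0b1110

all_equal = decimal_literal == hex_literal == octal_literal == binary_literal

# Every literal produces a value of the single type int.
types_seen = {type(decimal_literal), type(hex_literal),
              type(octal_literal), type(binary_literal)}

# The same value can also be written back out in another notation.
as_hex_text = hex(decimal_literal)
parsed_from_hex = int("E", 16)
```

There is one type and one value; only the notation differs. That is all I am claiming for possreps.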

-- 
Jon
Received on Fri Jul 08 2005 - 11:15:29 CEST
