Re: Mixing OO and DB

From: David BL <davidbl_at_iinet.net.au>
Date: Fri, 22 Feb 2008 20:15:36 -0800 (PST)
Message-ID: <67d62f47-7913-4add-afa6-a4541295b495_at_64g2000hsw.googlegroups.com>


On Feb 23, 3:04 am, Robert Martin <uncle..._at_objectmentor.com> wrote:
> On 2008-02-21 20:43:52 -0600, David BL <davi..._at_iinet.net.au> said:
>
> > On Feb 22, 4:03 am, Robert Martin <uncle..._at_objectmentor.com> wrote:
> >> On 2008-02-20 20:27:28 -0600, David BL <davi..._at_iinet.net.au> said:
> >>> How does a program represent an ellipse? I would prefer it if you
> >>> could be more careful with your terminology!
>
> >> I think you know the answer to that. The source code in
> >> ellipse.{h,cpp} that defines the C++ class named ellipse *represents*
> >> an ellipse at runtime; but is not an ellipse.
>
> > This is your justification for not sub-typing circle from ellipse?
>
> Yes! I think it's a pretty good justification too. References to
> subtypes are not themselves subtypes.

Using the approach described by C.Date, it is possible to define a language in which it is meaningful for circle to subtype ellipse. What does that mean for your argument? Evidently your argument depends in more detail on the choice of language (i.e. C++). It does not follow from "class named ellipse *represents* an ellipse at runtime; but is not an ellipse". You need to sharpen your pencil!
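As a rough illustration only, here is one way to read "circle subtypes ellipse" for immutable values in C++. This is a sketch under assumptions, not Date's own notation: the class names, the semi-axis representation and the omission of the centre are all hypothetical choices made for brevity. Because there are no mutators, every circle value is also a valid ellipse value and nothing is lost in the conversion.

```cpp
#include <cassert>

// Hypothetical immutable ellipse value with semi-axes a >= b >= 0
// (centre omitted to keep the sketch short).
class Ellipse {
public:
    Ellipse(double a, double b) : a_(a), b_(b) { assert(a >= b && b >= 0); }
    double a() const { return a_; }
    double b() const { return b_; }
    // An ellipse value is a circle value exactly when its semi-axes are equal.
    bool is_circle() const { return a_ == b_; }
private:
    double a_, b_;
};

// Hypothetical immutable circle value. Circle values form a subset of
// ellipse values, so the conversion below never discards information.
class Circle {
public:
    explicit Circle(double r) : r_(r) { assert(r >= 0); }
    double r() const { return r_; }
    operator Ellipse() const { return Ellipse(r_, r_); }
private:
    double r_;
};
```

With these definitions, `Ellipse e = Circle(5);` yields an ellipse for which `e.is_circle()` holds, whereas `Ellipse(3, 2).is_circle()` does not.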

> >> The point is that a C++ class named "Circle" is not a
> >> circle. It "refers" to a circle in some sense; but it is not a circle
> >> in and of itself.
>
> > I think you mean a run time instance of the Circle class "refers" to a
> > circle in some sense.
>
> Of course. Class circle is a model of circle-ness. Instances of class
> circle are models of individual circles. Those instances are not
> themselves circles; they are references to (models of) circles. And
> references to subtypes are not themselves subtypes.
>
> >> The fact that the slicing rules of C++ make it
> >> impossible to properly upcast a Circle value to an Ellipse value is
> >> simply evidence that the Circle value is not a subtype of an Ellipse
> >> value.
>
> > Disagree. C++ is broken in that case.
>
> Not in the slightest. The slicing rules of C++ are consistent with
> substitutability.
>
> >> Nor should it be since a circle value is not a circle, and an
> >> Ellipse value is not an ellipse.
>
> > No, by definition a circle value is a circle etc. Are you
> > associating the word "value" with what C.Date calls an appearance (or
> > encoding if you like) of a value?
>
> I don't know Date's nomenclature.

Consider the following definition of a circle as a set of points parameterised by centre (xc,yc) and radius r >= 0, where xc,yc,r are reals:

    C(xc,yc,r) = { (x,y) in RxR | (x-xc)^2 + (y-yc)^2 = r^2 }

This set is uncountably infinite. Nevertheless it is parameterised by only three reals. There is clearly a bijection between triplets (xc,yc,r) with r >= 0 and the set of all circles. Using your terminology, you would perhaps say a triplet (xc,yc,r) is a "model" for a circle. Unfortunately you're moving into a world of fuzzy terminology because the word "model" can mean all sorts of things to different people. Some people might say that a model must resemble the thing it models and triplets don't look like circles. When I see you use the word "model" my first inclination is to ignore you because I'm more interested in statements that can be understood mathematically. In this case I would rather just say that circles can be parameterised by triplets, which means specifically that the aforementioned bijection exists.
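The parameterisation can be sketched in C++ as follows. The names CircleParams, contains and same_circle are hypothetical and not from this thread. The membership test uses exact equality on purpose, to mirror the set definition above, so it is only reliable for inputs whose arithmetic is exact in double.

```cpp
#include <cassert>

// Hypothetical triplet (xc, yc, r) that parameterises a circle; r >= 0 assumed.
struct CircleParams {
    double xc, yc, r;
};

// Membership test for the point set C(xc,yc,r) defined above.
bool contains(const CircleParams& c, double x, double y) {
    assert(c.r >= 0);
    const double dx = x - c.xc;
    const double dy = y - c.yc;
    return dx * dx + dy * dy == c.r * c.r;
}

// Because of the bijection, two triplets denote the same circle
// exactly when they are equal component by component.
bool same_circle(const CircleParams& a, const CircleParams& b) {
    return a.xc == b.xc && a.yc == b.yc && a.r == b.r;
}
```

For example, the point (3,4) lies on the circle with triplet (0,0,5), while (3,3) does not.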

Taking this further, on computers we need a way to encode numbers and have to give up on any idea of being able to represent all the reals, so we only represent some subset - say by using an IEEE encoding. In this case we can talk about a function that maps an IEEE encoding to an element of the reals. This map isn't surjective.
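A small sketch of why the map is not surjective, assuming the platform's double is an IEEE 754 binary64 (the helper names are hypothetical). Between any double and its successor there are reals with no encoding of their own, so computing the midpoint just lands back on one of the two neighbours.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Smallest double strictly greater than x (x must be finite).
double next_representable(double x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}

// The real midpoint between a and its successor has no encoding,
// so the computed midpoint rounds to one of the two endpoints.
bool midpoint_collapses(double a) {
    const double b = next_representable(a);
    const double mid = a + (b - a) / 2;
    return mid == a || mid == b;
}
```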

Now so far in the above I have avoided using the word "value". However I regard that word as being associated with each and every one of the above abstract mathematical "things". Each real number is a value, each circle is a value, a triplet is a value, a bijection is a value, a set of triplets is a value and so on. For me the term "value" is independent of any particular *appearance* (i.e. encoding) of a value that may appear in computer memory or whatever.

It is also worth saying what cannot be called a value. A human isn't a value because it isn't mathematically defined. Neither is a computer, and therefore neither is a variable associated with a specified region of computer memory. All of these occur in time and space. Therefore they are not values. Please take this as a definitional matter.

In this discussion we have only been talking about value types. I think of a value type as formally a set of values plus well defined operations on those values. As such a value type is itself mathematically defined and therefore is itself a value (that doesn't occur in time and space).

Even though a variable is not a value, C.Date says by definition a variable is a holder for an appearance of an encoded value.

In the context of OO I would qualify that by pointing out that some people may say that a variable has state but doesn't necessarily represent a value. A mutex variable is an example. However, in this discussion about value types, C.Date's definition seems quite useful.

A vitally important idea in computer science is the idea that real hardware running real software can be abstracted to an idealised *abstract machine*. IMO this is the single most important idea for formalising proofs of correctness of programs. For example an abstract machine can be assumed to have infinite memory so that heap allocations never fail. One can assume there is an infinite stack frame so we can ignore the question of whether we will run out of stack space. Another is to abstract away the idea of physical memory.

Perhaps the single most important abstraction is to think of variables as directly holding a mathematical value. That is, we abstract away the fact that in reality there is an underlying encoding. As a specific example, in an abstract machine we may abstract away the whole question of whether 32-bit integers use a little-endian or big-endian encoding of the bytes in the physical address space. More formally there is an assumed mapping from the encoding used by a variable in the real machine to the mathematical value that the variable is deemed to hold in the abstract machine.
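The endianness example can be sketched like this (all names are hypothetical). Two different byte layouts are decoded by two different mappings, yet both yield the same mathematical integer, which is the only thing the abstract machine sees.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes = std::array<std::uint8_t, 4>;

// Encode a 32-bit value with the least significant byte first.
Bytes encode_le(std::uint32_t v) {
    Bytes b;
    for (std::size_t i = 0; i < 4; ++i)
        b[i] = static_cast<std::uint8_t>((v >> (8 * i)) & 0xFFu);
    return b;
}

// Encode a 32-bit value with the most significant byte first.
Bytes encode_be(std::uint32_t v) {
    Bytes b;
    for (std::size_t i = 0; i < 4; ++i)
        b[i] = static_cast<std::uint8_t>((v >> (8 * (3 - i))) & 0xFFu);
    return b;
}

// Mapping from a little-endian encoding to the value it denotes.
std::uint32_t decode_le(const Bytes& b) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(b[i]) << (8 * i);
    return v;
}

// Mapping from a big-endian encoding to the value it denotes.
std::uint32_t decode_be(const Bytes& b) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < 4; ++i)
        v |= static_cast<std::uint32_t>(b[i]) << (8 * (3 - i));
    return v;
}

// Both encodings round-trip to the value they started from.
void round_trip_check(std::uint32_t v) {
    assert(decode_le(encode_le(v)) == v);
    assert(decode_be(encode_be(v)) == v);
}
```

The byte arrays produced by encode_le and encode_be differ for most inputs, but the decoded values always agree.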

Note BTW that the mapping between the real machine and the abstract machine is not necessarily injective or surjective. The whole point is to ignore irrelevant details or even limitations like IEEE representations of the reals.

So to summarise: we distinguish between variables and values. Variables can "hold" values and be assigned new values.

Using Marshall's example

    int p = 5;

p denotes a variable that holds the value 5.

In the following

    Circle c( Point(0,0), 10 );

c denotes a variable that holds a particular circle value. The circle value is exactly the circle of radius 10 and centre (0,0) which is well defined as a set of points in RxR. There is no ambiguity in which circle value the variable holds, and that is all that matters when formalising the mapping from the real machine to the abstract machine.

Saying that c is a model of a circle is just gibberish. c is a variable that can be assigned a circle value. That a variable is able to do this is implicit in the definition of a variable, i.e. something that can bind to a value, even though values are mathematical abstractions like numbers or circles.
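A minimal sketch of the variable/value distinction, assuming simple Point and Circle classes with value equality (their members and the helper function are hypothetical; only the constructor call shape comes from the example above). Assignment changes which value a variable holds; it does nothing to the value itself.

```cpp
#include <cassert>

struct Point {
    double x, y;
    Point(double x_, double y_) : x(x_), y(y_) {}
};

bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

class Circle {
public:
    Circle(Point centre, double radius) : centre_(centre), radius_(radius) {
        assert(radius >= 0);
    }
    Point centre() const { return centre_; }
    double radius() const { return radius_; }
private:
    Point centre_;
    double radius_;
};

// Two variables hold the same circle value when centre and radius agree.
bool operator==(const Circle& a, const Circle& b) {
    return a.centre() == b.centre() && a.radius() == b.radius();
}

// Assigning to d rebinds the variable d only; c still holds its value.
bool assignment_rebinds_only_the_variable() {
    Circle c(Point(0, 0), 10);
    Circle d = c;                        // d holds the same value as c
    const bool same_before = (c == d);
    d = Circle(Point(0, 0), 20);         // d now holds a different value
    return same_before && !(c == d) && c == Circle(Point(0, 0), 10);
}
```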

> However the fact that I put two
> variables next to each other, and call one "radius", and the other
> "center", and then I wrap both in a container labeled "circle", does not
> mean I have a circle.

When variables are put together in a container in the way you describe, what we end up with is a *composite variable*. Therefore you are correct. What you have is a variable, not a circle (value).

> All I have is a model of a circle.

I would say you have a (composite) variable that can be deemed to hold a circle value.

> I cannot
> roll that container the way I could roll a circle. I cannot use a tape
> measure to empirically measure the circumference. I cannot arrange
> little tiny squares within it to approximate the area. I cannot draw
> the container with a compass, nor can I bisect it and measure the
> internal angle of the bisection at a right angle. The container, for
> all its wonderful ability to *describe* a circle, is not a circle.
> And so that poor container is not a subtype of a similar container that
> happens to describe an ellipse.

That's gibberish; all you need to say is that there is a difference between a variable and a value.

> > The analogy depends on the idea that a value-type variable that holds
> > an appearance of a value can be regarded as analogous to a "pointer"
> > to a value. I find this analogy a little suspect - because I would
> > reserve the terms "pointer" or "reference" to when there is some
> > concept of an address space and the pointer or reference is to a
> > location in the address space and that is distinct from what is stored
> > at that location.
>
> Consider three real numbers, x, y, and r. Taken together they
> represent a circle. (3,3,27) represents a unique circle with center at
> (3,3) and radius of 27. Let's say I have two triplets, both with
> values (3,3,27). They both represent the same circle, not two
> independent circles. Indeed, the triplets are references to the same
> circle. The triplets hold the "address" of the circle in cartesian
> space. The triplets are, for all intents and purposes, pointers to
> circles.

Your terminology is indefensible. Circles don't have an address. Triplets are values and not pointers to circles.

I would strongly suggest you try to be precise with terminology rather than the reverse! Avoid any need for analogy in your argumentation. Strong arguments don't depend on "proof by analogy".
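The two (3,3,27) triplets can be written down directly. In this sketch (the alias Triplet and the helper are hypothetical) the two variables hold equal values while occupying different locations. Addresses belong to the variables, not to the triplet value and not to the circle.

```cpp
#include <cassert>
#include <tuple>

using Triplet = std::tuple<double, double, double>;

// Two variables, one value: equality compares the values held,
// while the addresses identify the two distinct variables.
bool equal_values_in_distinct_variables() {
    Triplet a(3.0, 3.0, 27.0);
    Triplet b(3.0, 3.0, 27.0);
    const Triplet* pa = &a;   // a genuine pointer: a location in the address space
    const Triplet* pb = &b;
    assert(pa != pb);         // distinct variables
    return a == b && *pa == *pb;
}
```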

> And pointers to subtypes are not themselves subtypes.
>
> > Values are self-identifying. When a variable holds an appearance of a
> > value there is an encoding involved, but that isn't the same as an
> > indirection in the sense of a pointer dereference.
>
> But the value is not a circle. The value *refers* to a circle.

You seem to confuse variable and value. A variable "refers" to a circle.

> >> References do not inherit the subtyping of their referents.
>
> > Sure.
> >> I once saw a geometry library where a circle was a subtype
> >> of arc. Same flaw.
>
> > So do you agree it is a bad idea to treat value-types in the OO sense?
>
> No, it is a bad idea for people to think that instances of C++ classes
> ARE the objects they represent. The author of that library made the
> erroneous assumption that instances of the circle class WERE circles.

I'm convinced that a language like C++ would benefit enormously from special support for value types using the approach described by C.Date.

> >>>> struct Address {
> >>>> string street, city, state;
>
> >>>> };
>
> >>>> struct AddressWithZip : Address {
> >>>> string zip;
>
> >>>> };
>
> >>>> void f(Address a);
>
> >>>> AddressWithZip az;
> >>>> f(az); // fine. Slices off the zip. Works great.
>
> >>> I find that extremely ugly. It only takes another small step for the
> >>> following
>
> >>> struct Square { int width; };
> >>> struct Rectangle : Square { int height; };
>
> >>> void f(Square s);
>
> >>> Rectangle r;
> >>> f(r); // fine. Slices off the height. Works great.
>
> >> But here you have made the error that the class of Square should be a
> >> subtype of the class of Rectangle, just because a true square is a true
> >> rectangle. That's the flaw! References do not inherit the subtype
> >> relationship of their referents.
>
> > You got it round the wrong way. The above code states (incorrectly)
> > that a Rectangle is a subtype of Square.
>
> Sorry, my mistake. I see it now.
> Yes, slicing a rectangle to become a square is possible. Don't do that.
>
> This is not a C++ problem any more than:
> class DesertTopping : public FloorWax
>
> Programmers can do dumb things.
>
> On the other hand slicing an AddressWithZip to an Address is not
> necessarily a dumb thing.
>
> > You have defined a base class called Address. Presumably this is a
> > value type, meaning that it is associated with an abstract set of all
> > (possible) address values.
>
> > From Address you define a subtype called AddressWithZip. As a subtype
> > there is meant to be some sense in which it is a specialisation. I
> > presume you would say informally that a AddressWithZip is-a Address.
> > Agreed?
>
> Yes.
>
> > It would seem valid to assume that within the set of all address
> > values, some have zip codes and some do not. As a type,
> > AddressWithZip is relevant to the addresses that have a zip code. It
> > follows that a zip code is regarded as part of an address value.
>
> > Unfortunately C++ will happily (silently) allow the zip code to be
> > sliced away. This means we end up with an Address value without a
> > zip code. Are we to assume that the address doesn't have a zip
> > code? Clearly we can't! Therefore given an instance of class
> > Address we cannot assume it actually locates a particular value in our
> > abstract set of all possible address values. This contradicts the
> > definition of Address as a value-type for all possible address values.
>
> I don't know about other parts of the world. However, in the US a
> zipcode is completely redundant information. It speeds up the postal
> sorters, but adds no new information. An Address uniquely specifies a
> postal delivery point. Indeed, the zipcode is derivable from the
> address.
>
> Now, let's say I have two programs. One that was written in the 60s
> before there were zipcodes, and so uses Address. Another that manages
> modern mailing lists and so uses AddressWithZip.
>
> void functionInOldProgram(Address a);
>
> void functionInNewProgram(AddressWithZip az) {
> functionInOldProgram(az);
>
> }
>
> This works, even though it slices. Indeed it *must* slice because the
> old program can't deal with even the existence of the zip code. And
> yet, the substitutability is consistent with the LSP definition of
> subtype.

Well you weaselled your way out of it (by saying that the zip code is redundant information) :)

What about the example C.Date uses: that of subtyping ColouredRectangle from Rectangle? Do you agree that is suspect? You can't say that a colour is redundant can you? If you think it's valid, please substitute Rectangle for Address and ColouredRectangle for AddressWithZip in my previous counterargument.
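For concreteness, here is a hedged sketch of that substitution. The members and constructors are hypothetical; only the two type names come from the example. Two distinct ColouredRectangle values slice to the same Rectangle, so the colour is silently lost.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Rectangle {
    int width, height;
    Rectangle(int w, int h) : width(w), height(h) {}
};

bool operator==(const Rectangle& a, const Rectangle& b) {
    return a.width == b.width && a.height == b.height;
}

struct ColouredRectangle : Rectangle {
    std::string colour;
    ColouredRectangle(int w, int h, std::string c)
        : Rectangle(w, h), colour(std::move(c)) {}
};

// Copying a ColouredRectangle into a Rectangle variable compiles without
// complaint and keeps only the base part.
bool slicing_conflates_colours() {
    ColouredRectangle red(2, 3, "red");
    ColouredRectangle blue(2, 3, "blue");
    assert(red.colour != blue.colour);   // two distinct values to begin with
    Rectangle r1 = red;                  // colour sliced away
    Rectangle r2 = blue;                 // colour sliced away
    return r1 == r2;                     // indistinguishable afterwards
}
```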

> >>> I said values not variables. By definition a value type is a set of
> >>> values plus operators on those values. Let S be a subtype of T. In
> >>> C++ an S* can be upcast to a T*. Let v denote some value in S stored
> >>> in variable x (of type S). We can take the address of x, upcast it to
> >>> a T* and dereference (to read the value v) and assume it is of type T.
> >>> Therefore we have shown that upcasting pointers implies that the
> >>> values in a subtype must be a subset of the values in a supertype.
>
> > Please read my description again, with the following code in mind
>
> > class T { ... };
> > class S : public T { ... };
>
> > S x = v; // v is a value . (r-value in C++ terminology)
> > S* p = &x;
> > T* q = p;
> > T v2 = *q;
>
> > No slicing => values in type S are subset of values in type T
>
> OK, I think I follow. You are saying that v2 should == v since they
> both represent the same value, and therefore the properties of S ought
> to be a subset of the properties of T.

Not properties, values
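The quoted four-line snippet can be made runnable as below. The members of T and S are hypothetical, added only so that the effect is observable. The pointer upcast still designates the whole S object, whereas the copy into v2 keeps only the T part, as C++ actually behaves today.

```cpp
#include <cassert>

struct T {
    int a = 1;
    virtual ~T() = default;
    virtual int member_count() const { return 1; }
};

struct S : T {
    int b = 2;
    int member_count() const override { return 2; }
};

bool pointer_upcast_keeps_object_but_copy_slices() {
    S x;
    S* p = &x;
    T* q = p;                            // upcast: q still designates the S object
    assert(dynamic_cast<S*>(q) == p);    // the full object is reachable through q
    T v2 = *q;                           // copy from a T lvalue: only the T part is copied
    return q->member_count() == 2 && v2.member_count() == 1;
}
```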

> Of course this is not the way the language works. *q does indeed get
> sliced before it is assigned to v2. It's also not the way that
> subtyping works. Subtyping is a guarantee of substitutability, not of
> value preservation.

Yes in C++ subtyping means that. However most of my comments concern how I believe things should be.

Received on Sat Feb 23 2008 - 05:15:36 CET
