An alternative to possreps

From: David BL <davidbl_at_iinet.net.au>
Date: Fri, 27 May 2011 01:06:23 -0700 (PDT)
Message-ID: <d4edcbae-ff80-4bf4-9c04-c0e14d599003_at_35g2000prp.googlegroups.com>



The purpose of this post is to outline an alternative to POSSREPS for distinguishing between a type and the physical representation of values of that type on a system.

This is achieved using a DDL which can be regarded primarily as a language for writing a schema which describes what values can legally be recorded in the database and this is achieved without any dependence or even reference to physical representations. In that sense the types are purely abstract and this approach maximises physical independence. It should be understood that implementations sometimes impose additional limitations on what values can be physically represented, but these are regarded as deficiencies of the implementation, even if they seem inevitable. The DDL is illustrated with examples below.

Types are fully abstract in the sense that they are simply declared and not defined in terms of possible representations. For example:

    type Bool;
    type Natural;
    type Integer;
    type Rational;
    type Real;
    type Point;
    type Circle;
    type Ellipse;

Note that there is no difficulty declaring a type like Real even though it is uncountably infinite. At this stage the ways of logically or physically representing values of these types is unspecified and therefore open ended.

Let us write T1 isa T2 to declare the fact that T1 is a subtype of T2. This means every value of type T1 is also a value of type T2. It follows that any operator with an argument of type T2 can also be passed a value of type T1. For example, the schema may declare:

    Natural isa Integer;
    Integer isa Rational;
    Rational isa Real;
    Circle isa Ellipse;

As for types, consider that all operators are treated in a fully abstract way as well. i.e. the signatures are declared but no implementations of the operators are defined. This is because as far as a DDL is concerned we are concerned with (logical) data representation and not with defining a computation to be performed on a computer. For example:

    Natural <--- 0;

    Real <--- add(Real, Real);
    Real <--- subtract(Real, Real);
    Real <--- mult(Real, Real);
    Real <--- divide(Real x, Real y) on not(y = 0);
    Real <--- pi;
    Real <--- sin(Real);
    Real <--- cos(Real);

    Circle <--- circle(Point centre, Real radius) on radius >= 0;
    Point <--- centre(Circle);
    Real <--- radius(Circle);

Operators can have arity 0, allowing for literals like false, true, 0, pi or e. To the extent that all operators are declared but not defined, operators of arity 0 can be regarded as representing otherwise unspecified abstract values.

Operators can declare constraints (i.e. a domain on which the operator is defined). For example, divide(Real x, Real y) is only defined on nonzero y.

Covariant return types


Consider these operator declarations:

    Natural <--- add(Natural, Natural);
    Natural <--- mult(Natural, Natural);

    Real <--- add(Real, Real);
    Real <--- mult(Real, Real);

It would appear that operator overloading is supported (e.g. 'add' is declared on both Natural and Real). The purpose here is not to redefine behaviour but rather only to express covariant return types. This appears useful if implicit downcasts are not permitted.

Operator expressions


The schema in effect defines a grammar for nested expressions of operator invocations. Each legal expression is a literal meaning it denotes a value. For example from the following two operators

    Natural <--- 0;
    Natural <--- s(Natural);

we can represent any natural number starting from 0 and using the successor operator s as many times as required:

      1 is logically represented by s(0)
      2 is logically represented by s(s(0))
      3 is logically represented by s(s(s(0)))
      etc

We regard these expressions as /logical representations/ of values, to distinguish them from physical representations on some physical medium. For example the value with a logical representation s(s(s(0))) may be physically represented in computer memory as a single byte with the binary value 00000011, or in countless other ways.

The following two operators provide two different ways to select geometrical point values (and can be compared to POSSREPS, although here we make it very clear that we are strictly talking about alternative logical representations):

    Point <--- cartesian(Real x, Real y);     Point <--- polar(Real modulus, Real arg) on modulus >= 0;

The fact that regardless of physical representation we can always retrieve the "components" of these two representations is reflected in the following operators:

    Real <--- x(Point);
    Real <--- y(Point);
    Real <--- modulus(Point);
    Real <--- arg(Point);

Note that we don't designate some operators as "selectors" and some not.

One can imagine a representation of rationals using a pair of integers:

    Rational <--- rational(Integer n, Integer d) on not(d=0);

In this case unless we specify a canonical representation for the pair of integers there are no corresponding operators to retrieve numerator or denominator from a given rational.

As another example,

    circle( cartesian(0,0), s(0) )

denotes the unit circle.

Notice that these expressions only represent values and there is no implied calculation to be performed. We don't assume the operators map to executable functions in the implementation, e.g. which allow an expression like 2*3+1 to designate a calculation to be performed on binary representations of numbers. All such concepts are irrelevant at the logical level of discourse.

A schema is necessarily incomplete on uncountable sets like the reals. i.e. there must exist reals for which no logical representation exists. Nevertheless the schema may allow for expression of some useful countable subset that occur in particular applications (e.g. ruler and compass geometry) but are not simply the rationals or even the algebraic numbers. The schema ends up precisely defining the set of values that can be expressed. Even though the schema only ends up allowing for a subset of the reals to be expressed, it may not be appropriate to invent some name for it.

The logical representations form equivalence classes according to the value that they represent. For example, the following expressions all represent the integer 1:

      s(0)
      add(s(0),0)
      subtract(s(s(0)),s(0))
      radius( circle( cartesian(0,0), s(0) ) )

Since all expressions are literals, it can be assumed for example that any boolean valued expression is either equivalent to 'false' or 'true' (but not both at the same time). Any natural number valued expression is equivalent to exactly one expression of the form s(s(...s(0)...)). In both these cases we have a concept of a canonical form.

In principle the equivalence relation could be defined in some language, entirely at an abstract level (i.e. just as for the schema, without any dependency on physical representations). I find the idea intriguing that once the equivalence relation has been defined we have a basis for modelling computation on literals that generalises arithmetic but is entirely divorced from physical representations.

Perhaps there are cases where the grammar and/or operators are so complicated that determining whether two literal expressions are equivalent is undecidable. This is a computational problem and outside the scope of a DDL per se. It is a crucial consideration for physical representations.

Physical representation


Ideally for each type T there are bijections between the following:

  1. values
  2. equivalence classes over logical representations; and
  3. equivalence classes over physical representations.

Clearly this cannot be achieved on the reals. It seems best to regard floats as representing ill-defined overlapping ranges of reals (which means there is no mapping from reals to floats or vice versa).

It shouldn't be assumed that a schema forces the implementation to directly represent nested expressions in a generic manner suitable for arbitrary symbolic manipulation. For example the logical representation of the unit circle given previously might be physically realised using an instance in memory of a Circle struct in the C programming language defined as follows

    struct Circle
    {

        Cartesian centre;
        float radius;

    };

Union types


Since we have chosen to allow for uncountable types, we can for example allow for the type Set<Point> which is associated with any set of points on the 2D plane:

    type Set<Point>;

and declare:

    Circle isa Set<Point>;
    Rectangle isa Set<Point>;

In a sense Set<Point> acts like a union type, which is "open" because it can be a supertype for any number of types representing 2D shapes added to the type system in the future. This type is useful for expressing some operators that are uniformly available on all its subtypes:

    Set<Point> <--- intersection(Set<Point>, Set<Point>);
    Set<Point> <--- union(Set<Point>, Set<Point>);
    Set<Point> <--- difference(Set<Point>, Set<Point>);
    Set<Point> <--- translate(Set<Point>, Real dx, Real dy);
    Set<Point> <--- rotate(Set<Point>, Real theta);
    Set<Point> <--- scale(Set<Point>, Real sx, Real sy);

E.g. we can then express literals like

    rotate(

        intersection(
            circle(cartesian(0,0),1)),
            translate(unitsquare,0.5,0.5)),
        pi/4)

without declaring specific operators such as for the intersection of circles and rectangles. This is an advantage of allowing uncountable types.

We may also want to express closed union types. E.g.

    type CircleOrRectangle;

    Circle isa CircleOrRectangle;
    Rectangle isa CircleOrRectangle;

The distinction being made here is that we don't expect new types to declare themselves as subtypes of CircleOrRectangle.

The proposed DDL avoids syntactically distinguishing open and closed union types because it separates declarations of types from declarations of subtype relationships between types.

Elegant and orthogonal operators


Uncountable types tend to allow more elegant and orthogonal operators which are an advantage for data representation. For example, consider the following two data types:

  1. "idealised images" which are mappings on a domain which is the entire R^2 plane, which is both unbounded and a continuum.
  2. "digital images" which are mappings from a finite set of pixel locations to a finite set of brightness or colour values, and allows for a finite physical representation.

Consider rotation, scaling and translation operators on these two data types. For idealised images these operators are simple and elegant. By contrast on digital images they are extremely ugly. For example they raise the question of how pixel interpolation should be performed (e.g. nearest neighbour, linear interpolation etc). Pixel interpolation also has to deal with special cases near the edges or corners of the sampled image. Also these operations beg the question of what bounding rectangle should be used to contain the result, and what happens in areas in the result that aren't mapped to any part of the original image. It is noteworthy that on digital images these operators don't generally preserve information, and don't have nice inverses. For example scaling an image by 0.5 then by 2 typically results in pixel averaging. Received on Fri May 27 2011 - 10:06:23 CEST

Original text of this message