Re: what are keys and surrogates?

From: Marshall <marshall.spight_at_gmail.com>
Date: Wed, 9 Jan 2008 18:44:02 -0800 (PST)
Message-ID: <d5f9163f-c631-4d79-bc5d-34d6514e2d3b_at_e23g2000prf.googlegroups.com>


On Jan 9, 5:26 pm, David BL <davi..._at_iinet.net.au> wrote:
> On Jan 10, 1:40 am, Marshall <marshall.spi..._at_gmail.com> wrote:
> > On Jan 8, 9:59 pm, David BL <davi..._at_iinet.net.au> wrote:
> > > On Jan 9, 1:25 pm, Marshall <marshall.spi..._at_gmail.com> wrote:
> > > > On Jan 8, 6:17 pm, David BL <davi..._at_iinet.net.au> wrote:
>
> > > > > In November I started a thread called "RM and abstract syntax trees"
> > > > > in which I suggested that RM was poorly suited for the representation,
> > > > > never mind manipulation of ASTs.
>
> > > > Hmmm, I think I remember that. ;-)
>
> > > > > The problem is that the only
> > > > > reasonable way to represent the structure is to introduce meaningless
> > > > > node identifiers. An important principle in the RM is that a tuple
> > > > > should always represent a proposition that makes sense to the problem
> > > > > domain expert, so I agree with you that we cannot allow hidden
> > > > > identifiers. Therefore the RM cannot help but expose the node
> > > > > identifiers for all to see.
>
> > > > > Prolog is able to parse string expressions entered by users and build
> > > > > and manipulate ASTs. Behind the scenes, nested functor expressions
> > > > > are usually implemented using dynamically allocated nodes wired up
> > > > > with pointers. However, as far as the programmer is concerned, only
> > > > > unification is available to decompose the structure. It seems to me
> > > > > that Prolog has a more general support for data modeling than
> > > > > available in the RM, to the extent that nested functor expressions
> > > > > avoid the need to introduce lots of meaningless identifiers.
>
> > > > This issue goes away if we relax 1NF and allow attributes that are
> > > > lists or relations. This gives us nested structures. (Nested relations
> > > > are not particularly controversial around here.)
>
> > > When you say "nested relations", is it your intention to nest at every
> > > node as one recurses down the AST?
>
> > The short, low-fidelity answer is yes.
>
> > My "intention" specifically is to show that this is one way that
> > relations can deal with ASTs: just like functional languages do.
> > I do not hold an opinion on which way of handling ASTs is best;
> > certainly the FP approach seems to have many merits, though.
>
> > > Can you please explain how an expression like
>
> > > (5 + 6) * x
>
> > > would be represented? I can imagine for example that the top node
> > > will be stored in a relation R as follows
>
> > > R: { (0,R0), (1,R1), (2,R2) }
>
> > > where 0,1,2 are used to index the elements of a list where the 0th
> > > element R0 is an RVA that represents the type of node (in this case a
> > > multiplicative node) and the subsequent elements are the child nodes
> > > which are also RVAs (R1 represents "5+6" and R2 represents "x").
>
> > This leaves out enough details that I can't tell whether it matches
> > what I was thinking. I don't see why 0 is in the same list that 1 and
> > 2 are in, for example.
>
> > I'd just say, a la SML:
>
> > product(sum(5, 6), x)
>
> So is the name of a relation part of its value?

No. The "product" and "sum" above are what are usually called "constructors." The above is a value. You can put it in a relation if you want. (Probably I introduced the confusion by talking about union types immediately after talking about RVAs.)

> That doesn't seem
> quite right to me. I think of relvars as having names, whereas
> relation values do not. [aside: maybe it's best to say relvars don't
> have names either, and instead we say that a dbvar is a map from
> relation-name to relvar].

I'm not thrilled with the term "relvar" although I can see why it was coined. We bind names to values; in data management and in imperative programming we allow names to be bound to new values over time; in functional programming (and in math, one could argue) we do not. This issue has nothing to do with the types of the values themselves.

> If an unnamed relation value is used to represent a node of an AST
> then it needs to encode its type, such as product,sum,num or var. I
> was thinking more along the lines of
>
> (product (sum (num 5) (num 6)) (var x))

Sure. I almost wrote that, but I left out the constructor names for num (or int or whatever.) (I wouldn't consider "var" a constructor; it's something different.)

> > > An alternative approach (which would look even more like LISP) would
> > > be to use head-tail lists.
>
> > > I agree this avoids the need to introduce meaningless identifiers.
>
> > Well, that was my main point anyway, so good enough.
>
> > > I guess it comes down to a matter of definition of what the relational
> > > approach means. It strikes me as counter to the intentions behind RM/
> > > RA to use a distinct relation value for every node as we recurse down
> > > the AST.
>
> > I didn't know the RM even *had* intentions.
>
> Maybe "characterisation" is a better word.
>
> > > If you're happy to call that approach "relational" then I
> > > won't disagree. I will however ask the question of whether much of
> > > the theory and practice discussed on cdt is at all relevant. For
> > > example what parts of the RA are useful? Where is there any set based
> > > processing?
>
> > It sorta seems here like you're thinking of the RM as an ideology
> > rather
> > than as a framework for modeling, constraining, and operating on data.
> > I'm not aware of any dictum that says we can't use other techniques
> > within the same relational system. We use arithmetic in SQL, right?
> > And it's not defined in terms of set operations. (At least not in the
> > RM.)
> > We can still write functions, and use recursion, can't we? You don't
> > mean to exclude that, do you?
>
> If "relational" encompasses too wide a spectrum then the word starts
> to become rather meaningless. It's not meant to have any bearing on
> how one solves problems - I'm not suggesting that RVAs are evil in any
> sense!

Okay. But let me be explicit: in my mind the "ideal" programming language encompasses different "pure" computing models into a single hybrid computing model. That model will necessarily include relations, (because they are so awesome) and it will also necessarily include other things, such as first class functions, because they too are awesome.

> > > How does the typing system work with such an approach? ie how do you
> > > constrain the allowed RVAs?
>
> > What's wrong with the usual way?
>
> I don't know what the usual way is. There are different ways of
> defining relation types. For example one can define a relation type
> that encompasses all possible relations, or one can parameterise on
> the types of the attributes, or one can parameterise on both the names
> and the type of the attributes. In addition one could allow the
> system to define distinct types even though they have the same names
> and type of attributes. This seems to relate to Jim's recent post on
> POOD.
Well, my own thoughts in this area are evolving and may sound pretty weird at this point. Roughly: there are two "kinds" of values: relations and atoms. The type of a relation is exclusively the set of its attribute names. Atoms don't have types, however we may specify a set of atoms in such a way as looks very much like what is usually called a type. (Thus: we have the set of natural numbers, and we have the set of integers, and N is a subset of Z. And we can tell of an atom whether it belong to N, etc.) In addition we have constraints, and the constraint language is Turing complete (and hence not decidable.) We can specify a constraint on a variable, in which case it is a runtime construct, or we may specify a constraint on immutable constructs, in which case they should be checked at compile time. Under this scheme much of the machinery usually associated with types gets taken over by constraints.

This is just my own particular way of thinking; it doesn't represent anything more than that.

> > > Would there be some concept of
> > > inheritance for relation types (by analogy to an OO implementation of
> > > an AST that uses a Node base class)?
>
> > I'm inclined to say NOOOOOOO!!!!!</vader>
>
> I wonder then how all the relevant constraints would be specified.

Given a Turing complete constraint language, any constraint that can be computed can be expressed. I don't think you can do any better than that. :-) Of course, this approach brings its own issues ...

Marshall Received on Thu Jan 10 2008 - 03:44:02 CET

Original text of this message