Re: what are keys and surrogates?

From: David BL <davidbl_at_iinet.net.au>
Date: Tue, 8 Jan 2008 18:17:59 -0800 (PST)
Message-ID: <d1328096-476c-4beb-a17b-59c856341266_at_e6g2000prf.googlegroups.com>


On Jan 8, 9:26 am, JOG <j..._at_cs.nott.ac.uk> wrote:

> We have some bits of paper with numbers written on (in pencil). We are
> storing info about these bits of paper in a database using the schema:
> {paperID, Value}. The key, PaperID, is a unique database generated
> hidden surrogate.
>
> We have an enumeration:
> { (paperID:1, Value:X), (paperID:2, Value:Y), (paperID:3, Value:Z) }
>
> Someone comes to you the DB admin, with 3 bits of paper and says, ok
> the boss has changed the values on some of the bits of paper. What I
> have here is one bit of paper with an A on, one with a B and one
> with a Z. Please update the database accordingly.

Yes, in this example you can't afford to have hidden identifiers.
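To make the ambiguity concrete, here is a rough sketch in Python. The names (stored, observed, candidate_updates, fewest_changes) are my own inventions for illustration and not part of anyone's schema. It enumerates every way the three observed values could be assigned to the hidden paperIDs, then narrows the list by assuming the boss changed as few papers as possible, which is itself only a guess.

```python
from itertools import permutations

# Database state before the update; paperID is the hidden surrogate.
stored = {1: "X", 2: "Y", 3: "Z"}

# Values read off the three physical bits of paper, which carry no paperID.
observed = ["A", "B", "Z"]

def candidate_updates(stored, observed):
    """Every assignment of the observed values to the existing paperIDs."""
    ids = sorted(stored)
    return sorted({tuple(zip(ids, perm)) for perm in permutations(observed)})

def fewest_changes(stored, candidates):
    """Keep only the candidates that alter the smallest number of rows."""
    def changes(candidate):
        return sum(1 for pid, value in candidate if stored[pid] != value)
    best = min(changes(c) for c in candidates)
    return [c for c in candidates if changes(c) == best]

candidates = candidate_updates(stored, observed)
minimal = fewest_changes(stored, candidates)
```

Even under the "fewest changes" assumption more than one candidate survives, so the DB admin has no basis for choosing which row becomes A and which becomes B.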

In November I started a thread called "RM and abstract syntax trees" in which I suggested that the RM is poorly suited to the representation, never mind the manipulation, of ASTs. The problem is that the only reasonable way to represent the structure is to introduce meaningless node identifiers. An important principle in the RM is that a tuple should always represent a proposition that makes sense to the problem domain expert, so I agree with you that we cannot allow hidden identifiers. Therefore the RM cannot help but expose the node identifiers for all to see.
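As an illustration of what I mean, here is a minimal sketch in Python of the expression (1 + 2) * 3 held as two relations. The relation names (node, child), the attribute layout and the identifier values 10 to 14 are all assumptions made up for this example; the point is only that the identifiers mean nothing to a domain expert yet cannot be left out.

```python
# Relation node(node_id, kind, value): one tuple per AST node.
# The node_id values are arbitrary; any distinct numbers would do.
node = {
    (10, "mul", None),
    (11, "add", None),
    (12, "num", 1),
    (13, "num", 2),
    (14, "num", 3),
}

# Relation child(parent_id, position, child_id): the tree structure.
child = {
    (10, 0, 11),
    (10, 1, 14),
    (11, 0, 12),
    (11, 1, 13),
}

def evaluate_relational(node_id):
    """Evaluate the subtree rooted at node_id by joining node and child."""
    kind, value = next((k, v) for (n, k, v) in node if n == node_id)
    if kind == "num":
        return value
    # Sorting the tuples orders each parent's children by position.
    kids = [c for (p, _pos, c) in sorted(child) if p == node_id]
    left, right = (evaluate_relational(c) for c in kids)
    return left + right if kind == "add" else left * right

result = evaluate_relational(10)
```

Every tuple in child is a statement about node identifiers and nothing else, which is exactly the kind of proposition a problem domain expert would not recognise.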

Prolog is able to parse string expressions entered by users and build and manipulate ASTs. Behind the scenes, nested functor expressions are usually implemented using dynamically allocated nodes wired up with pointers. However, as far as the programmer is concerned, only unification is available to decompose the structure. It seems to me that Prolog has more general support for data modeling than is available in the RM, to the extent that nested functor expressions avoid the need to introduce lots of meaningless identifiers.
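A rough analogue in Python, using nested tuples to stand in for nested functor expressions, might look like the sketch below. The functor names ("mul", "add", "num") and the function evaluate_term are assumptions of mine, and tuple destructuring is only a one-way approximation of Prolog unification, so treat this as an illustration rather than a faithful model.

```python
# The expression (1 + 2) * 3 as a nested term; no node identifiers appear.
expr = ("mul", ("add", ("num", 1), ("num", 2)), ("num", 3))

def evaluate_term(term):
    """Decompose a term by its shape alone, never by following an identifier."""
    functor, *args = term
    if functor == "num":
        (value,) = args
        return value
    left, right = (evaluate_term(arg) for arg in args)
    return left + right if functor == "add" else left * right

# A term built separately with the same shape is simply equal to expr.
same_shape = ("mul", ("add", ("num", 1), ("num", 2)), ("num", 3))

result = evaluate_term(expr)
```

Any pointers the implementation uses to wire the nodes together stay behind the scenes; two terms with the same structure compare equal, and the programmer never names a node.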

You could well argue that the hidden identifiers are not intrinsic to the data - because after all an expression has a string representation over an appropriate grammar, and in that form there is no concept of nodes and node identifiers. However, the tree representation is often more suitable for manipulation by a computer, and I can imagine applications with very large amounts of such persistent data. We could hardly expect all that data to persist as strings and to be parsed every time it is brought into memory.

I'm interested in 3D scene graphs that support complex interactions between the parts, requiring nested expressions in the data model. RM seems to have limitations for such an application. To some extent I like to think of a program as data (like a Lisp programmer), and I think there will be exciting applications in the future that blur that distinction. The humble spreadsheet is a revealing example of how one can go beyond tables of raw values and have formulae in spreadsheet cells to represent information (at some higher level one might say). Of course I'm not suggesting that spreadsheets are generally suitable for data management!

Received on Wed Jan 09 2008 - 03:17:59 CET
