Re: RM and abstract syntax trees

From: David Cressey <cressey73_at_verizon.net>
Date: Sun, 11 Nov 2007 22:35:21 GMT
Message-ID: <JsLZi.257$WM.97_at_trndny05>


"David BL" <davidbl_at_iinet.net.au> wrote in message news:1194798913.080431.70540_at_i38g2000prf.googlegroups.com...
> On Nov 11, 1:08 am, "David Cressey" <cresse..._at_verizon.net> wrote:
> > "David BL" <davi..._at_iinet.net.au> wrote in message
>
> > > The mathematician in me axiomatizes the concept of pointer in terms of
> > > its abstract properties, which (ignoring null pointers) are
> >
> > > 1) the existence of an associated address space, which is just a
> > > set of objects
> > > 2) a bijection between objects in the address space and pointer
> > > values
> > > In one direction this bijection is "address of" and in the
> > > other
> > > it is "dereference"
> > > 3) the ability to compare pointer values
> >
> > > This encompass physical address spaces, virtual address spaces, C++
> > > smart pointers, persistent OIDs and pointer swizzling etc.
> >
> > > It also encompass node identifiers of an AST if we regard the DB as
> > > defining an address space of AST nodes, and an appropriate select
> > > query represents a dereference that logically binds to precisely one
> > > node of the AST.
> >
> > > My perspective conflicts with your statement above.
> >
> > First, the term "pointer" originates in computer science, not
mathematics.
> > This is unlike the term "relation" which originates in mathematics and
is
> > transferred to the world of computer science, where it presumably
retains
> > all the properties that made it important from a mathematical
perspective.
>
> Yes the term "pointer" originates in computer science and its meaning
> is not at all precise. Wouldn't it be nice if the software
> development discipline was generally treated as a branch of
> mathematics, and there were many standardised axiomatic definitions to
> give it the proper foundations.
>
> > Second, your analysis overlooks the difference between address based
> > addressing and content based addressing. If you retrieve a row by
> > specifying an OID to be matched, you are still engaging in content based
> > addressing. If you specify a pointer, there is no law that says that
the
> > object pointed to is going to contain a copy of the pointer you used for
> > access.
> >
> > In most pointer based systems, the pointer is not included in the
contents,
> > being seen as redundant to its location.
> >
> > The distinction between content based addressing and pointer based
> > addressing is fundamental to the comparison of databases built on the
> > relational model and databases built on the graph model.
>
> This is an interesting and important distinction. I assume you are
> suggesting pointers by definition never use content based addressing.
> However I regard the following as a counter example...
>

I am claiming that the word "pointer" as used in computer science and going back at least as far as the invention of Lisp in the 1950s, by definition refers to address based addressing. That any use of the word "pointer" that conflicts with this use is technically incorrect, and may be useful only to the extent that it suggests an analogy to the purpose for which pointers are used.

An analogy is only an analogy. It isn't a definition.

> In the literature on pointer swizzling, persistent OIDs are classified
> as being of two different types : physical or logical. This is
> exactly your distinction above with different terminology.

Fien. I don't propose to criticise you for the terminology you've been exposed to, nor do I intend to accept criticism for my use of terminology that I've been exposed to. A trend towards more uniform terminology would be welcome, but we won't make any progress in that direction by finger pointing.

> A physical
> OID directly specifies the location of the object on disk. For
> example, in the commercial product ObjectStore, the low 32 bits of the
> OID represents a seek position within a "cluster".

A physical OID corresponds to what I call a "pointer".

> By contrast a
> logical OID uses an indexing data structure allowing for objects to be
> relocated. A Log Structured Store for example will typically use
> this latter approach.

The only indexing structures I'm directly familiar with are the ones in SQL DBMS products like Oracle RDBMS, or in indexed file systems like RMS (a facility within VMS). In the former case, the index is a data structure that serves to map what you are calling a "logical OID" to what I am calling a "pointer". The pointer that the DBMS retrieves from an index serves for direct lookup of the thing pointed to (a table row or table row fragment, in this case)

> Despite such an important and useful
> distinction, it is only relevant to the underlying physical
> implementation of the Persistent Object Store (POS). It has no impact
> whatsoever on the fact that clients of a POS like to think of OIDs as
> persistent pointers between objects.

I tend to use the acronym "POS" with a different expansion, but we'll let that pass for now.

It may or may not have any impact on what the clients like to think. It does however have a great impact on the delay time needed to access the thing referenced by the pointer. It also has a great impact on the consequences of shuffling data within the address space. (What you are calling "swizzling" might be the same as what I'm calling "shuffling".

If there are physical pointers beyond the control of the DBMS, then shuffling data can result in the invalidation of such pointers with no possibility of raising an exception at the appropriate point in time.

So we can at least table the discussion on terminology by agreeing that "pointer", in strict usage, should ONLY refer to physical OIDs. Logical OIDs could be compared to "surrogate keys", but a discussion might ensue about whether dependence on them adds to complexity with adding to expressive power or flexibility.

That's the issue I'd really like to grapple with. It seems to me that the people who model the connection between items of data based on logical OIDs create a monster that has all the defects of graph based databases, coupled with the access overhead associated with content based addressing. In other words, the worst of both worlds. If that's true, that would be a fairly convincing argument for finding a different way of modeling data and designing databases.

> Note further that this is not
> merely conceptual - these products actually provide explicit pointer
> syntax and semantics for the application programmer - perhaps even to
> the extent of using hardware support for page faulting when
> dereferencing virtual memory pointers to perform pointer swizzling.
>
> > It is possible, by suitable abstraction and by poor design choices, to
> > create a database that offers none of the advantages claimed for the
> > relational model, while appearing to conform to the relational model
> > superficially. Sad to say, thousands of such databases have been
> > constructed over the years, and have led many uneducated database
newbies
> > to believe that the relational model doesn't really offer the advantages
> > that its proponents cliam.
>
> My programming background is image processing and systems programming,
> and I've rarely seen a need for the RM in my work. 18 months ago I
> could easily have been one of those newbies that doesn't realise that
> RM is fundamental to data management.
>
> Now I think the RM is great, and have been studying it as a hobby.
>

Good!

>
Received on Sun Nov 11 2007 - 23:35:21 CET

Original text of this message