Re: The Fact of relational algebra (was Re: Clean Object Class Design -- What is it?)

From: Bob Badour <bbadour_at_golden.net>
Date: 17 Oct 2001 23:09:52 -0700
Message-ID: <cd3b3cf.0110172209.6320dafd_at_posting.google.com>


Jim Melton <Jim.Melton_at_Technologist.com> wrote in message news:<Jim.Melton-419E09.23354415102001_at_news1.denver1.co.home.com>...
> In article <766y7.3062$my7.96943209_at_radon.golden.net>,
> "Bob Badour" <bbadour_at_golden.net> wrote:
>
> > > Irrelevant. For one who is keen on physical independence, you refuse to grant
> > > the difference between logical and physical when it comes to object references.
> > > When you say that a reference (logical construct) is a pointer (physical
> > > construct) you are redefining terms.
> >
> > Neither a reference nor a pointer is a logical construct in the manner you
> > used the term. Both are physical constructs. A relation is the logical
> > equivalent of a conceptual reference.
>
> According to you, a relation is an unencapsulated set of tuples. What
> does that have to do with a reference at all?

Conceptually, a reference establishes a relationship among two or more things. Each tuple in a relation establishes relationships among all of its attributes.

In SQL99 and OO, "references" are physical references. They are inherently unidirectional, asymmetric and implementation dependent. In the relational model, relations are logical references. They are multidirectional, symmetric and implementation independent.

> A reference is a "handle" by which information may be retrieved.

I disagree. A "handle" is an artifact used for referencing.

Interestingly, I found the following definitions online:

reference n. [See Refer.]
1. The act of referring, or the state of being referred; as, reference to a chart for quidance.
2. That which refers to something; a specific direction of the attention; as, a reference in a text-book. 3. Relation; regard; respect.

handle n.

1. A part that is designed to be held or operated with the hand. 
2. An opportunity or a means for achieving a purpose. 
3. Slang. A person's name. 
4. That of which use is made; the instrument for effecting a purpose;
a tool.
5. A magic cookie, often in the form of a numeric index into some array somewhere, through which you can manipulate an object like a file or window. The form `file handle' is especially common. 6. A pointer to a pointer to dynamically-allocated memory; the extra level of indirection allows on-the-fly memory compaction (to cut down on fragmentation) or aging out of unused resources, with minimal impact on the (possibly multiple) parts of the larger program containing references to the allocated memory.

> Note
> that at the top of this article, there is a "reference" to the article
> you posted previously. It is NOT a pointer (there are no physical
> addresses -- such would be impossible given the different computers and
> news servers involved).

It is a pointer. It points from your message to my prior message just as this message points to your message.

> Yet, from this reference it is possible to
> *navigate* to your article to which I am responding.

It is possible to navigage any pointer. The real problem is the requirement for navigation. These messages have other properties by which we can find messages. For instance, I can ask for all messages posted by "Jim Melton". Or, using a join, I can ask for all replies to messages posted by "Jim Melton". Or, using a transitive closure, I can ask for all subthreads initiated by "Jim Melton".

In relational terms, the values in the headers of these messages serve as candidate keys for the messages and thus are useful as handles for the purpose of referencing the messages.

> The notion of a "handle to previously stored information" is a logical
> concept.

If the handle is a logical artifact, then it is logical. If it is a physical artifact, then it is physical.

> It may not belong in the relational model, but it is still a
> logical concept.

What do you think "logical identity" is?

> The physical implementation can take any number of
> forms (including the ascii string above).

The relational model requires that logical identity exist independent of physical implementation or physical representation.

> > > > As you can see above, in Java, a reference is a pointer to a pointer.
> > > > Additional levels of indirection do not change the pointer nature.
> > >
> > > But they do. It is the difference between logical construct and a physical
> > > implementation.
>
> > > It is the difference between a table and a view.
> >
> > I agree. Both a table and a view are logical relations. Both a reference and
> > a pointer are physical constructs. In terms of use, no important difference
> > exists between a table and a view. In terms of use, no important difference
> > exists between a reference and a pointer.
>
> Hmmm. I would expect a table to be closer to a physical construct they
> way you draw the line. A view allows a table (or set of tables) to be
> transformed to look like a different "table" (but it is NOT a table). It
> is in fact indirection over the table.

A view is a named, derived relation. A snapshot is a stored, named, derived relation. A table or base relation is a named relation variable that forms part of the logical schema. As such, all derived relations ultimately derive their values from tables or base relations (ignoring relation literals, for simplicity).

Due to the requirement for view updatability, no important difference exists between a table and a view. In fact, all casual users, including application programs, should exclusively interact with views.

Due to the requirement for physical independence, a table need be no closer to any physical construct than a view is to a table. One can conceive of databases where some view more closely resembles the physical storage of the data than any of the relations from which it derives its value. In fact, I would expect this to occur quite frequently.

> > > > Since
> > > > everything is a pointer (and there are no pointer manipulation operations),
> > > > Smalltalk
> > > > does not have the exposures of, say, C or C++ to pointer abuse.
> > >
> > > Again, you miss the point. Since they are not exposed, they do not exist.
> >
> > But they are exposed. As it says above, everything is a pointer.
>
> As it says above, pointers are not exposed. "There are no pointer
> manipulation operations. Pointers do not have the exposures of ... C++."

They are exposed. If two named variables point to the same instance, a message sent to one named variable that changes the state of the instance changes the observable state of both named variables.

> I don't mind you being dogmatic in your view, but would you please at
> least read first?

I did read. I subcribe to no dogma. I suspect you actually do mind, which would explain the ad hominem.

> > > What
> > > the compiler does "under the covers" is irrelevant.
> >
> > Smalltalk exposes pointers to users. All named variables are pointers and
> > this is not kept under the covers.
>
> Since there are no pointer operations, the fact that named variables are
> pointers are not exposed to users.

The fact that users cannot subtract one pointer from another or add an integer to a pointer does not equate to the users having no exposure. All named variable references dereference a pointer. All methods operate directly on pointers and indirectly on object instances.

> Since the *language implementation*
> uses pointers, all named variables are not bounded in scope like, say,
> automatic C++ variables or local FORTRAN variables.

Actually, since all named variables contain pointers, the values of named local variables are bounded in scope just like automatic C++ variables. The dynamically allocated object variables to which the named variables point are no different from dynamically allocated C++ or Java variables. Smalltalk and Java require garbage collection, however.

> garbage collection. This is an important
> feature of the Smalltalk language, that users do not have to worry about
> scope (lifetime) of variables and differentiates it from other languages
> (like C++ and FORTRAN).

Garbage collection is not unique to Smalltalk and does not alter the nature of pointers.

> It does not expose pointers to users. It exposes
> named variabled to users.

Since all named variables are pointers, Smalltalk exposes pointers to users.

> > > > In C++, of course, the only differences between references and pointers are
> > > > syntactic.
> > >
> > > Posh. There are semantic differences between pointers and references in C++.
> >
> > And those semantic differences would be...?
>
> A reference can only be bound at initialization time. It cannot be
> "re-seated" to refer to any other object than the one it was initially
> bound to.

In C++, then, a reference is a pointer constrained to a single value. A reference has the semantics of a pointer with the syntax of an object variable.

> A pointer may point to one object in one statement and a
> totally different object in the next.

If the only non-syntactic difference between a reference and a pointer is a simple constraint, then I must point out that a reference is a constrained pointer.

> A reference is also described as an "alias".

A pointer is equally an alias as anyone who has debugged programmes with pointer aliasing issues knows.

> It is used exactly the same
> as the referent, with no (user discernable) indirection.

It has the syntax of the referent, but users can discern the indirection:

void main(void)
{
int i = 0;
int &ir = i;

ir = 2;
printf("%d\n",i);
}

> A pointer
> explicitly introduces indirection.

As does a reference. One must explicitly dereference a pointer, however, and I suspect that is what you meant to say.

> > > But Adrian was not describing a programming language pointer. He was describing
> > > a reference to a persistent object (which is not intrinsic to *any* programming
> > > language. So it is a new concept.
> >
> > The combination of a relation name, column name and candidate key value is
> > also a reference to a persistent object variable. The concept is old.
>
> And there is no programming language construct to refer to the
> combination of relation name, column name and candidate key value
> atomicaly.

I realise we need to improve our programming languages.

> Do you know anything about distributed programming?

Of course.

> Using CORBA, for
> example, a client holds an "object reference" to a remote object that is
> incarnated by (lives in) some server.

A client holds a pointer to a remote object. Your point?

> The client invokes operations on
> the reference that it has. This reference is certainly NOT a pointer, as
> the referent may be on a different computer on a different continent.

While the reference is not a memory address on the client computer, it most definitely is a pointer. It just happens to point to a location beyond the scope of the memory addresses of the client computer.

> In
> fact, the referent may not actually be resident in memory in ANY
> computer at the time the operation is invoked.

But it does reside somewhere.

> Also, invoking the
> operation on the reference causes a network connection to be established
> (possibly being routed through several intermediaries), operation
> arguments to be marshalled into the network byte stream, and all manner
> of other things to happen on the remote end.

These are all physical issues, which is why I say that your "reference" is a physical artifact equivalent to any physical pointer.

> A reference can be quite a bit more than a pointer.

A pointer can be quite a bit more than a memory address, but a reference is a pointer.

> > The
> > physical construct Adrian describes is old too: it's called a pointer.
>
> If you don't have the experience to see the difference, that's OK.

If you don't have the experience to see the equivalence, that's OK.

> But
> you might want to ease up on the dogma a bit.

I don't subscribe to any dogma. You might want to ease up a bit on yours though.

> > > It has nothing to do with the logical concept.
> >
> > I believe you are mixing models. What you call a reference is a physical
> > equivalent to the reference concept. A pointer is also a physical equivalent
> > to the reference concept and equates to what you call reference. The logical
> > equivalent to the reference concept is a relation.
>
> OK. I'll define my terms so you know what I mean.
>
> Physical == implementation. A physical pointer points to an address of
> memory or a sector of a disk, etc.

A physical pointer points to some physical location. I can agree with this.

> Logical == abstraction of physical (hiding implementation details).

I quibble with the above equation. You are working the definitions backward. You must start with conceptual.

Conceptual deals with the concepts. Logical deals with the logic. Physical deals with the physics.

From the standard vocabulary for databases:

17.02.01 conceptual level

All aspects dealing with the interpretation and manipulation of information describing a particular universe of discourse or entity world in an information system.

17.03.05 conceptual schema

A consistent collection of sentences expressing the necessary propositions that hold for a universe of discourse.

17.03.07 logical level

A level of consideration at which all aspects deal with a database and its architecture, consistent with a conceptual schema and the corresponding information base, but abstract from its physical implementation.

17.03.08 physical level

A level of consideration at which all aspects deal with the physical representation of data structures and with mapping them on corresponding storage organizations and their access operations in a data processing system.

> Logical allows for different physical implementations without affecting
> the operations defined or available.
>
> Now, certainly a reference (as any computer concept) has one or more
> implementations.

The relational model allows for multiple implementations of its logical identifiers -- even within a single database. Under the covers, the dbms could implement logical identifiers solely as physical pointers. Under the covers, the dbms could implement some of them as pointers and some of them as relation, column name, value triplets.

> One such implementation may actually be a crude pointer
> as you like to imply. But it is not necessarily so.

No matter how sophisticated, no matter how many levels of indirection and no matter how complex the decoding algorithm, a pointer is a pointer. I have implied no crudeness nor have I implied any particular implementation.

I have implied that what you call references are unidirectional, asymmetric and implementation dependent.

> Other
> implementations are possible without changing the *concept* represented
> by a reference.

How many different implementations of "reference" does any given SQL99 database implement? How many different implementations of "reference" does any give OOPL implement?

> I will grant you that a reference is used to navigate directly to the
> referent. You assert that navigation == pointer. I disagree.

What does one navigate if not a pointer? How does one use a reference to "navigate directly" if a reference is not physical? If your dogma axiomatically dictates that a physical reference is not a pointer even in the face of all contrary evidence, nobody will never convince you.

> > > Nope. I'm just saying that EITHER the object "lives" in your database OR it
> > > "lives" in your spreadsheet. It cannot "live" in both places.
> >
> > Why not? I live in both Canada and the USA. Do I change my identity when I
> > cross the border? If the lifetime of the object entails multiple locations,
> > your previous statement implies that identity should not change when the
> > location changes. The spreadsheet is just another location.
>
> At any instant in time, you occupy exactly one spot in space. You "live"
> there.

I agree. My identity does not change when I move from one place to another.

> Sure your object can live in both the spreadsheet and the
> database. Just not simultaneously. That would be like you being in
> Calgary and New York at the same time.

My question is not whether the object variable can simultaneously "live" in two places. My question is: How does it maintain its identity as it moves from place to place? How does it maintain the same identity in the spreadsheet that it has in the database?

> No matter where it lives, it maintains identity. In order for this to be
> true, it must live in exactly one place (at any instant in time).

No matter where I move, I maintain my identity. In order for this to be true, my identity must be independent of location. How does an object maintain its identity when it moves from a database to a spreadsheet?

> > > Otherwise, you
> > > would have two objects of equivalent state (at some snapshot in time).
> >
> > Do you realise how fundamentally and how thoroughly you have just impeached
> > OID or "reference"?
>
> Nope.

That does not surprise me.

> > You are saying that OID provides an identity for variables and not for data.
> > The variable in the database is different from the variable in the
> > spreadsheet, and I would agree. However, both variables describe a single
> > real-world entity with unique identity.
>
> Not quite. This gets back to the "data copying" sub-thread we had a
> while ago. I think your mindset is that the values from the database
> copied into the spreadsheet have the same "identity" as the values in
> the database. To see how this is not so, copy values into a spreadsheet
> and then change values in the database. Since the two representations no
> longer have the same values, they cannot have the same identity.

Different values have different identity. However, if both the database and the spreadsheet claim to make statements about the same entity, they need to identify that entity. Stability is a criterion for selecting primary keys for this very reason.

For instance, a database and a spreadsheet might make different statements regarding my hair colour, my height or my weight, but both can use the same logical identifier to identify me.

In my original example, someone might copy data about me to a spreadsheet, print it out and fax it to someone else. The other person might observe that my hair colour changed, update the database, annotate the fax with the new information and fax back the annotated copy. The first person then might or might not update the spreadsheet.

Regardless, the spreadsheet and the database contain data about the same entity representing the same information.

> If, however, your spreadsheet had an "active link" (reference) into the
> database, changes in the database values would automatically be
> reflected in the spreadsheet. Object identity is preserved (the object
> lives in the database) and the spreadsheet uses the reference.
>
> By the way, Excel already works like this. I've had spreadsheets
> e-mailed to me with references to external data that did not make the
> trip. The links are then broken.

This is a problem with physical pointers. Logical identifiers are much better for human users.

> > You are saying that OID does not provide any facility to identify the actual
> > data or real world entity.
>
> Real-world entities don't often lend themselves to computer
> representation.

I would disagree; real-world entities often lend themselves to computer representation. Regardless, users must identify information and computers must identify data.

> An OID is *an* implementation of a reference. And please notice that
> there is no singular implementation of an OID.

Every ODBMS or SQL database implements at most one reference. I agree that no singular implementation of these pointers exists, which makes these physical pointers non-portable.

> But I would still maintain that a database reference would refer to the
> actual data (in the database).

Does it refer to the data in any way that a human user can understand? Is it useful outside the scope of any given database? Can two databases share the same reference to identify the same conceptual entity? Can two different vendors' databases share the same reference to identify the same conceptual entity? Can other applications use the same reference to identify the same conceptual entity?

> > The value of a candidate key identifies both. When one combines it with a
> > relation variable and the name of a column, it identifies an object variable
> > in the database.
>
> A database reference (as Adrian originally described) embodies and
> encapsulates this combination as a single entity that can be manipulated
> by a programming language. There doesn't *have* to be any disagreement.

The question is: Does one want to cripple the database? Does one prefer to empower the programming language?

A database reference is a physical pointer with all of the disadvantages for database management that a physical pointer entails.

> > A spreadsheet identifies its variables using a coordinate
> > scheme.
>
> Yes, and if the row/column headings are not printed out, you cannot
> identify that variable from a printout.

Since the human user is much more interested in identifying conceptual entities than in identifying variables, this should not present any practical problems.

> > Even in the spreadsheet or on a fax, the candidate key value
> > identifies the same entity in the real world.
>
> Well, actually the fax captures state of some entity at some point in
> time. There is nothing intrinsic in the spreadsheet printout to tell you
> what the candidate key value is.

I agree. Some user must choose an appropriate logical identifier when constructing the spreadsheet. If the database already provides a convenient logical identifier, most users will naturally choose it. If the database does not maintain logical identity or if the database does not present identifiers in human-consumable form, the database does not support the user's needs very well.

> And the "real world" is useful for object modelling, but models are
> necessarily approximations of the "real world", so the fax may or may
> not identify anything in the "real world".

One would then question the utility of the fax in the first place.

> > > Identifying the "correct" variable in "the database" is always problematic.
> >
> > Not in a relational dbms. The user observes the values in the spreadsheet or
> > on the fax and uses those values to identify variables within the database.
>
> Really. I supposed you've never receive e-mail for some "other" Bob
> Badour.

Never. I know another Bob Badour lives somewhere near Kingston, ON but I have never received any of his email. Do you receive e-mail for some "other" Jim Melton?

> Or had a problem with some vendor who mis-entered your social
> security number (or whatever they use in Canada).

A user is much less likely to mis-enter my SSN than an OID. When the database doesn't give any usable representation of identity, the database will almost certainly contain multiple variables representing me.

> Or received a catalog
> with your name misspelled.

These are not problems with identifying variables, per se. They are general data entry problems. OID and physical pointers do nothing to address any of them.

> I will grant you that in theory all relations are unique. But from a
> spreadsheet fax or a corporate e-mail directory or a bulk-mail database
> sometimes the wrong values are associated with the wrong "real world"
> entity. Values are subject to error. References are explicit in their
> referent.

Please explain how physical pointers overcome data entry errors at the source.

> > > > Can you believe that people accuse ME of platonic idealism???
> > >
> > > No, but you've sure got a bee in your bonnet about OIDs and human consumption.
> >
> > Bee or no bee, humans consume identity information.
>
> Humans consume values. They attempt to derive (sometimes incorrectly)
> identity from them. If your only concern is queries and spreadsheet
> displays, you have little use for references. There are other uses for
> databases.

Ultimately, human users consume the information represented by the data managed by the dbms.

> > > How does a relational database "expose objects directly"?
> >
> > As variables in relation variables uniquely identified by a combination of
> > relation variable name, candidate key value and column name. Or as values in
> > relations uniquely identified by a combination of relation, candidate key
> > value and column name.
> >
> >
> > > Much as you might wish otherwise, the current state of the art is that not all
> > > things can be accomplished via set operations internal to the RDBMS. For those
> > > cases, database values (since there is typically NOT a tight language binding
> > > to allow direct access to variables -- except in object databases) must be
> > > "exposed" to a programming language. How is the encapsulation of the object
> > > maintained then?
> >
> > One must construct an application programming language object variable based
> > on the observable properties of an object value from the database. To update
> > the database, one must change the value or state of a database object
> > variable based on the observable properties of an object value in the
> > application.
>
> How is it possible to construct a programming language object variable
> based only on the observable properties of an object value from the
> database. In order to have the same representation as the database
> value, doesn't the programming language object need access to the
> (hidden) internal state of the database object?

Physical independence requires the DBMS to support multiple applications using multiple different representations for the same domain value. Identical representation between applications and the dbms has never been a goal of sound database management.

Even in the case where a database uses the same representation of a value as a given application today, tomorrow the dbms representation might change and then the application will use a different representation than the dbms.

> > How is encapsulation violated?
>
> By this exposing of hidden state of database objects.

A straw man. The programming language neither needs nor wants to access the internal physical representation of the object values.

I know that most ODBMS vendors confuse this issue, but non-relational ODBMS vendors know little about sound database management. Received on Thu Oct 18 2001 - 08:09:52 CEST

Original text of this message