Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Bob Badour <>
Date: Thu, 13 Feb 2003 18:24:15 -0500
Message-ID: <g8W2a.1405$>

"Paul Vernon" <paul.vernon_at_ukk.ibmm.comm> wrote in message news:b2g807$su6$
> "Anton Versteeg" <> wrote in message
> >
> >
> > Lauri Pietarinen wrote:
> >
> > > I don't think that anybody is suggesting that intermediate results
> > > to remove duplicates. It's
> > > the end result that counts. E.g. in the following code fragment
> > >
> > >
> >
> > Well, I think that even for end results duplicates can be useful.
> If you define 'end results' as not being the result of a relational query,
> the result of some extra non-relational transformation of a relation then
> but please don't try to argue that a database is best supported by a 'bag'
> algebra (or an array algebra, or a network algebra,...) rather than a
> relational algebra.
> > It is the difference between the set theory and query results in
> No. It is not a case of theory not matching practice (which, if true, tells us
> that a theory is broken), rather it is the lack of a clear understanding
> where relational algebra ends and something else (like array variable and
> values) picks up.
> Of course theorists are sometimes guilty of over stating the scope of a
> theory, but much more usually it is the practitioners that do the try to
> a given theory outside of its bounds.
> > For instance: a set doesn't have an order but it would be impossible to present
> > results to a user of our database if we cannot order the end result.
> We live in a four dimensional world, and there is order everywhere:
> left/right, before/after. Because of this we indeed cannot ever see or hear or
> feel or somehow sense a Relation per se. The best we can do is to
> transform a relation into, say, an array, then present such an array using
> visual display unit, like a computer screen. Things like arrays, trees,
> etc are all slightly closer to being able to be sensed than relations
> (although I'm not sure one can really see an array either, all we see is
> light...)
> > To give an example of the use of duplicates:
> > Suppose we have a table that holds text (letters for instance).
> > We would probably have a line number field and a text field.
> > To improve readability we will have several occurrences of blank lines.
> > If we then select the text column ordered by the line number, we will
> > (meaningful) duplicates in the end result.
> OK, but just be clear such an 'end result' is not a relation and so any
> to help produce such 'end results' is not part of a relational algebra.
> A logically clean syntax would be to 'cast' a relational query result to
> an array, then further 'select' certain array elements for display in some
> specified (or unspecified) order.
> Unfortunately SQL does not make such a clean separation. Regardless of it
> being a bag algebra (of sorts), it lets ORDER BY operate on its bags
> any explicit casting to an orderable non-scalar type such as an array.
> Regards
> Paul Vernon
> Business Intelligence, IBM Global Services

Anton, Lauri and Paul,

Your recent exchange seems to be talking around a couple of issues, and I just want to try to express the issues directly for clarity--more to help me and other readers see and understand the shortcuts you may have taken in your reasoning than for any other purpose. I realise that you and many of the frequent contributors to this newsgroup will find it quite remedial.

Sets have no implicit order. When representing a set, one may order the members of the set in any way without changing the thing represented. In other words, all orders of representation are equally valid to describe a set.
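
To make that concrete, here is a minimal sketch, assuming Python's built-in set type as a stand-in for a mathematical set. The names a, b and present are illustrative only. Two representations written in different orders denote the same set, and an order appears only when one chooses to present the members.

```python
# Two representations of the same set, written down in different orders.
a = {3, 1, 2}
b = {1, 2, 3}

# Equality depends only on membership, not on the order of representation.
same = (a == b)

def present(s):
    """Impose an explicit order on a set for presentation purposes only."""
    return sorted(s)

print(same)
print(present(a))
```

The call to sorted is where order enters, and it enters explicitly; nothing about the set itself has changed.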

Some operations on relations require an explicit order: quota queries, min, max, etc. Using physical order in the representation of relations can sometimes speed the evaluation of unordered operations like joins.
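
As a rough illustration of an operation that needs an explicit order, the sketch below evaluates a quota query ("the n lowest") over a relation modelled as a Python set of tuples. The employees data and the quota helper are hypothetical, and the salaries are assumed distinct so that ties do not arise. The order is supplied as an explicit key; it is not a property of the set.

```python
import heapq

# Hypothetical relation: a set of (name, salary) tuples with distinct salaries.
employees = {("ann", 50), ("bob", 70), ("cy", 60), ("dee", 40)}

def quota(relation, n, key):
    """Return, as a set, the n tuples that rank lowest under an explicit key."""
    return set(heapq.nsmallest(n, relation, key=key))

# The ordering (by salary) is named explicitly by the caller.
lowest_two = quota(employees, 2, key=lambda t: t[1])
lowest_salary = min(salary for (_, salary) in employees)
```

The result of the quota query is again a set; only the evaluation needed an order.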

Presentation implies some tangible physical representation, which by definition lies outside the scope of logical data models. Specifically, 'presentation' usually means external physical representation as opposed to internal physical representation.

Logical representations relying on an implicit order are generally undesirable for data management. Implicit order in the logical representation will generally require a fixed order in the physical representation thereby disallowing many otherwise allowable storage and manipulation options, which for some operations will include the most efficient options. Because implicit order is irrelevant to sets, database management systems that logically represent data as sets may use any convenient physical order to improve performance when evaluating operations.
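
A small sketch may help here, under the assumption that relations are modelled as Python sets of tuples keyed on their first component. The function names are mine. Both functions compute the same join; one ignores order entirely, and the other chooses to sort its inputs as a physical convenience. Because the logical result is a set, either strategy is allowable.

```python
def nested_loop_join(r, s):
    """Join r = {(k, a)} with s = {(k, b)} on k, visiting tuples in any order."""
    return {(k1, a, b) for (k1, a) in r for (k2, b) in s if k1 == k2}

def sort_merge_join(r, s):
    """Same join, evaluated over inputs sorted on k as a physical convenience."""
    rs, ss = sorted(r), sorted(s)
    out = set()
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i][0] < ss[j][0]:
            i += 1
        elif rs[i][0] > ss[j][0]:
            j += 1
        else:
            k = rs[i][0]
            # Find the run of tuples sharing key k on each side.
            i_end = i
            while i_end < len(rs) and rs[i_end][0] == k:
                i_end += 1
            j_end = j
            while j_end < len(ss) and ss[j_end][0] == k:
                j_end += 1
            for (_, a) in rs[i:i_end]:
                for (_, b) in ss[j:j_end]:
                    out.add((k, a, b))
            i, j = i_end, j_end
    return out

r = {(1, "x"), (2, "y"), (2, "z")}
s = {(2, "p"), (3, "q"), (2, "r")}
```

A representation with implicit order would have to preserve that order through the join, which rules out freely re-sorting the inputs as the second function does.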

Logical arrays, bags, trees and lists have no more tangible existence than relations or sets. They are all abstract. For human reception, we must encode any of these abstractions on some physical medium for transmission: ink on paper, fluorescing phosphor on glass, scrapings in wax or sand, bumps on paper, banks of lights or LEDs etc.

Even the physical devices composing a computer system must encode the abstractions above onto some physical medium: isolated charged regions on a semiconductor device, magnetic alignment of small regions of ferrous materials, holes in a piece of paper etc.

Many equivalent physical representations or encodings will exist for each of the logical abstractions, because operations rather than physical encodings define the abstractions. One really draws from the same universe of physical representations when encoding any of the above logical abstractions. However, those abstractions relying on implicit order must either draw from a smaller subset of that universe or must extend the managed data with some additional encoding of the implicit order.

In some cases, a particular physical encoding will have specific features or properties that map directly to the specific operations defining a logical abstraction. For instance, encodings of physical memory addresses as 'link pointers' may map directly to the Next and Previous operations for a list abstraction or physically encoding sequential elements at sequential memory addresses with a fixed offset may map directly to array referencing operations.
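
The sketch below illustrates the idea with two encodings of one list abstraction, assuming the abstraction is defined by just three operations: first, next and value. The class names and this reduced operation set are my own simplification, and Python object references stand in for physical link pointers. The traversal code sees only the operations, so it cannot tell the encodings apart.

```python
class ArrayEncoding:
    """List abstraction encoded as elements at sequential positions."""
    def __init__(self, items):
        self._cells = list(items)
    def first(self):
        return 0 if self._cells else None
    def next(self, pos):
        # Next maps directly onto adding a fixed offset to the position.
        return pos + 1 if pos + 1 < len(self._cells) else None
    def value(self, pos):
        return self._cells[pos]

class LinkedEncoding:
    """The same abstraction encoded as nodes joined by link pointers."""
    class _Node:
        def __init__(self, value):
            self.value = value
            self.next = None
    def __init__(self, items):
        self._head = None
        tail = None
        for item in items:
            node = self._Node(item)
            if tail is None:
                self._head = node
            else:
                tail.next = node
            tail = node
    def first(self):
        return self._head
    def next(self, pos):
        # Next maps directly onto following the stored link.
        return pos.next
    def value(self, pos):
        return pos.value

def traverse(lst):
    """Walk any encoding using only the operations of the abstraction."""
    out = []
    pos = lst.first()
    while pos is not None:
        out.append(lst.value(pos))
        pos = lst.next(pos)
    return out
```

Both traverse(ArrayEncoding("abc")) and traverse(LinkedEncoding("abc")) walk the same three elements in the same order, although the two encodings store them quite differently.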

When most people discuss certain logical abstractions such as arrays or lists, they usually assume physical encodings drawn from a very small subset of the possible encodings--specifically they assume those physical encodings with features that map to the most frequently used operations for the logical abstraction. For example, most people would assume some encoding of link pointers when talking about lists, and most people would assume some sequential encoding--possibly at some level of indirection--when talking about arrays.

Duplicates in a logical abstraction can only have meaning if the logical abstraction relies on implicit order or some other implicit information. Otherwise, users cannot reference the individual instances even to count them. IMO, that's an important concept to understand.
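
Anton's letter example can be sketched as follows, assuming a hypothetical relation of (line_no, text) tuples modelled as a Python set. Projecting onto the text alone collapses the blank lines into one value. The blank lines become distinguishable again only in the ordered list, where position carries the information that line_no carried explicitly in the relation.

```python
# Hypothetical letter stored as a relation of (line_no, text) tuples.
letter = {
    (1, "Dear Sir,"),
    (2, ""),
    (3, "Thank you."),
    (4, ""),
    (5, "Regards"),
}

# Projection onto text as a set: the two blank lines are one and the same value.
texts = {text for (_, text) in letter}

# Ordered presentation: a list built by sorting on the explicit line number.
page = [text for (_, text) in sorted(letter)]

# The duplicates can be referenced individually only through their positions.
blank_positions = [i for i, text in enumerate(page) if text == ""]
```

Here texts has four members while page has five elements, and the only way to say "the second blank line" is to refer to its place in the order.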

Those who argue one only needs to add a COUNT operation fail to recognize that the resulting abstraction relies on implicit order because the COUNT operation would depend on the implicit order of the internal physical storage.

Received on Fri Feb 14 2003 - 00:24:15 CET
