# Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

Date: Thu, 27 Feb 2003 00:36:54 -0500

Message-ID: <b3k823$kn5$1_at_slb9.atl.mindspring.net>

Thanks Paul,

The equivalence class suggestion is mathematically sound, but since it goes in the opposite direction from manageability of physical implementation, I'd try to avoid it. Mathematicians don't mind equivalence classes, represented or not by canonical elements, but implementations may be easiest if they only represent these classes with canonical elements.

It's looking like practical implementations for many algebras, sets, bags, or even those an imperative language is using when implementing fractions or bigInts of the array-of-digit kind, may involve non-canonical values at some point in the operations. For sets, bags may come up between projection and elimination of duplicates. For bags, non-canonical bag representations may come up in the same situation. For fractions, non-reduced-form representations may come up after arithmetic but before reduction to lowest terms, or with bigints, "digits" greater than 9 before "carrying".

Some day I should delve into the voluminous literature about this (and about bags in particular), but my first guess is that it is possible to use only canonical representations, and not equivalence classes, at the expense of straightforward implemenations of operations, and there are probably always three routes one could choose from:

Strict: unique representations of elements, and all operations implemented so that no invalid representation occurs, even in the implementation details of an externally compliant atomic operation.

Relaxed: unique representations, but accommodation of noncanonical equivalence class members inside atomic operations.

Loose: full recognition of equivalence classes, accommodation of arbitrary class representatives.

So my guess as to the answer to your question is that I suspect you can provide a mathematical foundation for any of these approaches.

One does ends up in circles - to represent bags without using bags, one may choose to use bags behind the scene (just as one may choose when representing sets) And there are probably further distinctions between what choice is made for the physical implementation and what choice is made if one wants to specify (as SQL does, for better or for worse) the meaning of an expression as the result of a canonical process described at the subatomic level.

SK

Paul wrote:

>Steve Kass <skass_at_drew.edu> wrote in message news:<b3ga4e$4nh$1@slb0.atl.mindspring.net>...

*>
**>
**>>>Yes, it has. But how are you going to express in your algebra that you are
**>>>not going to eliminate duplicates immediately after a projection?
**>>>
**>>>
**>>>
**>>Do you mean "are going to eliminate", meaning duplicate a_i ?
**>>Given { (<101, 'abc'>, 3), (<102, 'abc'>, 2)}, a projection onto
**>>the second column naively gives { (<'abc'>, 3), (<'abc'>, 2)}.
**>>Since we might not want multiple representations of a bag of
**>>5 'abc's, we can aggregate after set-like projection or make
**>>aggregation a part of bag projection, analogous to a
**>>non-bag algebra's need to eliminate duplicates.
**>>
**>>
**>
**>Well my initial definition of a bag explicitly stated that in the set
**>{(a,2),(b,3),(c,1),...} etc, a,b,c were distinct. So by this
**>definition
**>{(<'abc'>, 3), (<'abc'>, 2)} is not a bag, and the projection would
**>have to be {(<'abc'>, 5)}. In fact all we've done really is shifted
**>the elimination of duplicates elsewhere.
**>
**>Maybe you could amend the definition slightly so that the bag
**>[a,b,b,b,c,c] is the equivalence class of sets like
**>{(a,1),(b,3),(c,2)} etc.
**>So for example [a,a] would be the equivalence class that contains both
**>{(a,2)} and {(a,1),(a,1)}
**>
**>So I think I'm beginning to see what we are getting at now:
**>if you have a series of procedures to carry out (I'm thinking at the
**>physical level internal to the DMBS) do you do:
**>step 1
**>eliminate duplicates
**>step 2
**>eliminate duplicates
**>step 3
**>eliminate duplicates
**>step 4
**>eliminate duplicates
**>
**>or might it be equivalent and more efficient to leave the
**>deduplication to the end i.e:
**>step 1
**>step 2
**>step 3
**>step 4
**>eliminate duplicates
**>
**>So what you need is an algebra to tell you that the two processes will
**>give you the same result.
**>
**>Paul.
**>
**>
*

Received on Thu Feb 27 2003 - 06:36:54 CET