# Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Paul <pbrazier_at_cosmos-uk.co.uk>
Date: 26 Feb 2003 07:17:13 -0800

Steve Kass <skass_at_drew.edu> wrote in message news:<b3ga4e\$4nh\$1_at_slb0.atl.mindspring.net>...
> >Yes, it has. But how are you going to express in your algebra that you are
> >not going to eliminate duplicates immediately after a projection?
> >
> Do you mean "are going to eliminate", meaning duplicate a_i ?
> Given { (<101, 'abc'>, 3), (<102, 'abc'>, 2)}, a projection onto
> the second column naively gives { (<'abc'>, 3), (<'abc'>, 2)}.
> Since we might not want multiple representations of a bag of
> 5 'abc's, we can aggregate after set-like projection or make
> aggregation a part of bag projection, analogous to a
> non-bag algebra's need to eliminate duplicates.

Well, my initial definition of a bag explicitly stated that in the set {(a,2),(b,3),(c,1),...} the elements a, b, c were distinct. So by this definition
{(<'abc'>, 3), (<'abc'>, 2)} is not a bag, and the projection would have to be {(<'abc'>, 5)}. In fact, all we've really done is shift the elimination of duplicates elsewhere.
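A minimal Python sketch of that projection, assuming a bag is modelled as a `collections.Counter` mapping each distinct tuple to its multiplicity (the function name `bag_project` is hypothetical). Because the keys must stay distinct, the two `'abc'` entries have to be merged into a single entry with count 5, which is exactly where the duplicate elimination reappears:

```python
from collections import Counter

# A bag is modelled here as a Counter: element -> multiplicity.
# Keys are distinct by construction, matching the definition above.

def bag_project(bag, columns):
    """Project each tuple onto the given column indexes, summing the
    multiplicities of tuples that collapse to the same projected value."""
    result = Counter()
    for row, count in bag.items():
        projected = tuple(row[i] for i in columns)
        result[projected] += count
    return result

r = Counter({(101, 'abc'): 3, (102, 'abc'): 2})
print(bag_project(r, [1]))  # Counter({('abc',): 5})
```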

Maybe you could amend the definition slightly so that the bag [a,b,b,b,c,c] is the equivalence class of sets like {(a,1),(b,3),(c,2)}.
So, for example, [a,a] would be the equivalence class that contains both {(a,2)} and {(a,1),(a,1)}.
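One way to make the equivalence concrete is to pick a canonical representative for each class: sum the counts per distinct element. The sketch below assumes that reading; `canonical` and `same_bag` are hypothetical names, and the inputs are Python lists because a Python set would collapse the two identical `(a, 1)` pairs into one:

```python
from collections import Counter

def canonical(pairs):
    """Collapse a list of (element, count) pairs into the canonical
    representative of its equivalence class: one pair per distinct
    element, counts summed, zero counts dropped."""
    total = Counter()
    for element, count in pairs:
        total[element] += count
    return frozenset((e, c) for e, c in total.items() if c > 0)

def same_bag(p, q):
    # Two representations denote the same bag iff their canonical forms match.
    return canonical(p) == canonical(q)

print(same_bag([('a', 2)], [('a', 1), ('a', 1)]))  # True
print(same_bag([('a', 2)], [('a', 1)]))            # False
```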

So I think I'm beginning to see what we are getting at now: if you have a series of procedures to carry out (I'm thinking of the physical level, internal to the DBMS), do you do:
step 1
eliminate duplicates
step 2
eliminate duplicates
step 3
eliminate duplicates
step 4
eliminate duplicates

or might it be equivalent, and more efficient, to leave the deduplication until the end, i.e.:
step 1
step 2
step 3
step 4
eliminate duplicates
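For selection followed by projection, the two plans do give the same set. A minimal Python sketch, assuming relations are plain lists of tuples (bags) and using hypothetical helpers `select`, `project` and `dedup`:

```python
# Relations are lists of tuples, so duplicates are allowed (bags).

def select(rows, pred):
    return [r for r in rows if pred(r)]

def project(rows, cols):
    return [tuple(r[i] for i in cols) for r in rows]

def dedup(rows):
    # Keep the first occurrence of each tuple, preserving order.
    return list(dict.fromkeys(rows))

rows = [(1, 'abc', 5), (2, 'abc', 3), (3, 'xyz', 0), (1, 'abc', 5)]
positive = lambda r: r[2] > 0

# Plan A: eliminate duplicates after every step.
plan_a = dedup(project(dedup(select(dedup(rows), positive)), [1]))

# Plan B: run every step on bags, eliminate duplicates once at the end.
plan_b = dedup(project(select(rows, positive), [1]))

print(sorted(plan_a) == sorted(plan_b))  # True
```

Plan B carries the extra copies through the intermediate steps, so whether it is actually cheaper depends on how many duplicates there are; the sketch only shows that the results agree for these two operators.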

So what you need is an algebra to tell you that the two processes will give you the same result.
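Such an algebra also has to say when the two plans are *not* equivalent. A small Python sketch, assuming bag difference is defined as "monus" (subtract multiplicities, never going below zero, which is how SQL's `EXCEPT ALL` behaves), shows that moving the deduplication changes the answer for difference:

```python
from collections import Counter

def bag_difference(r, s):
    # Bag difference ("monus"): Counter's '-' subtracts multiplicities
    # and drops any element whose count is zero or negative.
    return r - s

def to_set(bag):
    return set(bag)  # iterating a Counter yields its distinct keys

r = Counter({'a': 2})
s = Counter({'a': 1})

late = to_set(bag_difference(r, s))  # eliminate duplicates at the end
early = to_set(r) - to_set(s)        # eliminate duplicates first

print(late, early)  # {'a'} set()
```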

Paul.

Received on Wed Feb 26 2003 - 16:17:13 CET
