Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Bob Badour <bbadour_at_golden.net>
Date: Thu, 13 Feb 2003 21:18:42 -0500
Message-ID: <RHY2a.1418$0T3.169406528_at_mantis.golden.net>


"Mikito Harakiri" <mikharakiri_at_ywho.com> wrote in message news:jcX2a.16$O%2.38_at_news.oracle.com...
> "Bob Badour" <bbadour_at_golden.net> wrote in message
> news:4uW2a.1409$0G3.168661119_at_mantis.golden.net...
> > "Mikito Harakiri" <mikharakiri_at_ywho.com> wrote in message
> > news:irT2a.10$O%2.40_at_news.oracle.com...
> > > Not
> > > for aggregates:
> > >
> > > PROJECTION*AGGREGATE != AGGREGATE*PROJECTION
> > >
> > > for example
> > >
> > > select distinct S from (
> > > select distinct SUM(SAL) from emp
> > > )
> > >
> > > is not the same as
> > >
> > > select distinct SUM(SAL) from (
> > > select distinct SAL from emp
> > > )
> > >
> > > where i'm using SQL with the "distinct" keyword merely as a surrogate
> for
> > > true relational syntax ("distinct" is redundant after aggregate
> operation,
> > > of course).
> >
> > Why should the sum of all salaries equal the sum of distinct salaries?

>

> In set model they don't. In the bag model, however, we rewrite the above
> identity without the "distinct" keyword:
>

> select S from (
> select SUM(SAL) S from emp
> )
>

> is the same as
>

> select SUM(SAL) from (
> select SAL from emp
> )
>

> Once again, this means that PROJECTION and AGGREGATE commute in the bag
> model.

With all due respect, Mikito,

The differences above are not about the model. They are about the questions asked. In your latter example, you are comparing the sum of all salaries with the sum of all salaries. In your former example, you compared the sum of all salaries with the sum of distinct salaries.

The sensible optimization for the first query in each comparison is to simply drop the outer select operation as a no-op.

select S from (

   select SUM(SAL) S from emp
)

is logically equivalent to

select SUM(SAL) S from emp

regardless whether we are dealing with sets or multisets.

I wonder why someone would request:

select SUM(SAL) from (

   select SAL from emp
)

when one wants the sum of all salaries. It seems like a lot of trouble to go to when one could just as easily write:

select SUM(SAL) from emp

which more naturally reflects a request for the sum of all employee salaries.

Of course, in a set based system, distinct wouldn't even be a keyword and everyone would know that:

select SAL from emp

means a relation with the distinct salaries in emp. Users would immediately realise that:

select SUM(SAL) from (

   select SAL from emp
)

means the sum of the distinct salaries in emp.

You might suggest that the inner query in each case above is a view. In that case, I would point out that most views are created by advanced users who should know better than to create a view of distinct salaries when users need a view of all salaries and vice versa--and the advanced users will generally know how to write either view. Received on Fri Feb 14 2003 - 03:18:42 CET

Original text of this message