Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Bob Badour <bbadour_at_golden.net>
Date: Sun, 9 Mar 2003 17:28:13 -0500
Message-ID: <JGPaa.414$DZ2.63328396_at_mantis.golden.net>


"Jan Hidders" <jan.hidders_at_REMOVE.THIS.ua.ac.be> wrote in message news:3e620dec.0_at_news.ruca.ua.ac.be...
>
> Lauri Pietarinen wrote:
> >

> >>>So actually your views are quite close to Date's, it seems! What you
are
> >>>describing sounds like Tutorial-D, to me. Does "SQL with set semantics"
> >>>mean what I think it means?
> >>
> >>Yes, it does.
> >>
> >Maybe I am missing something here, but what is it _exactly_ you
> >are criticizing Date of? Is it the issue of equating SQL with
> >bag algebra?
>
> Yes, that too, but mainly that he overestimates the complexity that is
added
> to the optimizer when bags are exposed to the user.

The optimizer may not be any more complex, but it is nowhere near as effective either. And in general, the results with duplicates are not meaningful to users. In Date's example of 12 "equivalent" queries that yield 9 different results, what do the duplicates mean to anyone?

> >>>The query
> >>>
> >>>SELECT CITY
> >>> FROM P
> >>>
> >>>answers the question
> >>>"what cities do parts come from?"
> >>
> >>Yes, it does, but under SQL semantics it also contains the information
how
> >>many parts come from each city.
> >>
> >In a sence, yes, but then you have to have a program to tidy up the
result
> >or count the duplicates. Why would you want to do that in a program?
>
> If that is what the user asked then that is what he or she will get. If
they
> wanted something else, they would have asked a different query.

Are you not now requiring that all users have expert level knowledge of the dbms internals? That seems unreasonable.

When I look at your statement above, I think: "Well that totally invalidates the argument that duplicate removal costs too much in performance." When the user gets an answer they didn't want, they will just have to run the query a second time.

> > Well, it was not used because users were advised against using it:
"Don't
> >use 'DISTINCT', it will result in a sort!". So it was kind of a vicious
> >circle.
>
> That's nonsense. DISTINCT is usually not used because you don't need to,
> because for example the SELECT clause contains a candidate key. It is very
> rare in my experience that users deliberately accept duplicates.

Jan, with all due respect, I cannot count how many times I have heard alleged database experts tell users to "Never use DISTINCT." If the result is already distinct, the keyword should have no cost. If the duplicates are meaningless and unacceptable to users, why should they have to deal with them?

> >>SQL is meant as a declarative language, and just for that reason alone
> >>already the optimization issue is very much different from most
> >>programming languages. That's another reason why it is such a bad
analogy.
> >>Just look up a few introductory books on query optimization in databases
> >>and on code optimization for compilers; the issues and techniques used
are
> >>really really very different.
> >>
> >Surely they are different kinds of optimisations, but the aim is the
same:
> >Raise the level of abstraction. Let the machine do more of the
"thinking".
>
> What makes you think that while-loops raise the level of abstraction?

What makes you think they don't?

> >Let's say for sake of argument that P has 1 row, and SP has 1000000 rows.
>
> That assumption makes the example extremely unrealistic because if that
were
> the case it is very unlikely that the user would accept that many
duplicates
> in the answer.

Why should they accept any duplicates? Received on Sun Mar 09 2003 - 23:28:13 CET

Original text of this message