Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Jan Hidders <jan.hidders_at_REMOVE.THIS.ua.ac.be>
Date: 2 Mar 2003 14:58:04 +0100
Message-ID: <3e620dec.0_at_news.ruca.ua.ac.be>


Lauri Pietarinen wrote:
>
>>>So actually your views are quite close to Date's, it seems! What you are
>>>describing sounds like Tutorial-D, to me. Does "SQL with set semantics"
>>>mean what I think it means?
>>
>>Yes, it does.
>>
>Maybe I am missing something here, but what is it _exactly_ you
>are criticizing Date of? Is it the issue of equating SQL with
>bag algebra?

Yes, that too, but mainly that he overestimates the complexity that is added to the optimizer when bags are exposed to the user.

>>>The query
>>>
>>>SELECT CITY
>>> FROM P
>>>
>>>answers the question
>>>"what cities do parts come from?"
>>
>>Yes, it does, but under SQL semantics it also contains the information how
>>many parts come from each city.
>>
>In a sence, yes, but then you have to have a program to tidy up the result
>or count the duplicates. Why would you want to do that in a program?

If that is what the user asked then that is what he or she will get. If they wanted something else, they would have asked a different query.

>>>I am not sure I understand your statement. The point (at least one of
>>>them) is that SQL gives the users lots of options that he/she does not
>>>need and causes extra work for implementors because they have to take
>>>care of special cases etc...
>>
>>I know, and my question was "how much extra work?". Just waving your hands
>>a little and muttering something about "extra special cases that have to
>>be considered" is not very convincing, especially since there is an easy
>>mapping from bags to sets. As a consequence you could use this mapping to
>>map the bags internally to sets and the answer to "how much extra work"
>>the internal optimizer would have to do would be "almost nothing".
>
>OK, the point here is that we let the user shoot himself in the foot.

So I keep hearing, but the arguments supporting this claim look rather weak to me, to say the least.  

>And, regarding "Distinct" support in DBMS's, you suggested that bad support
>resulted from the fact that the keyword was not used.

When did I say that?

> Well, it was not used because users were advised against using it: "Don't
>use 'DISTINCT', it will result in a sort!". So it was kind of a vicious
>circle.

That's nonsense. DISTINCT is usually not used because you don't need to, because for example the SELECT clause contains a candidate key. It is very rare in my experience that users deliberately accept duplicates.

>>>That means that the human programmer has done the optimizing himself.
>>
>>Of course! That is what programmers are supposed to do when they program in
>>an imperative programming language.
>>
>Well, clarity and maintainability top the list in my opinion. And
>productivity.

This is irrelevant for the question wheter GOTO code is more optimizable than WHILE code. I'm still waiting for (1) a precise definition of what that statement exactly means and (2) your arguments for the support of this claim.

>>SQL is meant as a declarative language, and just for that reason alone
>>already the optimization issue is very much different from most
>>programming languages. That's another reason why it is such a bad analogy.
>>Just look up a few introductory books on query optimization in databases
>>and on code optimization for compilers; the issues and techniques used are
>>really really very different.
>>
>Surely they are different kinds of optimisations, but the aim is the same:
>Raise the level of abstraction. Let the machine do more of the "thinking".

What makes you think that while-loops raise the level of abstraction?

>>>>>[...] I have the feeling that optimising over a cleaner language would
>>>>>have been easier in the first place. This involves also giving the
>>>>>user less "choice".
>>>>>
>>>>That could be a good argument if you had an example of an optimization
>>>>that was not found by the optimizer and that would have been much easier
>>>>to spot if it had not been working with bags.
>>>>
>>>Well,
>>>
>>>SELECT P#
>>> FROM P,SP
>>> WHERE P.P# = SP.P#
>>>
>>>cannot be modified to
>>>
>>>SELECT P#
>>> FROM P
>>> WHERE NOT EXISTS
>>> ( SELECT *
>>> FROM SP
>>> WHERE P.P# = SP.P# )
>>>
>>>unless the user adds 'DISTICT'.
>>
>>.. and removes the NOT. :-)
>>
>yes, even that is required of the user ;-)
>
>> But why do you think an equivalent optimization
>>is not possible in a bag algebra?
>>
>Let's say for sake of argument that P has 1 row, and SP has 1000000 rows.

That assumption makes the example extremely unrealistic because if that were the case it is very unlikely that the user would accept that many duplicates in the answer.

Kind regards,

  • Jan Hidders

PS. Is there a specific reason why your newsreader removes the attribution

    in you postings? That could sometimes lead to confusion about who said     what and some people find it impolite to quote their text in a posting Received on Sun Mar 02 2003 - 14:58:04 CET

Original text of this message