Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: Sun, 02 Mar 2003 09:08:56 +0200
Message-ID: <3E61AE08.8070707_at_atbusiness.com>


>
>
>>So actually your views are quite close to Date's, it seems! What
>>you are describing sounds like Tutorial-D, to me.
>>Does "SQL with set semantics" mean what I think it means?
>>
>>
>
>Yes, it does.
>
Maybe I am missing something here, but what is it _exactly_ you are criticizing Date of? Is it the issue of equating SQL with bag algebra?

>>The query
>>
>>SELECT CITY
>> FROM P
>>
>>answers the question
>>"what cities do parts come from?"
>>
>>
>
>Yes, it does, but under SQL semantics it also contains the information how
>many parts come from each city.
>
In a sence, yes, but then you have to have a program to tidy up the result or
count the duplicates. Why would you want to do that in a program? Say somebody
asked you what countries you traveled to last year and you answered: "US, Sweden, US, Germany, US, Germany". That is clearly an intermediate answer, the final one being "US, Germany, Sweden"

>>I am not sure I understand your statement. The point (at least one of
>>them) is that SQL gives the users lots of options that he/she does not need
>>and causes extra work for implementors because they have to take care of
>>special cases etc...
>>
>>
>
>I know, and my question was "how much extra work?". Just waving your hands a
>little and muttering something about "extra special cases that have to be
>considered" is not very convincing, especially since there is an easy
>mapping from bags to sets. As a consequence you could use this mapping to
>map the bags internally to sets and the answer to "how much extra work" the
>internal optimizer would have to do would be "almost nothing".
>
OK, the point here is that we let the user shoot himself in the foot.  And, regarding "Distinct"
support in DBMS's, you suggested that bad support resulted from the fact that the keyword
was not used. Well, it was not used because users were advised against using it:
"Don't use 'DISTINCT', it will result in a sort!". So it was kind of a vicious circle.

>>That means that the human programmer has done the optimizing himself.
>>
>>
>
>Of course! That is what programmers are supposed to do when they program in
>an imperative programming language.
>
Well, clarity and maintainability top the list in my opinion. And productivity.
Why do people code with Java when they could get faster programs with assembly language?

I suppose you have heard of the Laws of optimisation?

1st Law of optimisation - Don't optimise. 2nd Law of optimisation - Don't optimise yet

>It doesn't make sense to put a lot of
>effort in writing a compiler that still generates efficient code when the
>programmer deliberately writes inefficient programs.
>
see above.

BTW, there exists an elegant construct that avoids GOTO's, but without using
structured programming. See: http://www.fortranlib.com/gotoless.htm  for details.

>SQL is meant as a declarative language, and just for that reason alone
>already the optimization issue is very much different from most programming
>languages. That's another reason why it is such a bad analogy. Just look up
>a few introductory books on query optimization in databases and on code
>optimization for compilers; the issues and techniques used are really really
>very different.
>
Surely they are different kinds of optimisations, but the aim is the same: Raise the level of abstraction. Let the machine do more of the "thinking".

>>>>[...] I have the feeling that optimising over a cleaner language would
>>>>have been easier in the first place. This involves also giving the user
>>>>less "choice".
>>>>
>>>>
>>>That could be a good argument if you had an example of an optimization
>>>that was not found by the optimizer and that would have been much easier
>>>to spot if it had not been working with bags.
>>>
>>>
>>Well,
>>
>>SELECT P#
>> FROM P,SP
>> WHERE P.P# = SP.P#
>>
>>cannot be modified to
>>
>>SELECT P#
>> FROM P
>> WHERE NOT EXISTS
>> ( SELECT *
>> FROM SP
>> WHERE P.P# = SP.P# )
>>
>>unless the user adds 'DISTICT'.
>>
>>
>
>.. and removes the NOT. :-)
>

yes, even that is required of the user ;-)

> But why do you think an equivalent optimization
>is not possible in a bag algebra?
>
Let's say for sake of argument that P has 1 row, and SP has 1000000 rows. How would you, using bag algebra, optimise the query

SELECT P#
 FROM P,SP
 WHERE P.P# = SP.P# best regards,
Lauri Pietarinen Received on Sun Mar 02 2003 - 08:08:56 CET

Original text of this message