Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Jan Hidders <jan.hidders_at_REMOVE.THIS.ua.ac.be>
Date: 1 Mar 2003 13:38:38 +0100
Message-ID: <3e60a9ce.0_at_news.ruca.ua.ac.be>


Lauri Pietarinen wrote:
>jan.hidders_at_REMOVE.THIS.ua.ac.be (Jan Hidders) wrote in message
>news:<3e5ccb0a.0_at_news.ruca.ua.ac.be>...
>> Lauri Pietarinen wrote:
>> >> Yes. But let me stress that I am not a fan of SQL; as a bag calculus
>> >> it is also less elegant than it could have been, to put it mildly.
>> >
>> >Can you point to some reference with what you think would be a decent
>> >query language? What would be the "ideal" db-language in your view?
>>
>> Nothing fancy. Basically SQL with set semantics, no NULLs (except as
>> explicit values in domains), two-valued logic, orthogonal in the sense
>> that everywhere where I can use a table I can also put a query, and to
>> top it off an EXISTS and FORALL that look more like their counterparts in
>> tuple calculus (so EXISTS (SELECT * FROM a WHERE ...) can be written as
>> for example EXISTS a : ...). After that I'd probably want some limited
>> form of recursion such as inflationary fixpoints or something.
>
>So actually your views are quite close to Date's, it seems! What
>you are describing sounds like Tutorial D to me.
>Does "SQL with set semantics" mean what I think it means?

Yes, it does.

>> >Taking the table 'P', having, say 1M rows, what is the meaning
>> >of the following query:
>> >
>> >select city
>> > from P
>>
>> Roughly the same as:
>>
>> SELECT DISTINCT city, COUNT(*)
>> FROM P
>> GROUP BY city;
>
>The query
>
>SELECT CITY
> FROM P
>
>answers the question
>"what cities do parts come from?"

Yes, it does, but under SQL's bag semantics the result also carries the information of how many parts come from each city.
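A small sketch of this point, using Python's sqlite3 as a stand-in SQL engine (the table P and its sample rows are invented for illustration): under bag semantics the multiplicity of each city in the plain projection equals the COUNT(*) in the grouped, duplicate-free formulation.

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P (pnum TEXT, city TEXT)")
con.executemany("INSERT INTO P VALUES (?, ?)",
                [("P1", "London"), ("P2", "Paris"),
                 ("P3", "Paris"), ("P4", "London"), ("P5", "London")])

# Bag semantics: duplicates are kept, so each city's multiplicity
# equals the number of parts that come from it.
bag = [row[0] for row in con.execute("SELECT city FROM P")]

# The set-with-counts formulation carries the same information explicitly.
counts = dict(con.execute(
    "SELECT DISTINCT city, COUNT(*) FROM P GROUP BY city"))

print(Counter(bag) == Counter(counts))  # prints True
```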

>> >> The optimization of imperative programs is very different from
>> >> optimizing declarative query languages, so such an analogy is
>> >> virtually meaningless to people who actually know a thing or two about
>> >> optimization.
>> >
>> >The point here is that the more stuff you have to consider the harder it
>> >is to do a good job. "Less is More"!
>>
>> That's just a general rule of thumb. The question at hand is if in this
>> particular case it is really as "more" as Date wants us to believe. I am
>> skeptical about that and I believe I already gave some arguments why that
>> is. For example, suppose we would "expose" bags to the user, but under
>> the hood we would translate them to true relations in the obvious way and
>> do the query optimization on those. Could you indicate how much "more"
>> that would cause?
>
>I am not sure I understand your statement. The point (at least one of
>them) is that SQL gives the user lots of options that he/she does not need,
>and it causes extra work for implementors because they have to take care of
>special cases etc...

I know, and my question was "how much extra work?". Just waving your hands a little and muttering something about "extra special cases that have to be considered" is not very convincing, especially since there is an easy mapping from bags to sets. Using that mapping, an implementation could represent the bags internally as sets, and then the answer to "how much extra work does the internal optimizer have to do?" would be "almost nothing".
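One way to sketch the "easy mapping" Jan alludes to (the function names here are my own, and this is only one of several obvious encodings): represent a bag as a set of (tuple, multiplicity) pairs, which loses no information and round-trips exactly.

```python
from collections import Counter

def bag_to_set(bag):
    """Map a bag (list of tuples) to a set of (tuple, count) pairs."""
    return {(t, n) for t, n in Counter(bag).items()}

def set_to_bag(s):
    """Invert the mapping: repeat each tuple according to its count."""
    bag = []
    for t, n in s:
        bag.extend([t] * n)
    return bag

b = [("P1", "London"), ("P1", "London"), ("P2", "Paris")]
encoded = bag_to_set(b)
# The round trip is lossless (up to the order of tuples, which a bag
# does not have anyway).
print(Counter(set_to_bag(encoded)) == Counter(b))  # prints True
```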

>> >> Btw. What makes you think that programs with GOTOs are hard to
>> >> optimize? Dijkstra did not mention that as an argument in his famous
>> >> letter.
>> >
>Am I wrong? Are programs with GOTOs as easy to optimize as programs
>not using GOTOs, assuming that we don't have any restrictions?
>>
>> That is a very complex question. How exactly would you define
>> optimizability? And do you take into account that programs with GOTOs are
>> often smaller than their counterparts with only WHILEs?
>
>That means that the human programmer has done the optimizing himself.

Of course! That is what programmers are supposed to do when they program in an imperative programming language. It doesn't make sense to put a lot of effort into writing a compiler that still generates efficient code when the programmer has deliberately written inefficient programs.

>I am sure that for _any_ given SQL query a (very) competent human could
>make a program doing the disk reads him/herself that executes the query
>faster than a DBMS.

SQL is meant as a declarative language, and for that reason alone its optimization problem already differs greatly from that of most programming languages. That is another reason why it is such a bad analogy. Just look up a few introductory books on query optimization in databases and on code optimization in compilers; the issues and the techniques used are really very different.

>> >[...] I have the feeling that optimising over a cleaner language would
>> >have been easier in the first place. This involves also giving the user
>> >less "choice".
>>
>> That could be a good argument if you had an example of an optimization
>> that was not found by the optimizer and that would have been much easier
>> to spot if it had not been working with bags.
>
>Well,
>
>SELECT P#
> FROM P,SP
> WHERE P.P# = SP.P#
>
>cannot be modified to
>
>SELECT P#
> FROM P
> WHERE NOT EXISTS
> ( SELECT *
> FROM SP
> WHERE P.P# = SP.P# )
>
>unless the user adds 'DISTINCT'.

... and removes the NOT. :-) But why do you think an equivalent optimization would not be possible in a bag algebra?
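The equivalence Jan is pointing at can be checked directly; the sketch below uses sqlite3 with invented sample data, and renames the column P# to pnum for SQLite's benefit. With DISTINCT, the join coincides with the EXISTS (semijoin) formulation; without it, the bag result repeats a part once per matching supplier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE P (pnum TEXT)")
con.execute("CREATE TABLE SP (snum TEXT, pnum TEXT)")
con.executemany("INSERT INTO P VALUES (?)", [("P1",), ("P2",), ("P3",)])
# P1 has two suppliers, P2 has one, P3 has none.
con.executemany("INSERT INTO SP VALUES (?, ?)",
                [("S1", "P1"), ("S2", "P1"), ("S1", "P2")])

join = [r[0] for r in con.execute(
    "SELECT P.pnum FROM P, SP WHERE P.pnum = SP.pnum")]
distinct_join = [r[0] for r in con.execute(
    "SELECT DISTINCT P.pnum FROM P, SP WHERE P.pnum = SP.pnum")]
semijoin = [r[0] for r in con.execute(
    "SELECT pnum FROM P WHERE EXISTS"
    " (SELECT * FROM SP WHERE P.pnum = SP.pnum)")]

# Under bag semantics the join repeats P1 once per supplier, so only
# the DISTINCT version coincides with the EXISTS formulation.
print(sorted(join))           # ['P1', 'P1', 'P2']
print(sorted(distinct_join))  # ['P1', 'P2']
print(sorted(semijoin))       # ['P1', 'P2']
```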

-- Jan Hidders
Received on Sat Mar 01 2003 - 13:38:38 CET
