Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: 1 Mar 2003 02:50:49 -0800
Message-ID: <e9d83568.0303010250.5de40fc8_at_posting.google.com>


jan.hidders_at_REMOVE.THIS.ua.ac.be (Jan Hidders) wrote in message news:<3e5ccb0a.0_at_news.ruca.ua.ac.be>...
> Lauri Pietarinen wrote:
> >> Yes. But let me stress that I am not a fan of SQL; as a bag calculus it
> >> is also less elegant then it could have been, to put it mildly.
> >
> >Can you point to some reference with what you think would be a decent query
> >language? What would be the "ideal" db-language in your view?
>
> Nothing fancy. Basically SQL with set semantics, no NULLs (except as
> explicit values in domains), two-valued logic, orthogonal in the sense that
> everywhere where I can use a table I can also put a query, and to top it off
> an EXISTS and FORALL that look more like their counterparts in tuple
> calculus (so EXISTS (SELECT * FROM a WHERE ...) can be written as for
> example EXISTS a : ...). After that I'd probably want some limited form of
> recursion such as inflationary fixpoints or something.

So actually your views are quite close to Date's, it seems! What you are describing sounds like Tutorial-D, to me. Does "SQL with set semantics" mean what I think it means?

>
> >> >- optimizing efforts will be concentrated on the most used features,
> >> > that being queries without disticnt.
> >>
> >> If those queries are the ones that are the most used, then that is what
> >> the users apparently want, and so indeed those should be optimized the
> >> most.
> >
> >So the user is always right!
>
> No, not always. Users sometimes contradict themselves or have conflicting
> requirements.
>
> >Tell me why any user would really want duplicates in the result of his/her
> >query. Just give me _one_ example applicable to a real world situation in
> >an actual program.
>
> I don't think users "really want" duplicates (although I'm not sure that
> this is really a well-defined notion). If duplicate elimination means a big
> performance hit and the user doesn't really need this then in some sense you
> could say that her or she "really wants" to be able to determine if the
> duplcates are eliminated or not.

That, I think, is the main reason SQL does not, by default, eliminate duplicates (i.e. performance). It might have been a reasonable design
consideration in the mid 70's, but perhaps short sighted. Duplicate elimination is in most cases unnecessary. I like the term "Duplicate removal avoidance" as in http://www.mcjones.org/System_R/bs12.html.

>
> >Taking the table 'P', having, say 1M rows, what is the meaning
> >of the following query:
> >
> >select city
> > from P
>
> Roughly the same as:
>
> SELECT DISTINCT city, COUNT(*)
> FROM P
> GROUP BY city;

The query

SELECT CITY
  FROM P answers the question
"what cities do parts come from?"

instead

   SELECT DISTINCT city, COUNT(*)
   FROM P
   GROUP BY city;

answers the question how many
parts come from each city.

Surely a different question!

>
> >> The optimization of imperative programs is very different from
> >> optimizing declarative query languages, so such an analogy is
> >> virtually meaningless to people who actually know a thing or two about
> >> optimization.
> >
> >The point here is that the more stuff you have to consider the
> >harder it is to do a good job. "Less is More"!
>
> That's just a general rule of thumb. The question at hand is if in this
> particular case it is really as "more" as Date wants us to believe. I am
> skeptical about that and I believe I already gave some arguments why that
> is. For example, suppose we would "expose" bags to the user, but under the
> hood we would translate them to true relations in the obvious way and do the
> query optimization on those. Could you indicate how much "more" that would
> cause?

I am not sure I understand your statement. The point (at least one of them) is that SQL gives the users lots of options that he/she does not need and causes extra work for implementors because they have to take care of special cases etc...

Date's position, as far as I can understand is that the relational model is a logical one, and *under* *the* *covers* the system can do whatever it wants, use bags etc.. as long as the result is correct. Just as, in the end of the day, a program without GOTO's is converted into machine language with pointers, GOTO's and all that stuff.

Date does not (in my view) oppose bags or anything else, as long as they are hidden from the user!!

>
> >> Btw. What makes you think that programs with GOTOs are hard to optimize?
> >> Dijkstra did not mention that as an argument in his famous letter.
> >
> >Am I wrong? Are programs with GOTOs as easy to optimise as programs
> >not using GOTO's, assuming that we don't have any restrictions?
>
> That is a very complex question. How exactly would you define
> optimizability? And do you take into account that programs with GOTOs are
> often smaller than their counterparts with only WHILEs?

That means that the human programmer has done the optimizing himself.

I am sure that for _any_ given SQL query a (very) competent human could make a program
doing the disk reads him/herself that executes the query faster than a DBMS. The other question is, does it make sense. Is there perhaps a more productive use of his/her time? And of course a cost based optimizer
in a DBMS take into account cardinalities, distributions etc. that change over time. So the poor programmer would have to go back and constantly reoptimize the query as time goes by.

>
> Perhaps we should ask this question to somebody who has written a classic on
> compiler design, like, ... Jeffrey Ullman?
>
> >[...] I have the feeling that optimising over a cleaner language would have
> >been easier in the first place. This involves also giving the user less
> >"choice".
>
> That could be a good argument if you had an example of an optimization that
> was not found by the optimizer and that would have been much easier to spot
> if it had not been working with bags.

Well,

SELECT P#
  FROM P,SP
  WHERE P.P# = SP.P# cannot be modified to

SELECT P#
   FROM P
   WHERE NOT EXISTS
    ( SELECT *

        FROM SP
        WHERE P.P# = SP.P# )

unless the user adds 'DISTICT'.

That _is_ an extra burden for the user
AND an extra burden for the DBMS-builders.

Here is a quote (by Hugh Darwen) from the link mentioned above:

"Because BS12 spurned duplicate rows, it was obliged to make "duplicate removal avoidance" a strong feature of its optimiser--something that SQL implementations are only now beginning to catch up with (in SQL terms, this means not always firing up the duplicate elimination mechanism just because the user said DISTINCT--you might be able instead to prove that there cannot be any duplicates)."

regards,
Lauri Pietarinen Received on Sat Mar 01 2003 - 11:50:49 CET

Original text of this message