Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: Sun, 02 Mar 2003 20:45:54 +0200
Message-ID: <3E625162.4060203_at_atbusiness.com>


Jan Hidders wrote:

>>And, regarding "Distinct" support in DBMS's, you suggested that bad support
>>resulted from the fact that the keyword was not used.
>>
>>
>
>When did I say that?
>

[snipped from previous posting...]

>- optimizing efforts will be concentrated on the most used features, that
> being queries without disticnt.

If those queries are the ones that are the most used, then that is what the
users apparently want, and so indeed those should be optimized the most.

[end snip]

>>>>That means that the human programmer has done the optimizing himself.
>>>>
>>>>
>>>Of course! That is what programmers are supposed to do when they program in
>>>an imperative programming language.
>>>
>>>
>>>
>>Well, clarity and maintainability top the list in my opinion. And
>>productivity.
>>
>>
>
>This is irrelevant for the question wheter GOTO code is more optimizable
>than WHILE code. I'm still waiting for (1) a precise definition of what that
>statement exactly means and (2) your arguments for the support of this
>claim.
>
OK, I admit. I am no expert on (optimizing) compilers and I (perhaps falsly) assumed that it
was harder to optimize languages that used GOTO's. Maybe the case is not so.

I will offer one last example to try to clarify what I mean.

Take the standard S, SP and P -tables. I want to hide the joins from the user so I create a view:

CREATE VIEW V_S_SP_P
  ( S#, SNAME, QTY, P#, PNAME)
AS

SELECT S.S#, S.SNAME, SP.QTY, P.P#, P.PNAME   FROM S, SP, P
   WHERE S.S# = SP.S# AND
             SP.P# = P.P#;

There are lots of interesting questions that can be easily answered using this view.

The question "give me all products that are supplied by somebody" is easy to state:

SELECT P#, PNAME
  FROM V_S_SP_P; Oops! Got the same P#'s several times.
Got to add DISTINCT.

Ok, let's try

SELECT DISTINCT P#, PNAME
  FROM V_S_SP_P; Correct answer, but why is performance
so bad?

Oh, the DBMS materialises the whole join (even reading S, which is not needed at all) and then groups by P# and PNAME!

(Just tried it on SQLServer2000).

Why can't it transform this query into
SELECT P#, PNAME
  FROM P
  WHERE EXISTS

     (SELECT * FROM SP
         WHERE P.P# = SP.P#)?

Is this a useful view? It sure is!
Could I actually provide such a view
to my end users or programmers?

No I can't, because they have to

  1. remember to use DISTINCT
  2. worry about bad performance

So they end up coding the joins and
exists themselves.

My (non scientific) claim is that because users and DBMS-builders are used to thinking in terms of (SQL-)bags, they don't even come to consider (e.g.) the usefulnes of hiding complexity in views, at least not to the extent that it is possible.

So while you might be right in theory, i.e. that supporting two modes at once (sets and bags), is no burden for the DBMS optimiser, you are certainly wrong in practice! If it is so easy, why did they (the SQL-vendors) not support it a long time ago?

Date is trying to take the relational model to it's full potential and I would claim that the SQL-mode of thinking is hindering us from achieving this. And this (in my view) is ultimately Date's criticism of
Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom, DATABASE SYSTEM IMPLEMENTATION. kind regards,
Lauri Pietarinen Received on Sun Mar 02 2003 - 19:45:54 CET

Original text of this message