Re: Is mysql a RDBMS ?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: Sun, 31 Aug 2003 03:06:54 +0300
Message-ID: <bire7g$md8$1_at_nyytiset.pp.htv.fi>


Hi Heikki,

Heikki Tuuri wrote:

>>>>>The example is not a real-world query.
>>>>>
>>>>>
>>>>What's not real about it? The fact that it is a self-join? That
>>>>
>>>>
>happens
>
>
>>>all
>>>
>>>The fact that it is not from some real application.
>>>
>>>
>>Self-join is not from some real application?!? Don't be stupid. I use
>>self-joins all the time, and so does everybody else. Of course, no real
>>application I write will allow duplicate rows, because I am neither stupid
>>enough nor ignorant enough to allow duplicates in the first place. But you
>>argue for duplicates, so you must accept them as real-world in your world.
>>
>>Since self-join comes from real applications and you argue for the value
>>
>>
>of
>
>
>>duplicates, I fail to see how the query is not a real-world query in your
>>universe of useful queries.
>>
>>
>
>of course self-joins occur in real-world applications. But we are still
>waiting for you to provide a real-world example where the join size is
>bloated with duplicate rows.
>
>

I sympatize with you wanting to see a real world example demonstrating the usefulness of
working with relations instead of bags regarding optimisation and it took me a while to find
an (in my view) convincing example.

I posted this previously and have pasted it here for easy reference:

http://groups.google.com/groups?q=g:thl3922164456d&dq=&hl=en&lr=&ie=UTF-8&selm=3E625162.4060203%40atbusiness.com&rnum=23

Take the standard S, SP and P -tables. I want to hide the joins from the user so I create a view:

CREATE VIEW V_S_SP_P
  ( S#, SNAME, QTY, P#, PNAME)
AS
SELECT S.S#, S.SNAME, SP.QTY, P.P#, P.PNAME   FROM S, SP, P
   WHERE S.S# = SP.S# AND
             SP.P# = P.P#;

There are lots of interesting questions that can be easily answered using this view.

The question "give me all products that are supplied by somebody" is easy to state:
SELECT P#, PNAME
  FROM V_S_SP_P; Oops! Got the same P#'s several times.
Got to add DISTINCT.

Ok, let's try
SELECT DISTINCT P#, PNAME
  FROM V_S_SP_P; Correct answer, but why is performance
so bad?

Oh, the DBMS materialises the whole join (even reading S, which is not needed at all) and then groups by P# and PNAME!

(Just tried it on SQLServer2000).

Why can't it transform this query into
SELECT P#, PNAME
  FROM P
  WHERE EXISTS

     (SELECT * FROM SP
         WHERE P.P# = SP.P#)?

Is this a useful view? It sure is!

Could I actually provide such a view
to my end users or programmers?

No I can't, because they have to

  1. remember to use DISTINCT
  2. worry about bad performance So they end up coding the joins and exists themselves.

My (non scientific) claim is that because users and DBMS-builders are used to thinking in terms of (SQL-)bags, they don't even come to consider (e.g.) the usefulnes of hiding complexity in views, at least not to the extent that it is possible.

Note that a common complaint about SQL is that we "have to do all these stupid joins". Now I don't find joins that hard to deal with but had we better optimisers and "non-bag" query languages we (e.g. DBA) could hide the joins in views and expose these to programmers and end users much more than we do today hence RAISING THE LEVEL OF ABSTRACTION. I am reminded of the old story of the drunken man who seeks his lost car-keys under the lamp-post because the light is better there.

regards,
Lauri Pietarinen Received on Sun Aug 31 2003 - 02:06:54 CEST

Original text of this message