Re: Is mysql a RDBMS ?

From: Lauri Pietarinen <lauri.pietarinen_at_atbusiness.com>
Date: Sun, 31 Aug 2003 23:41:55 +0300
Message-ID: <bitmj7$oq7$1_at_nyytiset.pp.htv.fi>


Heikki Tuuri wrote:

>Lauri,
>
>thank you! We get input to this discussion from a person who really knows
>practical database applications.
>
>"Lauri Pietarinen" <lauri.pietarinen_at_atbusiness.com> kirjoitti viestissä
>news:bire7g$md8$1_at_nyytiset.pp.htv.fi...
>
>
>>Take the standard S, SP and P -tables. I want to hide the joins from
>>the user so I create a view:
>>
>>CREATE VIEW V_S_SP_P
>> ( S#, SNAME, QTY, P#, PNAME)
>>AS
>>SELECT S.S#, S.SNAME, SP.QTY, P.P#, P.PNAME
>> FROM S, SP, P
>> WHERE S.S# = SP.S# AND
>> SP.P# = P.P#;
>>
>>
>>

[snipped]

>Hmm... there is at least one additional reason not to provide such a view to
>programmers. Since it is denormalized, they might try to update the
>attributes of a supplier for just one row, receiving an obscure error
>message about the non-updateteability of the view. Or, they could try to
>insert into the view.
>
>Another reason is that it is not an outer join. There may be parts which
>currently have no supplier. Users might be misled to thinking that the above
>query gives all parts, not just the parts supplied by somebody. Of course,
>these are well-known problems of a denormalized database.
>
OK, that's a fair criticisim. I'll give you a better example at a later time...

>> 2) worry about bad performance
>>So they end up coding the joins and
>>exists themselves.
>>
>>My (non scientific) claim is that because users and
>>DBMS-builders are used to thinking in terms of
>>(SQL-)bags, they don't even come to consider (e.g.) the usefulnes
>>of hiding complexity in views, at least not to the
>>extent that it is possible.
>>
>>
[snip]

>I do not see how a mathematical relation language would make the
>optimization procedure above easier than a multiset language. This is a
>typical example of determining the join order of tables in a nested loop
>join. Looks like it makes no difference in the optimization procedure
>whether the user formulates his query in a multiset language or a
>mathematical relation language. That has been my impression since studying
>this a bit a couple of years ago. Chris Date and some others claim the
>opposite, but they should provide practical evidence.
>
>
I think one has to take into consideration the commercial and sociological dimensions of this
issue.

Consider:

  • SQL is defined in mid 70's and duplicates are allowed (in fact there is no concept of primary keys, unique indexes do exist).
  • the keyword DISTINCT is provided so that duplicates can be eliminated
  • however, users of SQL (e.g. DB2) are strongly discouraged against using DISTINCT keyword (it causes a sort!) because the optimiser folks have not had the time to recognize DISTINCT no-ops.
  • so users avoid DISTINCT, hence creating a negative feedback loop to optimizer developing team that dedicates it's time to other matters.
    • etc...

Which really means that the "set" part of SQL (which could have been supported via users providing DISTINCT) is poortly supported up to this day. Hence it is still a good idea to avoid its use in modern SQL-databases.

So while "in theory" something might be easy to provide (good distinct support in SQL), "in practice" this has not been the case. And it is those practical systems we now have to live with.

On the other hand IF SQL had been designed with builtin DISTINCT (or at least DISTINCT as default instead of ALL) from the start, perhaps the optimizer-team would have been forced to think about optimizing DISTINCT no-ops more forcefully.

It is fair to note that it would have required also more from users: - they would have actually had to define all candidate keys (with the added disk space they required - in the 70's and 80's) - they would have had to understand what a relation is. (which, taking into account that they were using RELATIONAL DBMS's is maybe a fair requirement).

What would the users have got in exchange? They could have used views more easily and relied on them being properly optimised. I also believe there would have been less confusion because the elimination of meaningless duplicates. Because frankly, nobody in this newsgroup has been able to provide a sound practical example in which duplicates would be usefull.

regards,
Lauri Pietarinen

ps. I add once more a link describing a DBMS that Hugh Darwen was involved with in the early 80's

www.mcjones.org/System_R/bs12.html Received on Sun Aug 31 2003 - 22:41:55 CEST

Original text of this message