Path: news.easynews.com!newsfeed1.easynews.com!easynews.com!easynews!newsfeed1.earthlink.net!newsfeed.earthlink.net!stamper.news.pas.earthlink.net!stamper.news.atl.earthlink.net!harp.news.atl.earthlink.net!not-for-mail
From: Steve Kass <skass@drew.edu>
Newsgroups: comp.databases.theory
Subject: Re: Extending my question. Was: The relational model and relational
      algebra - why did SQL become the industry standard?
Date: Sun, 02 Mar 2003 20:56:29 -0500
Organization: MindSpring Enterprises
Lines: 172
Message-ID: <b3ucmh$37a$1@slb5.atl.mindspring.net>
References: <b208pe$27s0$1@toralf.uib.no> <e9d83568.0303010250.5de40fc8@posting.google.com> <3e60a9ce.0@news.ruca.ua.ac.be> <3E61AE08.8070707@atbusiness.com> <3e620dec.0@news.ruca.ua.ac.be> <3E625162.4060203@atbusiness.com>
NNTP-Posting-Host: 45.03.a9.dc
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
X-Server-Date: 3 Mar 2003 01:55:29 GMT
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.1) Gecko/20020826
X-Accept-Language: en-us, en
Xref: newsfeed1.easynews.com comp.databases.theory:25109
X-Received-Date: Sun, 02 Mar 2003 18:54:51 MST (news.easynews.com)



Lauri Pietarinen wrote:

>>
> OK, I admit.  I am no expert on (optimizing) compilers and I (perhaps 
> falsly) assumed that it
> was harder to optimize languages that used GOTO's.  Maybe the case is 
> not so.
>
> I will offer one last example to try to clarify what I mean.
>
> Take the standard S, SP and P -tables.  I want to hide the joins from
> the user so I create a view:
>
> CREATE VIEW V_S_SP_P
>  ( S#, SNAME,  QTY, P#, PNAME)
> AS
> SELECT S.S#, S.SNAME,  SP.QTY,  P.P#, P.PNAME
>  FROM S, SP, P
>   WHERE S.S# = SP.S# AND
>             SP.P# = P.P#;

This is the set of suppliers-products with the
additional columns containing the names of
the products and suppliers, which are drawn from
the supplier and product tables.  I call this a set
because I assume a constraint that a given
product-supplier relationship is specified only once.

>
>
> There are lots of interesting questions that can be
> easily answered using this view.
>
> The question "give me all products that are supplied
> by somebody" is easy to state:
>
> SELECT P#, PNAME
>  FROM V_S_SP_P;
>
> Oops!  Got the same P#'s several times.
> Got to add DISTINCT.

The problem is that you are trying to select a set
of products from a table source that is not a set
of products, but a set of product-supplier relationships.

SQL Server shouldn't optimize your query as you
wish, because your query is asking a different question.
Your query (without DISTINCT), asks for the product #
and name for each product-supplier relationship.

I don't understand why this is hard to understand, but
it seems to be the basis of many criticisms of thinking
in bags.  While it may be convenient to develop a theory
that makes it easy to select a particular set of products
from something other than a products table, I don't see
the benefit.

To criticise bags because they can't optimise a query
that selects information from the wrong table (or view)
is not a fair criticism of bags.  In your own words, the
query is supposed to "give me all products that are supplied
by somebody", which immediately translates to

SELECT P#, PNAME FROM P
<where the product is supplied by somebody>

and easily into the optimized result you desire.  

Why do you blame bags here when you should blame
the programmer who believes this question to be represented
by
SELECT P#, PNAME
FROM V_S_SP_P;
??

This latter query is of the form
SELECT information
FROM product-supplier relationships
...,
which is not a request for information about
a subset of products.  It is a request for information
about a subset of product-supplier relationships and
should not return the same result as the EXISTS query
on the product table.

If a requirement of the model is that it must be able to
read the programmer's mind when he or she incorrectly
represents a query and second-guess him or her, then
I will freely admit that bags do not meet the requirement.
But I'll also add that any model that meets that requirement
is a model I won't use.  If I want the product number and
name for each product-supplier relationship, I don't want
language "intelligence" to answer a different question.

SK

>
>
> Ok, let's try
>
> SELECT DISTINCT P#, PNAME
>  FROM V_S_SP_P;
>
> Correct answer, but why is performance
> so bad?
>
> Oh, the DBMS  materialises the whole join
> (even reading S, which is not needed at all)
> and then groups by P# and PNAME!
>
> (Just tried it on SQLServer2000).
>
> Why can't it transform this query into
> SELECT  P#, PNAME
>  FROM P
>  WHERE EXISTS
>     (SELECT * FROM SP
>         WHERE P.P# = SP.P#)?
>
> Is this a useful view?  It sure is!
> Could I actually provide such a view
> to my end users or programmers?
>
> No I can't, because they have to
>
>   1) remember to use DISTINCT
>   2) worry about bad performance
>
> So they end up coding the joins and
> exists themselves.
>
> My (non scientific) claim is that because users and
> DBMS-builders are used to thinking in terms of
> (SQL-)bags, they don't even come to consider (e.g.) the usefulnes
> of hiding complexity in views, at least not to the
> extent that it is possible.
>
> So while you might be right in theory, i.e. that
> supporting two modes at once (sets and bags), is no
> burden for the DBMS optimiser, you are certainly
> wrong in practice!  If it is so easy, why did they
> (the SQL-vendors) not support it a long time ago?
>
> Date is trying to take the relational model to it's full potential
> and I would claim that the SQL-mode of thinking
> is hindering us from achieving this.  And this (in my view)
> is ultimately Date's criticism of
> Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom,
> DATABASE SYSTEM IMPLEMENTATION.
>
> kind regards,
> Lauri Pietarinen
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>

