Re: Extending my question. Was: The relational model and relational algebra - why did SQL become the industry standard?

From: Steve Kass <skass_at_drew.edu>
Date: Sun, 02 Mar 2003 20:56:29 -0500
Message-ID: <b3ucmh$37a$1_at_slb5.atl.mindspring.net>


Lauri Pietarinen wrote:

>>
> OK, I admit. I am no expert on (optimizing) compilers and I (perhaps
> falsly) assumed that it
> was harder to optimize languages that used GOTO's. Maybe the case is
> not so.
>
> I will offer one last example to try to clarify what I mean.
>
> Take the standard S, SP and P -tables. I want to hide the joins from
> the user so I create a view:
>
> CREATE VIEW V_S_SP_P
> ( S#, SNAME, QTY, P#, PNAME)
> AS
> SELECT S.S#, S.SNAME, SP.QTY, P.P#, P.PNAME
> FROM S, SP, P
> WHERE S.S# = SP.S# AND
> SP.P# = P.P#;

This is the set of suppliers-products with the additional columns containing the names of the products and suppliers, which are drawn from the supplier and product tables. I call this a set because I assume a constraint that a given product-supplier relationship is specified only once.

>
>
> There are lots of interesting questions that can be
> easily answered using this view.
>
> The question "give me all products that are supplied
> by somebody" is easy to state:
>
> SELECT P#, PNAME
> FROM V_S_SP_P;
>
> Oops! Got the same P#'s several times.
> Got to add DISTINCT.

The problem is that you are trying to select a set of products from a table source that is not a set of products, but a set of product-supplier relationships.

SQL Server shouldn't optimize your query as you wish, because your query is asking a different question. Your query (without DISTINCT), asks for the product # and name for each product-supplier relationship.

I don't understand why this is hard to understand, but it seems to be the basis of many criticisms of thinking in bags. While it may be convenient to develop a theory that makes it easy to select a particular set of products from something other than a products table, I don't see the benefit.

To criticise bags because they can't optimise a query that selects information from the wrong table (or view) is not a fair criticism of bags. In your own words, the query is supposed to "give me all products that are supplied by somebody", which immediately translates to

SELECT P#, PNAME FROM P
<where the product is supplied by somebody>

and easily into the optimized result you desire.

Why do you blame bags here when you should blame the programmer who believes this question to be represented by
SELECT P#, PNAME

FROM V_S_SP_P;
??

This latter query is of the form
SELECT information
FROM product-supplier relationships
...,
which is not a request for information about a subset of products. It is a request for information about a subset of product-supplier relationships and should not return the same result as the EXISTS query on the product table.

If a requirement of the model is that it must be able to read the programmer's mind when he or she incorrectly represents a query and second-guess him or her, then I will freely admit that bags do not meet the requirement. But I'll also add that any model that meets that requirement is a model I won't use. If I want the product number and name for each product-supplier relationship, I don't want language "intelligence" to answer a different question.

SK

>
>
> Ok, let's try
>
> SELECT DISTINCT P#, PNAME
> FROM V_S_SP_P;
>
> Correct answer, but why is performance
> so bad?
>
> Oh, the DBMS materialises the whole join
> (even reading S, which is not needed at all)
> and then groups by P# and PNAME!
>
> (Just tried it on SQLServer2000).
>
> Why can't it transform this query into
> SELECT P#, PNAME
> FROM P
> WHERE EXISTS
> (SELECT * FROM SP
> WHERE P.P# = SP.P#)?
>
> Is this a useful view? It sure is!
> Could I actually provide such a view
> to my end users or programmers?
>
> No I can't, because they have to
>
> 1) remember to use DISTINCT
> 2) worry about bad performance
>
> So they end up coding the joins and
> exists themselves.
>
> My (non scientific) claim is that because users and
> DBMS-builders are used to thinking in terms of
> (SQL-)bags, they don't even come to consider (e.g.) the usefulnes
> of hiding complexity in views, at least not to the
> extent that it is possible.
>
> So while you might be right in theory, i.e. that
> supporting two modes at once (sets and bags), is no
> burden for the DBMS optimiser, you are certainly
> wrong in practice! If it is so easy, why did they
> (the SQL-vendors) not support it a long time ago?
>
> Date is trying to take the relational model to it's full potential
> and I would claim that the SQL-mode of thinking
> is hindering us from achieving this. And this (in my view)
> is ultimately Date's criticism of
> Hector Garcia-Molina, Jeffrey D. Ullman, and Jennifer Widom,
> DATABASE SYSTEM IMPLEMENTATION.
>
> kind regards,
> Lauri Pietarinen
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
Received on Mon Mar 03 2003 - 02:56:29 CET

Original text of this message