Re: Is this a bug or a limitation of the SQL language

From: Glenn Paulley <gnpaulle_at_maytag.uwaterloo.ca>
Date: Wed, 26 Aug 1992 19:07:00 GMT
Message-ID: <BtLuFp.Kp6_at_watdragon.uwaterloo.ca>


In article <HELLERS.92Aug25181305_at_cleo.wisc.edu> hellers_at_wisc.edu (Joseph Hellerstein) writes:

> In article <BtJKIG.2K5_at_watdragon.uwaterloo.ca> gnpaulle_at_maytag.uwaterloo.ca (Glenn Paulley) writes:
> > [I wrote:]
> > |select * from emp
> > |where salary between
> > |(select salary from emp where ename = 'Larry')
> > |and
> > |(select salary from emp where ename = 'John')
 

> > You want to avoid subqueries whenever possible. To get what you need,
> > try the following:
>
> 2) Sahil's query, if optimized using subqueries for the where predicate,
> will probably execute much faster than doing a three-way join (the
> conjuncts in the where clause are single-value lookups; if ename is
> indexed, for example, the entire result can be found using two indexed
> retrievals, and a single table scan.... no joins, no sorts).
 

>This is incorrect. Subquery processing is identical to *one kind* of
>join processing, namely nested-loop with the subquery as the inner.
>Hence joins will always do at least as well as subqueries, and often
>better, since the space of paths for the optimizer to choose from is
>larger. To argue that subquery processing is simpler than join
>processing is just wrong. It's a kind of join processing itself, with
>restricted output semantics.

100% agreed. However, the scalar subqueries above are not correlated, so I still fail to see how join processing could perform better than (basically) a single scan- nested loops aren't required. Again, I'm assuming that ename is indexed, or in some way it's possible to determine Larry and John's salary quickly.

As an aside:

As pointed out by map_at_sequent.com (Michael Perry) these two scalar subqueries must return single rows, or a run-time error occurs. Also, a good point is made by kent_at_manzi.unx.sas.com (Paul Kent) that the semantics of BETWEEN are questionable. My copy of the ISO SQL2 standard (ISO/IEC JTC1/SC21 N5215, December 1, 1990) says that "x BETWEEN y AND z" is equivalent to "x>=y AND x<=z"; thus a NULL result will occur if John earns less than Larry (as correctly assumed by Paul).

Both of the above also affect your join version in the same way, so there's no problem there.

> And see the paper by Pirahesh, Hellerstein, and Hasan in SIGMOD '92
> which demonstrates that subquery-to-join *can* be done correctly (and
> is done automatically in Starburst) for all existential subqueries
> that are Boolean Factors (i.e. IN, ANY, or EXISTS subqueries that are
> outer conjuncts of the WHERE clause, not nested under NOT or OR, etc.)

I realize this (I have seen your paper, and others, on this subject). The point I was trying to make is that I don't believe such transformations are possible for *all* forms of subqueries (eg. universal quantification and/or negation), so the heuristic isn't true in all cases (but is true in this one).

> You have to be careful, but the rewrite I gave him will work
> correctly, may well speed up his query, and may even sneak by his
> limited parser.

Yes. Again, agreed.

I must say it's a pleasure to discuss these issues with someone of Mr. Hellerstein's reputation. I follow your results on Starburst query rewrite optimization, and other papers on Starburst, with interest.

Best regards.

--
-- G. N. (Glenn) Paulley                 | Computer Science Department
-- USENET: gnpaulle_at_bluebox.uwaterloo.ca | University of Waterloo
-- Phone: (519) 885-1211 x3490           | 200 University Avenue
-- Fax: (519) 885-1208   Office: DC3142  | Waterloo, Ontario, Canada N2L 3G1
Received on Wed Aug 26 1992 - 21:07:00 CEST

Original text of this message