Re: Is this a bug or a limitation of the SQL language
Date: 26 Aug 92 00:13:05 GMT
Message-ID: <HELLERS.92Aug25181305_at_cleo.wisc.edu>
In article <BtJKIG.2K5_at_watdragon.uwaterloo.ca> gnpaulle_at_maytag.uwaterloo.ca (Glenn Paulley) writes:
> > [I wrote:]
> > |select * from emp
> > |where salary between
> > |(select salary from emp where ename = 'Larry')
> > |and
> > |(select salary from emp where ename = 'John')
> > You want to avoid subqueries whenever possible. To get what you need,
> > try the following:
>
> 2) Sahil's query, if optimized using subqueries for the where predicate,
> will probably execute much faster than doing a three-way join (the
> conjuncts in the where clause are single-value lookups; if ename is
> indexed, for example, the entire result can be found using two indexed
> retrievals, and a single table scan.... no joins, no sorts).
> Avoiding
> subqueries altogether, or always attempting to transform subqueries to
> joins, is a dangerous heuristic.
A heuristic, yes, as noted below (duplicate elimination may be required). Dangerous? Not really. For exactly the reason I outlined above, subquery-to-join is a stable conversion, meaning that it will almost always do as well, and often better. To do duplicate semantics correctly, a sort or hash is sometimes required after transformation (SELECT DISTINCT instead of SELECT), and this may result in some slowdown. But the win from conversion can result in orders-of-magnitude improvement, while the loss of an extra sort (say) usually slows things down by much less (in the < 100% slowdown range).
> RDBMS optimizers should also determine
> if transforming a join into a subquery will result in a better execution
> plan.
This can be done, but as noted above, this logic is not essential
since subquery-to-join conversion is quite stable. Since optimization
time is already dangerously complex, adding this dimension is probably
not worth it.
>
> Having the *user* perform query rewrite for "optimization" is unacceptable
> (it's what query optimizers in RDBMS are all about). Sahil's query is
> straightforward and specifies exactly what he wants. It's not his fault
> if today's query optimizers fail to execute this query in the most
> optimal way.
Absolutely 100% agreed. This is a serious failing in the existing SQL products, namely that different expressions of the same query may have wildly different performance. As a result, if you open up a vendor's manual, you will usually find such "tips" as: "convert subqueries to joins in the case where blah blah blah". Naturally these tips will stymie all but the most sophisticated users. And every shop needs such a user, or they're in trouble. The Starburst research project at IBM Almaden may result in IBM's products having such optimizations be automatic some years from now.
> It's also a dangerous practice
> to generalize (that is, transform subqueries to joins) because not all
> transformations result in equivalent queries (particularly when the
> database contains null values). See the paper by Negri et al.
> in ACM TODS, Vol 17, No. 3, Sept. 1991 for formal semantics of entry-level
> SQL and query equivalence classes.
Happy rewriting.
Joe Hellerstein Received on Wed Aug 26 1992 - 02:13:05 CEST