| Oracle FAQ | Your Portal to the Oracle Knowledge Grid | |
Home -> Community -> Usenet -> comp.databases.theory -> Re: Performance question
And the nuances go further still. The biggest difference I've seen is whether the subquery can be "correlated" with the outer query. The WHERE clause of the subquery should contain, if at all possible, references to values from the outer query that will reduce the number of rows returned. Perhaps that is what you meant by WHERE Y = X? Programmers usually leave this out because: 1) The don't know how to do it and 2) they repeat conditions from the outer query to match the data, rather than using the key.
I believe a correlation can be used in almost any subquery, except an "in-line view" in Oracle. (An in-line view is a query that appears in your FROM clause, rather than your SELECT or WHERE clause.)
The syntax for correlating the subquery is usually:
SELECT outerid, outerm, outern
FROM tablex outerx -- outerx is the "correlation" name, in DB2, this used
to allow a single character.
WHERE outerm IN (
SELECT innerm
FROM tabley
WHERE innerid = outerx.outerid)
This may force the IN to resolved the subquery more than once, contrary to a previous statement. Clearly, an uncorrelated subquery that returns a small set of data would often be better. Even thought an EXISTS seems less efficient because it resolves for each outer row, if it has a path, like an index scan, to get to the first row quickly, it may perform best: some RDBMSs stop returning rows from the subquery as soon as data is found (e.g., Oracle's FIRST_ROWS method).
Warning: I believe people use subqueries to force procedural logic in a declarative language. This is often because they don't understand how to "declare" the desired result set with a JOIN. Most correlated subqueries can be easily transformed into a JOIN of some sort, sometimes an OUTER JOIN (e.g., if a NOT IN condition was used). In Oracle, this is expressed by placing a (+) on one side of the join condition to set the values from the table on that side to NULL if no row is found. In other languages, the syntax is part of the FROM clause, using words like LEFT JOIN or RIGHT JOIN (or FULL JOIN) in place of the commas between tables. I recently changed a query with 7 nested subqueries into a 3-table join. Both returned the same six rows. None of the subqueries were correlated, and all of them did a table scan (not a very large table, though). Changing the query took the response time from 45 minutes to 150 milliseconds.
Vincent Tillman
Senior Oracle DBA, OCP8
<markp7832_at_my-deja.com> wrote in message news:8559s4$2h4$1_at_nnrp1.deja.com...
> In article <ii3d4.1711$x4.186688_at_petpeeve.ziplink.net>,
> "David Cressey" <David_at_DCressey.com> wrote:
> > These answer to your question is that performance of queries is
specific to
> > the techniques used by the optimizer. Different relational DBMS
engines use
> > different optimization techniques. Over time, the optimizer in any
given
> > product evolves from one release of the product to the next.
> >
> > For Oracle RDBMS and Oracle RDB, the only two I know, the exists
construct
> > will deliver better results.
> > (assuming you use cost based optimization with Oracle RDBMS)
> >
> > Oracle RDBMS underwent significant changes to its optimizer between
version
> > 7 and version 8.
> >
> > idiot wrote in message <3874a09e_at_athene.hdm-stuttgart.de>...
> > >I'm confused of using two methods of corellating the subquery, i. e.
> > >" ... x in (select y ...) ... " and
> > >" ... exists (select * .... where y = x) ... "
> > >
> > >is that a fact that one of them always works faster than other or
there are
> > >nuances?
> > >
> > >thanx
> > >
> I also work with Oracle and it is my understanding that the diffence
> between an exists clause and an in clause is that the exists sub-query
> is fired once for every row returned in the main query while the in
> clause sub-query is executed once for the statement.
>
>
> >
![]() |
![]() |