Re: Performance question

From: VincentPTillman <VincentPTillman_at_email.msn.com>
Date: 2000/01/08
Message-ID: <uhplQ6iW$GA.274_at_cpmsnbbsa04>#1/1

And the nuances go further still. The biggest difference I've seen is whether the subquery can be "correlated" with the outer query. The WHERE clause of the subquery should contain, if at all possible, references to values from the outer query that will reduce the number of rows returned. Perhaps that is what you meant by WHERE Y = X? Programmers usually leave this out because: 1) The don't know how to do it and 2) they repeat conditions from the outer query to match the data, rather than using the key.

I believe a correlation can be used in almost any subquery, except an "in-line view" in Oracle. (An in-line view is a query that appears in your FROM clause, rather than your SELECT or WHERE clause.)

The syntax for correlating the subquery is usually:

  SELECT outerid, outerm, outern
  FROM tablex outerx -- outerx is the "correlation" name, in DB2, this used to allow a single character.
  WHERE outerm IN (
    SELECT innerm
    FROM tabley
    WHERE innerid = outerx.outerid)

This may force the IN to resolved the subquery more than once, contrary to a previous statement. Clearly, an uncorrelated subquery that returns a small set of data would often be better. Even thought an EXISTS seems less efficient because it resolves for each outer row, if it has a path, like an index scan, to get to the first row quickly, it may perform best: some RDBMSs stop returning rows from the subquery as soon as data is found (e.g., Oracle's FIRST_ROWS method).

Warning: I believe people use subqueries to force procedural logic in a declarative language. This is often because they don't understand how to "declare" the desired result set with a JOIN. Most correlated subqueries can be easily transformed into a JOIN of some sort, sometimes an OUTER JOIN (e.g., if a NOT IN condition was used). In Oracle, this is expressed by placing a (+) on one side of the join condition to set the values from the table on that side to NULL if no row is found. In other languages, the syntax is part of the FROM clause, using words like LEFT JOIN or RIGHT JOIN (or FULL JOIN) in place of the commas between tables. I recently changed a query with 7 nested subqueries into a 3-table join. Both returned the same six rows. None of the subqueries were correlated, and all of them did a table scan (not a very large table, though). Changing the query took the response time from 45 minutes to 150 milliseconds.

Vincent Tillman
Senior Oracle DBA, OCP8

<markp7832_at_my-deja.com> wrote in message news:8559s4$2h4$1_at_nnrp1.deja.com...
> In article <ii3d4.1711$x4.186688_at_petpeeve.ziplink.net>,
> "David Cressey" <David_at_DCressey.com> wrote:
> > These answer to your question is that performance of queries is
specific to
> > the techniques used by the optimizer. Different relational DBMS
engines use
> > different optimization techniques. Over time, the optimizer in any
given
> > product evolves from one release of the product to the next.
> >
> > For Oracle RDBMS and Oracle RDB, the only two I know, the exists
construct
> > will deliver better results.
> > (assuming you use cost based optimization with Oracle RDBMS)
> >
> > Oracle RDBMS underwent significant changes to its optimizer between
version
> > 7 and version 8.
> >
> > idiot wrote in message <3874a09e_at_athene.hdm-stuttgart.de>...
> > >I'm confused of using two methods of corellating the subquery, i. e.
> > >" ... x in (select y ...) ... " and
> > >" ... exists (select * .... where y = x) ... "
> > >
> > >is that a fact that one of them always works faster than other or
there are
> > >nuances?
> > >
> > >thanx
> > >
> I also work with Oracle and it is my understanding that the diffence
> between an exists clause and an in clause is that the exists sub-query
> is fired once for every row returned in the main query while the in
> clause sub-query is executed once for the statement.

> And it is not true that the exists sub-query is always faster than the
> in clause. It depends on the SQL statement and what you are testing.
> I do not know when one is better than the other, but I have worked with
> SQL statements where the explain plan and timed tests worked better
> with one clause over the other and another SQL worked the other way. I
> believe the determining factor has to do with the number of rows
> returned by the parent query and the number of values returned by the
> in clause select. As the number of rows returned by the parent
> increases the cost of the exists increases and when the result set of
> the in clause is small enough to fit in a few Oracle data blocks the in
> clause becomes superior.

> --
> Mark D. Powell -- The only advice that counts is the advice that
> you follow so follow your own advice --

>
>

> Sent via Deja.com http://www.deja.com/
> Before you buy.
Received on Sat Jan 08 2000 - 00:00:00 CET

This message: [ Message body ]
Next message: joe_celko_at_my-deja.com: "Re: Bill of materials / Groups in Groups"
Previous message: VincentPTillman: "Re: Primary Key and autonumber"
Maybe in reply to: idiot: "Performance question"
In reply to markp7832_at_my-deja.com: "Re: Performance question"
Next in thread: joe_celko_at_my-deja.com: "Re: Bill of materials / Groups in Groups"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message