Re: How can I tune this query?????

From: Richard J Avery <Richard_Avery_cnt40719_at_nt.com>
Date: 1996/04/01
Message-ID: <Richard_Avery_cnt40719-0104961408310001_at_nnsgm25.lon40.nt.com>#1/1


In article <4jc6k3$5ee_at_iserv.dnt.dialog.com>, cdye_at_dnt.dialog.com wrote:

>Andrea,
>
>One thing that will most likely work faster is to rewrite the query as follows:
>
> select id from cat c
> where exists (select keyval from dates d
> where year=1965 AND d.keyval=c.keyval)
> /
>
>Regards,
>Charles_Dye_at_corp.dialog.com
>
>In article <4jbpgi$ets_001_at_nuts.nwu.edu>, aboyes_at_nwu.edu (Andrea J Boyes)
writes:
[Snip - Performance tuning of query question]

EXISTS is _NOT_ faster that IN in _EVERY_ case. Both have their advantages and disadvantages and should be used where appropriate. In the case above I do not believe that using EXISTS will improve performance.

IN builds a temp table which is not indexed. Lookups into this table will be fast where only a few (up to a few hundred - depending on the particular case) rows are returned since the rows will tend to be in memory. However where many thousands of rows exist lookups will take a long time as the temporary table will be written to the temporary tablespace and there is no index on the temp table.

EXISTS will force a join into the (correlated) subquery for each row returned by the outer select - it does not construct a temporary table. This query will be able to use any indexes on the subquery tables. The disadvantage is that the index will have to be re-searched for each query and the relevant data block located.

IN will be faster where the selectivity the outer query is high (i.e. sub 10% of rows returned) and the selectivity of the inner query is low (i.e. greater than 60% returned). As these numbers change the performance benefits will move towards IN - your mileage will vary and you should always explain plan and run timings for examples.

Andreas example shows no where clauses in the outher select other than IN. I would therefore tend to think that IN was the correct solution. This however depends on what percentage of the rows in 'date' are after 1965. If each keyval exists only once in date then I would agree with the other respondents that a simple join is the best solution.

Richard Avery

-- 
The above posting does not in any way represent Nortel
Received on Mon Apr 01 1996 - 00:00:00 CEST

Original text of this message