optimizing problem: strange behaviour...

From: Gerhard Möller <moeller_at_offis.uni-oldenburg.de>
Date: Fri, 27 Aug 1999 18:47:50 +0200
Message-ID: <7q6ff4$1om@news.Informatik.Uni-Oldenburg.DE>

Dear collegues,

I try to optimize a standard intersection problem in Oracle 7.3. I have a large table R with relations beween two other (large) tables A and B: A 1-m R m-1 B.

I have to find all entries of A that correspond to a SET of B. This is rather simple, it can be stated as

SELECT * FROM a

 WHERE pk_a IN ( SELECT fk_a FROM r 
                  WHERE fk_b = <value_1> 
       INTERSECT SELECT fk_a FROM r 
                  WHERE fk_b = <value_2>
       INTERSECT ...
       INTERSECT SELECT fk_a FROM r
                  WHERE fk_b = <value_n> );

Unfortunatelly, this takes forever. (>>1 minute in my case.)

Obviously Oracle first builds all subsets and after that scans all subsets completely for matching. This is not really a good strategy in my case, I would prefer first building one subset and then doing the other SELECTs only on this subset, continously lowering the size of the subset. This would be especially a good strategy, as I can find out very quickly how many rows of A belong to an entry in B. (It is a direct lookup in a triggered column.)

How can I tell the optimizer to do exact that?

I was hoping, the optimizer would be smarter if I used nested subqueries as in

SELECT * FROM a WHERE pk_a IN (
SELECT fk_a FROM r WHERE fk_b = <value_1> AND fk_a IN ( SELECT fk_a FROM r WHERE fk_b = <value_2> AND fk_a IN ( ...
...SELECT fk_a FROM r WHERE fk_b = <value_n> )....);

Now the funny thing is that this **SOMETIMES** works as expected, giving the result in <1 sek. Unfortunatelly I have not found out a rule on how to order the subqueries! One might expect that they have to be ordered by size of the resulting subset, but this is not true. What works in *some* cases is to order the subqueries by LENGTH(fk_b). (No joke!) The same ordering with the INTERSECT sytax above does not influence the speed, btw. Some queries do not seem to be optimized at all. But all time results are repeatable, so it is not a question of cache or something like that.

Please help me to
(a) understand what's going on and
(b) possibly give a hint what to do to make the query execute fast.

Thank you ver much, Gerhard.

--
Gerhard Möller - moeller_at_offis.uni-oldenburg.de Received on Fri Aug 27 1999 - 11:47:50 CDT