Re: Hard time optimizing query
Date: Thu, 15 Aug 2002 08:56:56 GMT
Message-ID: <150820020957197065%cellis_at_clubi.ie>
In article <812a50f5.0208141403.fbe1a4d_at_posting.google.com>, David Deutsch <pizza2pizza_at_hotmail.com> wrote:
> Hello,
> I am having some strange performance problems with my SQL. In a
> nutshell, a statement that takes less than one second when I use the
> DISTINCT clause takes about 40 seconds without it. The statement
> (which returns 3 rows, whether or not DISTINCT is used) is:
>
> Select DISTINCT MP_PEOPLE.people_id
> from MP_PEOPLE
> where EXISTS
> (Select WORD
> from mp_people_name_search
> where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> 'SMITH'
> );
>
>
> Using EXPLAIN PLAN, the plan when DISTINCT is used is:
> SELECT STATEMENT Cost = 2139
> 2.1 SORT UNIQUE
> 3.1 NESTED LOOPS
> 4.1 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCHWORD_IDX NON-UNIQUE
> 4.2 INDEX RANGE SCAN MP_PEOPLE_PK_PEOPLE_ID NON-UNIQUE
>
>
> The plan without DISTINCT is:
> SELECT STATEMENT Cost = 2587
> 2.1 FILTER
> 3.1 TABLE ACCESS FULL MP_PEOPLE
> 3.2 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> 4.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
>
>
>
> Another (perhaps related) problem comes from using "OR" in my where
> clause. For example, the SQL:
> Select DISTINCT MP_PEOPLE.people_id
> from MP_PEOPLE
> where EXISTS
> (Select WORD
> from mp_people_name_search
> where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> 'SMITH'
> )
> OR
> EXISTS
> (Select WORD
> from mp_people_name_search
> where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> 'JONES'
> );
>
> This SQL takes about a minute, though I can run each individual clause
> (i.e. with "SMITH" or "JONES") in under a second each.
> The plan for this SQL is:
>
> SELECT STATEMENT Cost = 4183
> 2.1 SORT UNIQUE
> 3.1 FILTER
> 4.1 TABLE ACCESS FULL MP_PEOPLE
> 4.2 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
> 4.3 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
>
>
> ....So, what is Oracle doing here, and why is it doing it? More
> importantly, does anyone know how (perhaps via SQL hints?) I can force
> Oracle to execute all these statements the "right" way?
>
>
> Thanks in advance,
> Dave
Dave,
All the information you need to figure this out is in the query plans:
- With the DISTINCT clause, it uses the traditional (pre-cost-based optimizer approach) and does a full table scan of the PEOPLE table, applying the subquery filter to each row returned (i.e. all of them) and then does a DISTINCT filter on the resut (all 3 of them)
- Without the DISTINCT clause, Oracle is able to merge the predicates from the main query and the EXISTS subquery and decides that the predicate involving a WORD search is best, accessing the PEOPLE table (by people_id) only for those values of people_id returned by the search of MP_PEOPLE_NAME_SEARCH.
Which of these is best depends on how efficient the WORD search is and how selective PEOPLE_ID is on the MP_PEOPLE table (if it is a primary key, the DISTINCT keyword is not needed). However, the optimizer can only make a good decision if the statistics for all tables and indexes are reasonably accurate and, even then, it frequently get it wrong in my experience.
I have no idea of what you consider to be the "right" way, but I would have allowed the word search to drive the query, and not merely to be used as a filter thus:
select ...
from MP_PEOPLE
where people_id in
(select people_id
from MP_PEOPLE_NAME_SEARCH
where word = 'SMITH' (or word in ('SMITH','JONES'))
)
HTH Chrysalis Received on Thu Aug 15 2002 - 10:56:56 CEST
