Re: Hard time optimizing query

From: David Deutsch <pizza2pizza_at_hotmail.com>
Date: 15 Aug 2002 09:28:17 -0700
Message-ID: <812a50f5.0208150828.668bc7f2_at_posting.google.com>


Chrysalis <cellis_at_clubi.ie> wrote in message news:<150820020957197065%cellis_at_clubi.ie>...
> In article <812a50f5.0208141403.fbe1a4d_at_posting.google.com>, David
> Deutsch <pizza2pizza_at_hotmail.com> wrote:
>
> > Hello,
> > I am having some strange performance problems with my SQL. In a
> > nutshell, a statement that takes less than one second when I use the
> > DISTINCT clause takes about 40 seconds without it. The statement
> > (which returns 3 rows, whether or not DISTINCT is used) is:
> >
> > Select DISTINCT MP_PEOPLE.people_id
> > from MP_PEOPLE
> > where EXISTS
> > (Select WORD
> > from mp_people_name_search
> > where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> > 'SMITH'
> > );
> >
> >
> > Using EXPLAIN PLAN, the plan when DISTINCT is used is:
> > SELECT STATEMENT Cost = 2139
> > 2.1 SORT UNIQUE
> > 3.1 NESTED LOOPS
> > 4.1 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> > 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCHWORD_IDX NON-UNIQUE
> > 4.2 INDEX RANGE SCAN MP_PEOPLE_PK_PEOPLE_ID NON-UNIQUE
> >
> >
> > The plan without DISTINCT is:
> > SELECT STATEMENT Cost = 2587
> > 2.1 FILTER
> > 3.1 TABLE ACCESS FULL MP_PEOPLE
> > 3.2 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> > 4.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
> >
> >
> >
> > Another (perhaps related) problem comes from using "OR" in my where
> > clause. For example, the SQL:
> > Select DISTINCT MP_PEOPLE.people_id
> > from MP_PEOPLE
> > where EXISTS
> > (Select WORD
> > from mp_people_name_search
> > where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> > 'SMITH'
> > )
> > OR
> > EXISTS
> > (Select WORD
> > from mp_people_name_search
> > where MP_PEOPLE.people_id = mp_people_name_search.ID and WORD =
> > 'JONES'
> > );
> >
> > This SQL takes about a minute, though I can run each individual clause
> > (i.e. with "SMITH" or "JONES") in under a second each.
> > The plan for this SQL is:
> >
> > SELECT STATEMENT Cost = 4183
> > 2.1 SORT UNIQUE
> > 3.1 FILTER
> > 4.1 TABLE ACCESS FULL MP_PEOPLE
> > 4.2 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> > 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
> > 4.3 TABLE ACCESS BY INDEX ROWID MP_PEOPLE_NAME_SEARCH
> > 5.1 INDEX RANGE SCAN MP_PEOPLE_NAME_SEARCH_IDX NON-UNIQUE
> >
> >
> > ....So, what is Oracle doing here, and why is it doing it? More
> > importantly, does anyone know how (perhaps via SQL hints?) I can force
> > Oracle to execute all these statements the "right" way?
> >
> >
> > Thanks in advance,
> > Dave
>
> Dave,
> All the information you need to figure this out is in the query
> plans:
>
> 1) With the DISTINCT clause, it uses the traditional (pre-cost-based
> optimizer approach) and does a full table scan of the PEOPLE table,
> applying the subquery filter to each row returned (i.e. all of them)
> and then does a DISTINCT filter on the resut (all 3 of them)
> 2) Without the DISTINCT clause, Oracle is able to merge the
> predicates from the main query and the EXISTS subquery and decides that
> the predicate involving a WORD search is best, accessing the PEOPLE
> table (by people_id) only for those values of people_id returned by the
> search of MP_PEOPLE_NAME_SEARCH.
>
> Which of these is best depends on how efficient the WORD search is
> and how selective PEOPLE_ID is on the MP_PEOPLE table (if it is a
> primary key, the DISTINCT keyword is not needed). However, the
> optimizer can only make a good decision if the statistics for all
> tables and indexes are reasonably accurate and, even then, it
> frequently get it wrong in my experience.
>
> I have no idea of what you consider to be the "right" way, but I
> would have allowed the word search to drive the query, and not merely
> to be used as a filter thus:
>
> select ...
> from MP_PEOPLE
> where people_id in
> (select people_id
> from MP_PEOPLE_NAME_SEARCH
> where word = 'SMITH' (or word in ('SMITH','JONES'))
> )
>
> HTH
>
> Chrysalis

     Thanks for the input. What you say about the DISTINCT clause makes sense; however it seems that is is working just the OPPOSITE of how you described! i.e., without the DISTINCT clause, a full table scan/filter is performed, whereas with the DISTINCT clause it is not; therefore the DISTINCT makes the query run much faster. Also, I'm still puzzled why using an "OR" in the SQL takes about 50 times longer than running the two parts of the "OR" clause seperately.

    FWIW, it is not trivial for me to change the SQL statement as a whole, just the parts; in this particular application, the different parts of the SQL are created in different places.

                  
                Thanks again,
                       Dave
Received on Thu Aug 15 2002 - 18:28:17 CEST

Original text of this message