Re: can't get join on two large tables to use

From: DA Morgan <damorgan_at_psoug.org>
Date: Thu, 31 May 2007 15:41:54 -0700
Message-ID: <1180651312.485869@bubbleator.drizzle.com>

grasp06110 wrote:
> On May 31, 3:03 pm, grasp06110 <grasp06..._at_yahoo.com> wrote:

>> On May 31, 1:23 pm, DA Morgan <damor..._at_psoug.org> wrote:
>>
>>
>>
>>> grasp06110 wrote:
>>>> Hi Everybody,
>>>> I've looked for info on this from several directions (e.g. searches on
>>>> partition, bitmap index, etc.) but still can't seem to come up with a
>>>> solution.
>>>> On oracle 9i:
>>>> /* table names have been changed to protect the innocent */
>>>> I have a large table (about 10 million records).
>>> Small to medium ... not large.
>>>> There is a column CATAGORY with about 10 distinct values.
>>> SQL> select keyword from gv$reserved_words
>>>    2  where keyword like 'CAT%';
>>> KEYWORD
>>> ------------------------------
>>> CATEGORY
>>> SQL>
>>>> There is an other table PRODUCT that has about 1 million records.
>>>> PRODUCT has a foreign key to CATAGORY over catagory_id.
>>>> I have the following indexes (plus the one I get for the foreign key)
>>>> create unique index catagory_cid_pid on catagory (
>>>>   catagory_id,
>>>>   product_id
>>>> )
>>>> create unique index catagory_pid_cid on catagory (
>>>>   product_id,
>>>>   catagory_id
>>>> )
>>> I would recommend not building unique indexes but rather unique
>>> constraints: Far more flexible.
>>>> The following query is <1sec:
>>>> select
>>>>   *
>>>> from
>>>>   catagory cat
>>>> where
>>>>   cat.catagory_id = 'CAT_X'
>>>> I cannot get the following to  use any thing but hash joins and takes
>>>> several minutes.
>>>> select
>>>>   *
>>>> from
>>>>   catagory cat,
>>>>   product prod
>>>> where
>>>>   prod.product_id = cat.product_id
>>>>   and cat.catagory_id = 'CAT_X'
>>>> I've tried things like:
>>>> /*+ rule */
>>>> /*+ index(catagory catagory_cid_pid */
>>>> /*+ rule index(catagory catagory_cid_pid) */
>>>> /*+ rule use_nl index(catagory catagory_cid_pid) */
>>>> /*+ rule use_nl index(catagory catagory_cid_pid) index(product
>>>> product_id_index) */
>>>> /*+ use_nl */
>>> Have you tried EXPLAIN PLAN using dbms_xplan so that we can see
>>> what is actually happening? Or a 10053 trace and TKPROF? RULE hints
>>> are likely to be a waste of time as opposed to using DBMS_STATS to
>>> provide the optimizer with good numbers.
>>> Post the metrics and we'll see if they provide insight.
>>> --
>>> Daniel A. Morgan
>>> University of Washington
>>> damor..._at_x.washington.edu
>>> (replace x with u to respond)
>>> Puget Sound Oracle Users Groupwww.psoug.org
>> I'm working on getting the privs to run autotrace to get stats on the
>> query.  In the mean time hear is the plan for the query.  ccp is the
>> table that has about 10 million rows and an index on the property
>> column and the sid column.
>>
>> I'm wondering if part of the problem is that the index on the property
>> and sid column is not a bitmap index.  When the
>> optimizer sees that there are only a few values for property it
>> decides to ignore the index.  Also, if I execute something like
>> "select * from ccp where property = 'PROP'" the explain plan indicates
>> a full table scan rather than use any index.
>>
>> /* query and autotrace */
>> select
>>   *
>> from
>>   ccp,
>>   str
>> where
>>   ccp.sid = str.sid
>>   and ccp.property = 'PROP_X'
>>
>> SELECT STATEMENT Optimizer=CHOOSE (Cost=100356 Card=953924
>> Bytes=915767040)
>>   HASH JOIN (Cost=100356 Card=953924 Bytes=915767040)
>>     TABLE ACCESS (FULL) OF CCP (Cost=394 Card=953924 Bytes=21940252)
>>     TABLE ACCESS (FULL) OF STR (Cost=60153 Card=1293923
>> Bytes=1212405851)

>
> Woo-hoo, got it to work,
>
> If I reverse the columns in the index I get the performance I'm
> looking for. In other words the behavior I am observing is:
>
> If the column with the handful of distinct values is used first in the
> index the index is ignored in spite of several attemts to get it
> recognized.
>
> If the column with the number of distinct values of about 1/10 the
> number of rows in the table is used first in the index then the index
> is used if I do something like the following:
>
> select
> /*+ index(ccp ccp_sid_pid)*/
> *
> from
> str,
> ccp
> where
> str.sid = ccp.sid
> and property = 'FOO'
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=3941922 Card=953924
> Bytes=915767040)
> NESTED LOOPS (Cost=3941922 Card=953924 Bytes=915767040)
> TABLE ACCESS (FULL) OF STR (Cost=60153 Card=1293923
> Bytes=1212405851)
> TABLE ACCESS (BY GLOBAL INDEX ROWID) OF CCP (Cost=3 Card=1
> Bytes=23)
> INDEX (UNIQUE SCAN) OF CCP_SID_PID (UNIQUE) (Cost=2 Card=1)
>
> This query returns data in about 30 msec v several minutes!
>
> I am supprized at the cost (3941922) given that I get first rows so
> quickly.
>
> If I do a select count(*) from the above query and add RULE to the
> hint I get back the count (about 1million) in about 15 seconds (I'm
> guessing that this is a good approximation of how long it will take
> Oracle to access all of the rows?). It takes about 1 sec to get back
> the first 10k rows. Fortunately, most of the people looking at these
> data will only be interested in the first rows (generally < the first
> 10k rows.

Hit and miss is a winning strategy given infinite time and patience and being paid by the hour. <g>

Next time try using the tools provided: It works better.

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org

Received on Thu May 31 2007 - 17:41:54 CDT

Re: can't get join on two large tables to use_nl or indexes