Re: can't get join on two large tables to use

From: grasp06110 <grasp06110_at_yahoo.com>
Date: 31 May 2007 13:12:22 -0700
Message-ID: <1180642342.146508.315220@o5g2000hsb.googlegroups.com>

On May 31, 3:03 pm, grasp06110 <grasp06..._at_yahoo.com> wrote:
> On May 31, 1:23 pm, DA Morgan <damor..._at_psoug.org> wrote:
>
>
>
> > grasp06110 wrote:
> > > Hi Everybody,
>
> > > I've looked for info on this from several directions (e.g. searches on
> > > partition, bitmap index, etc.) but still can't seem to come up with a
> > > solution.
>
> > > On oracle 9i:
>
> > > /* table names have been changed to protect the innocent */
> > > I have a large table (about 10 million records).
>
> > Small to medium ... not large.
>
> > > There is a column CATAGORY with about 10 distinct values.
>
> > SQL> select keyword from gv$reserved_words
> > 2 where keyword like 'CAT%';
>
> > KEYWORD
> > ------------------------------
> > CATEGORY
>
> > SQL>
>
> > > There is an other table PRODUCT that has about 1 million records.
> > > PRODUCT has a foreign key to CATAGORY over catagory_id.
> > > I have the following indexes (plus the one I get for the foreign key)
>
> > > create unique index catagory_cid_pid on catagory (
> > > catagory_id,
> > > product_id
> > > )
>
> > > create unique index catagory_pid_cid on catagory (
> > > product_id,
> > > catagory_id
> > > )
>
> > I would recommend not building unique indexes but rather unique
> > constraints: Far more flexible.
>
> > > The following query is <1sec:
>
> > > select
> > > *
> > > from
> > > catagory cat
> > > where
> > > cat.catagory_id = 'CAT_X'
>
> > > I cannot get the following to use any thing but hash joins and takes
> > > several minutes.
>
> > > select
> > > *
> > > from
> > > catagory cat,
> > > product prod
> > > where
> > > prod.product_id = cat.product_id
> > > and cat.catagory_id = 'CAT_X'
>
> > > I've tried things like:
> > > /*+ rule */
> > > /*+ index(catagory catagory_cid_pid */
> > > /*+ rule index(catagory catagory_cid_pid) */
> > > /*+ rule use_nl index(catagory catagory_cid_pid) */
> > > /*+ rule use_nl index(catagory catagory_cid_pid) index(product
> > > product_id_index) */
> > > /*+ use_nl */
>
> > Have you tried EXPLAIN PLAN using dbms_xplan so that we can see
> > what is actually happening? Or a 10053 trace and TKPROF? RULE hints
> > are likely to be a waste of time as opposed to using DBMS_STATS to
> > provide the optimizer with good numbers.
>
> > Post the metrics and we'll see if they provide insight.
> > --
> > Daniel A. Morgan
> > University of Washington
> > damor..._at_x.washington.edu
> > (replace x with u to respond)
> > Puget Sound Oracle Users Groupwww.psoug.org
>
> I'm working on getting the privs to run autotrace to get stats on the
> query. In the mean time hear is the plan for the query. ccp is the
> table that has about 10 million rows and an index on the property
> column and the sid column.
>
> I'm wondering if part of the problem is that the index on the property
> and sid column is not a bitmap index. When the
> optimizer sees that there are only a few values for property it
> decides to ignore the index. Also, if I execute something like
> "select * from ccp where property = 'PROP'" the explain plan indicates
> a full table scan rather than use any index.
>
> /* query and autotrace */
> select
> *
> from
> ccp,
> str
> where
> ccp.sid = str.sid
> and ccp.property = 'PROP_X'
>
> SELECT STATEMENT Optimizer=CHOOSE (Cost=100356 Card=953924
> Bytes=915767040)
> HASH JOIN (Cost=100356 Card=953924 Bytes=915767040)
> TABLE ACCESS (FULL) OF CCP (Cost=394 Card=953924 Bytes=21940252)
> TABLE ACCESS (FULL) OF STR (Cost=60153 Card=1293923
> Bytes=1212405851)

Woo-hoo, got it to work,

If I reverse the columns in the index I get the performance I'm looking for. In other words the behavior I am observing is:

If the column with the handful of distinct values is used first in the index the index is ignored in spite of several attemts to get it recognized.

If the column with the number of distinct values of about 1/10 the number of rows in the table is used first in the index then the index is used if I do something like the following:

select
  /*+ index(ccp ccp_sid_pid)*/
  *
from
  str,
  ccp
where
  str.sid = ccp.sid
  and property = 'FOO'

SELECT STATEMENT Optimizer=CHOOSE (Cost=3941922 Card=953924 Bytes=915767040)
NESTED LOOPS (Cost=3941922 Card=953924 Bytes=915767040) TABLE ACCESS (FULL) OF STR (Cost=60153 Card=1293923 Bytes=1212405851)

TABLE ACCESS (BY GLOBAL INDEX ROWID) OF CCP (Cost=3 Card=1 Bytes=23)

INDEX (UNIQUE SCAN) OF CCP_SID_PID (UNIQUE) (Cost=2 Card=1)

This query returns data in about 30 msec v several minutes!

I am supprized at the cost (3941922) given that I get first rows so quickly.

If I do a select count(*) from the above query and add RULE to the hint I get back the count (about 1million) in about 15 seconds (I'm guessing that this is a good approximation of how long it will take Oracle to access all of the rows?). It takes about 1 sec to get back the first 10k rows. Fortunately, most of the people looking at these data will only be interested in the first rows (generally < the first 10k rows. Received on Thu May 31 2007 - 15:12:22 CDT

Re: can't get join on two large tables to use_nl or indexes