Re: why does this statement take so long to return?

From: Dave Wotton <Dave.Wotton_at_dwotton.nospam.clara.co.uk>
Date: 2000/06/27
Message-ID: <G1965.3353$Kb6.418854@nnrp3.clara.net>#1/1

gdas wrote in message <013b80b8.1e5f63d6_at_usw-ex0102-013.remarq.com>...
>Here's a better plan output
>
>Query Plan
>------------------------------------------------------------
>SELECT STATEMENT Cost = 3237
> SORT GROUP BY
> FILTER
> NESTED LOOPS
> NESTED LOOPS
> NESTED LOOPS
> TABLE ACCESS FULL ORG
> TABLE ACCESS BY INDEX ROWID SUBJECT
> INDEX RANGE SCAN SUBJECT_ORG_ID_IDX
> TABLE ACCESS FULL HIT
> TABLE ACCESS BY INDEX ROWID DATASTAGE
> INDEX RANGE SCAN DATASTAGE_UQ_ID_IDX
> COUNT STOPKEY
> VIEW
> SORT ORDER BY STOPKEY
> SORT GROUP BY
> HASH JOIN
> TABLE ACCESS FULL ORG
> HASH JOIN
> TABLE ACCESS FULL SUBJECT
> HASH JOIN
> TABLE ACCESS FULL DATASTAGE
> TABLE ACCESS FULL HIT
This query plan is best understood by the following pseudo-code:

for each row in ORG /* Full table scan {

for each row from SUBJECT

where SUBJECT.ORG_ID = current ORG.ORG_ID {

      for each row in HIT   /* Full table scan
      {
          reject if HIT.SUBJECT_ID != current SUBJECT.SUBJECT_ID

          for each row in DATASTAGE
              where DATASTAGE.UNIQUE_ID = current HIT.UNIQUE_ID
          {
              /* Since the key is unique, this loop only returns
              /* one row per invocation.

              check if the ORG.ORG_ID matches the one returned
              from the subquery. (The subquery is only executed
              once, not each time the check is made)

              if match then
              {
                 build output row from all currently available
                 columns
              }
           }
       }

}
}

The killer is the Full table scan on HIT in the outer query, which will be executed once for each row in SUBJECT. Why is it doing a FTS and not doing an indexed retrieval using the condition hit.subject_id = subject.subject_id?

From your earlier postings I can see that there is an index on HIT.SUBJECT_ID. Are the SUBJECT_ID columns on these two tables of the same datatype? Or is one a number and the other a varchar2, causing an implicit data conversion which prevents the use of the index? Is the status of the index 'VALID'?

Solving this will solve the biggest performance issue, but it's still not optimal. Effectively *every* row from the outer query will be constructed and tested against the subquery. Ideally you want this the other way round: ie. the subquery is executed first, and the single ORG_ID returned by it is used to drive the outer query.

Contrary to my previous suggestion (which was made without seeing the query plan), I'd now try the /*+ PUSH_SUBQ */ hint to try to force the subquery to execute first. Try also concatenating nulls or adding zeros to the where conditions (intelligently) to help force the query to optimise in the way you want.

Try this:

select /*+ PUSH_SUBQ */

       al4.org_name,
       al1.type,
       count (al2.hit_id)
  from admin.datastage al1,
       admin.hit al2,
       admin.subject al3,
       admin.org al4
 where al1.unique_id = al2.unique_id || null
   and al2.subject_id = al3.subject_id || null

and al3.org_id = al4.org_id || null
and al4.org_id =

       (select org_id from
          (
            select aa.org_id,
                   count(bb.hit_id) b
              from hit bb,
                   org aa,
                   datastage cc,
                   subject dd
             where (bb.unique_id = cc.unique_id
               and  bb.subject_id = dd.subject_id
               and  dd.org_id = aa.org_id)
               and (aa.type = 'Regional'
               and  cc.type in ('A','B','C','D')
               and  cc.account_id = 1)
             group by aa.org_id
             order by  2 desc
          )
       where rownum=1
      )

group by al1.type, al4.org_name

I've assumed that al4.org_id, al3.subject_id and al2.unique_id are all varchar2s. If they're numbers, replace the '|| null' by '+ 0'.

The idea is that the subquery should execute first (because of the PUSH_SUBQ hint), then the order of the joins to the other tables is forced by the '|| null's: the query cannot join to al3 before al4.org_id || null has been evaluated, similarly the join to al2 cannot be made until al3.subject_id || null has been evaluated etc.

So the join order should be:
subquery -> ORG (al4) -> SUBJECT (al3) -> HIT (al2) -> DATASTAGE (al1).

It may also be possible to optimise the subquery as well, but we'll tackle this if it's needed after the first tuning problem has been solved.

Dave.

--
If you reply to this posting by email, remove the "nospam" from my email
address first.

Received on Tue Jun 27 2000 - 00:00:00 CDT