Re: why does this statement take so long to return?

From: Dave Wotton <Dave.Wotton_at_dwotton.nospam.clara.co.uk>
Date: 2000/06/24
Message-ID: <AQ455.1196$Kb6.162093@nnrp3.clara.net>

[responses embedded ]

yong321_at_yahoo.com wrote in message <8j0ls0$m8d$1_at_nnrp1.deja.com>...
>
> "Dave Wotton" <Dave.Wotton_at_dwotton.nospam.clara.co.uk> wrote:
>>
>> Alternatively, is it possible to run your query as two separate
>> queries, first run the inner query standalone and put the found
>> org_id in a variable then pass this variable into the second query?
>>
>> It amazes me how often people create really complicated queries
>> which could be broken down into a number of smaller queries called
>> in sequence by the calling application. Of course, this depends
>> on what you are calling the query from. There are occasions where
>> a single (complicated) query is the only solution.
>
>One trivial caveat though. If you break the query into two,
>there's a finite probability that the second query is not reading the
>same picture as when you did it in one, atomic query.

Good point, but often the data isn't changing in the database fast enough for this to have a significant effect. I suspect that's true here: the organisation's name and type are unlikely to change in the second or two between identifying the organisation with the greatest number of hits, and looking up its details. The hit count may have changed, but only the query designer can tell us if that's significant to him.

>Another possible rewrite is, instead of having that bulk subquery,
>combine the top-level select with the inner query into one level
>select. You know what I mean. Instead of:
>
>select ename from emp where deptno = (select depno from dept where
>dname = 'accounting')
>
>rewrite to:
>
>select ename from emp a, dept b where a.deptno = b.deptno and b.dname
>= 'accounting';
>
>so that the optimizer may do a better job.

Actually it turns out to be quite a difficult query to optimise as a single query (as the original poster asserted). Those group-by conditions make merging the inner select with the outer one impossible (IMO). My suggestion of turning the query inside-out also turned out to be difficult.

However, I *think* I've done it using the USE_NL hint. As I don't have access to the tables, I've tried it with a simplified query using the SCOTT.EMP table:

select /*+USE_NL */

e1.empno, e1.ename
from ( select deptno

         from ( select deptno, sum(sal) from emp
                 group by deptno
                 order by 2 desc)
        where rownum = 1
     ) e2,
     emp e1

where e1.deptno = e2.deptno

This returns all the employees in the department which has the highest salary expenditure. So is similar in concept to the original problem). It's explain plan is:

Query Plan

SELECT STATEMENT Cost =
NESTED LOOPS
VIEW

      COUNT STOPKEY
        VIEW
          SORT ORDER BY STOPKEY
            SORT GROUP BY
              TABLE ACCESS FULL EMP
    TABLE ACCESS BY INDEX ROWID EMP
      INDEX RANGE SCAN EMP$DEPTNO

ie. The inner query is the driving "table" in a nested-loops execution plan: It is executed (once) and then the EMP table is accessed, using the EMP$DEPTNO index, for each deptno returned by the inner query (only one).

Applying this to the original poster's query, I suggest this optimisation:

select /*+ USE_NL */ distinct

       al4.org_name,
       al1.type,
       count (al2.hit_id)
  from ( select org_id
           from ( select aa.org_id,
                         count(bb.hit_id) b
                    from hit bb,
                         org aa,
                         datastage cc,
                         subject dd
                   where (bb.unique_id = cc.unique_id
                     and  bb.subject_id=dd.subject_id
                     and  dd.org_id=aa.org_id)
                     and (aa.type='Regional'
                     and  cc.type in ('A','B','C','D')
                     and  cc.account_id=1)
                   group by aa.org_id
                   order by  2 desc
                 )
           where rownum=1
       ) xx,
       admin.org al4,
       admin.subject al3,
       admin.hit al2,
       admin.datastage al1

where al4.org_id = xx.org_id

   and al3.org_id = al4.org_id
   and al2.subject_id = al3.subject_id
   and al1.unique_id = al2.unique_id

group by al1.type, al4.org_name

The important bit is the USE_NL hint and reordering the tables in the outer query. There might also be some benefit in re-ordering the tables in the inner-query. Without access to the data, the explain plans and the index definitions it's very difficult to tune SQL. Hopefully this will work, but I can't guarantee it. Returning to my original assertion, it's *much* easier to tune this query if it's split into two smaller queries.

PS. The 'DISTINCT' in the outer query is redundant, each row must be distinct because of the group-by clause. In fact, the whole query looks over-complicated. If the tables in the inner query are the same as the ones in the outer query, (it's not clear, the outer ones are prefixed with admin., the inner ones aren't) I suspect the whole query could be simplified.

Dave.

--
If you reply to this posting by email, remove the "nospam" from my email
address first.

Received on Sat Jun 24 2000 - 00:00:00 CDT