Re: theory of rewriting a query

From: Stephane Faroult <sfaroult_at_roughsea.com>
Date: Tue, 12 May 2009 22:03:11 +0200
Message-ID: <4A09D5FF.9060507_at_roughsea.com>



Joel,

   What really kills in this type of quey is the fact that the subqueries are dependent on each other - for instance the second subquery depends on d.effdt that is determined by the first subquery. You have the same type of unsound relationship between subqueries 3 and 4. In fact, you want, so to speak, a "greater value of greater value " (in that case the greatest effseq for the greatest effdt), which usually calls for analytic functions as Ken Naim suggested.

If you take the subqueries in isolation:

 (SELECT max(d_ed.effdt)

               FROM ps_job d_ed
               WHERE d.emplid = d_ed.emplid
                 AND d.empl_rcd = d_ed.empl_rcd
                 AND d_ed.effdt <= a.pay_end_dt)

(SELECT max(d_es.effseq)
               FROM ps_job d_es
               WHERE d.emplid = d_es.emplid
                 AND d.empl_rcd = d_es.empl_rcd
                 AND d.effdt = d_es.effdt)

(SELECT max(e_ed.effdt)
               FROM ps_job e_ed
               WHERE e.emplid = e_ed.emplid
                 AND e.empl_rcd = e_ed.empl_rcd
                 AND e_ed.effdt <= a.pay_end_dt)

(SELECT max(e_es.effseq)
               FROM ps_job e_es
               WHERE e.emplid = e_es.emplid
                 AND e.empl_rcd = e_es.empl_rcd
                 AND e.effdt = e_es.effdt)

You see that they all refer to the same table, that emplid and empl_rcd are the major link to the outer world, but that we are correlated to different entities. You can easily condense two queries into one, for instance:

 (SELECT max(d_ed.effdt)

               FROM ps_job d_ed
               WHERE d.emplid = d_ed.emplid
                 AND d.empl_rcd = d_ed.empl_rcd
                 AND d_ed.effdt <= a.pay_end_dt)

(SELECT max(d_es.effseq)
               FROM ps_job d_es
               WHERE d.emplid = d_es.emplid
                 AND d.empl_rcd = d_es.empl_rcd
                 AND d.effdt = d_es.effdt)

can become

select effdt, effseq
from (select j.effdt, j.effseq, rank() over (order by j.effdt desc, j.effseq desc) rnk

         from ps_job j
         where j.emplid = d.emplid
             and j.empl_rcd = j.empl_rcd
             and j.effdt <= a.pay_end_dt)
where rnk = 1

(you can also get something similar with an order by inside the inline view and a condition on rownum at the higher level, but an analytic function will probably be better downstream - for instance, perhaps what you'll really want ultimately is

select emplid, empl_rcd, effdt, effseq
from (select j.emplid, j.empl_rcd,

                    j.effdt, j.effseq,
                    rank() over (partition by j.eplid, j.empl_rcd,
                                      order by j.effdt desc, j.effseq
desc) rnk
         from ps_job j
         where j.effdt <= a.pay_end_dt)

where rnk = 1

)

At this stage you get two subqueries instead of four. Then you should ask yourself whether you couldn't, at the outer level, refer to each table once instead of twice, use an OR, multiple CASE ... END constructs and a GROUP BY to bring everything in a single pass - at which point you can perhaps reduce your two subqueries to one, partition by emplid and empl_rcd and so on ...

A few hours of work if you want to do it properly, but I hope you get the idea.

HTH Stéphane Faroult

Joel.Patterson_at_crowley.com wrote:
> Just off the top of anyones head going more for the theoretical side,
> can anyone suggest a better way to write the below peoplesoft query. I
> believe this uses a lot of temp tablespace, and something tells me the
> subqueries are an issue.
>
> Maybe not, but would you in general unravel the subqueries and do it
> another way? Same type of querie I see a lot uses union all half a
> dozen times.
>
> Just wondering
>
>
>
> SELECT a.company, a.union_cd, d.location, a.emplid, a.name, to_char(
> c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd, sum(a.hours_crw)
> FROM ps_gl_transact_det a, ps_person c, ps_pers_srch_qry c1, ps_job
> d,
> ps_emplmt_srch_qry d1, ps_job e, ps_emplmt_srch_qry e1,
> ps_person_name f
> WHERE c.emplid = c1.emplid
> AND c1.rowsecclass = 'HRSALLU'
> AND d.emplid = d1.emplid
> AND d.empl_rcd = d1.empl_rcd
> AND d1.rowsecclass = 'HRSALLU'
> AND e.emplid = e1.emplid
> AND e.empl_rcd = e1.empl_rcd
> AND e1.rowsecclass = 'HRSALLU'
> AND a.emplid = c.emplid
> AND a.emplid = d.emplid
> AND d.empl_rcd = a.empl_rcd
> AND a.cmc_trans_type = 'C'
> AND a.debit_credit = 'DB'
> AND a.union_cd IN ('ADT', 'AD3', 'ADO')
> AND a.erncd IN ('REG', 'OVR')
> AND a.gl_acctng_period BETWEEN :1 AND :2
> AND d.effdt = (SELECT max(d_ed.effdt)
> FROM ps_job d_ed
> WHERE d.emplid = d_ed.emplid
> AND d.empl_rcd = d_ed.empl_rcd
> AND d_ed.effdt <= a.pay_end_dt)
> AND d.effseq = (SELECT max(d_es.effseq)
> FROM ps_job d_es
> WHERE d.emplid = d_es.emplid
> AND d.empl_rcd = d_es.empl_rcd
> AND d.effdt = d_es.effdt)
> AND d.reports_to = e.position_nbr
> AND e.effdt = (SELECT max(e_ed.effdt)
> FROM ps_job e_ed
> WHERE e.emplid = e_ed.emplid
> AND e.empl_rcd = e_ed.empl_rcd
> AND e_ed.effdt <= a.pay_end_dt)
> AND e.effseq = (SELECT max(e_es.effseq)
> FROM ps_job e_es
> WHERE e.emplid = e_es.emplid
> AND e.empl_rcd = e_es.empl_rcd
> AND e.effdt = e_es.effdt)
> AND e.emplid = f.emplid
> AND e.empl_status = 'A'
> AND f.name_type = 'PRI'
> GROUP BY a.company, a.union_cd, d.location, a.emplid, a.name,
> to_char(
> c.orig_hire_dt, 'YYYY-MM-DD'), f.name, a.erncd
> ORDER BY 2, 3, 7, 5
>
>
>
> --
> http://www.freelists.org/webpage/oracle-l
>
>
>
>

--
http://www.freelists.org/webpage/oracle-l
Received on Tue May 12 2009 - 15:03:11 CDT

Original text of this message