Re: ridiculously time-consuming subqueries

From: David Fitzjarrell <oratune_at_yahoo.com>
Date: Fri, 14 Mar 2014 12:14:59 -0700 (PDT)
Message-ID: <1394824499.51987.YahooMailNeo_at_web124702.mail.ne1.yahoo.com>


A 'regular' join only returns data where the criteria is matched among all of the involved tables; an outer join of some sort is required to include unmatched records from the other joined tables:

SQL> select distinct deptno from emp;

    DEPTNO
----------
        30
        20
        10

SQL> select deptno from dept;

    DEPTNO
----------
        10
        20
        30
        40

SQL>

Notice there is a deptno value not used in the EMP table.  Using an inner join we won't see that unused deptno value:

SQL> select d.deptno, e.empno, e.ename, e.job
  2  from emp e join dept d on d.deptno = e.deptno;

    DEPTNO      EMPNO ENAME      JOB
---------- ---------- ---------- ---------
        20       7369 SMITH      CLERK
        30       7499 ALLEN      SALESMAN
        30       7521 WARD       SALESMAN
        20       7566 JONES      MANAGER
        30       7654 MARTIN     SALESMAN
        30       7698 BLAKE      MANAGER
        10       7782 CLARK      MANAGER
        20       7788 SCOTT      ANALYST
        10       7839 KING       PRESIDENT
        30       7844 TURNER     SALESMAN
        20       7876 ADAMS      CLERK

    DEPTNO      EMPNO ENAME      JOB
---------- ---------- ---------- ---------
        30       7900 JAMES      CLERK
        20       7902 FORD       ANALYST
        10       7934 MILLER     CLERK
        10       7939 DUKE       CEO
        10       7949 PRINCE     CFO
        10       7959 QUEEN      CIO
        10       7869 JACK       PRESIDENT

18 rows selected.

Re-writing the join to a right outer join provides the 'missing' deptno along with NULLs for all other values requested:

SQL> select d.deptno, e.empno, e.ename, e.job
  2  from emp e right join dept d on d.deptno = e.deptno;

    DEPTNO      EMPNO ENAME      JOB
---------- ---------- ---------- ---------
        20       7369 SMITH      CLERK
        30       7499 ALLEN      SALESMAN
        30       7521 WARD       SALESMAN
        20       7566 JONES      MANAGER
        30       7654 MARTIN     SALESMAN
        30       7698 BLAKE      MANAGER
        10       7782 CLARK      MANAGER
        20       7788 SCOTT      ANALYST
        10       7839 KING       PRESIDENT
        30       7844 TURNER     SALESMAN
        20       7876 ADAMS      CLERK

    DEPTNO      EMPNO ENAME      JOB
---------- ---------- ---------- ---------
        30       7900 JAMES      CLERK
        20       7902 FORD       ANALYST
        10       7934 MILLER     CLERK
        10       7939 DUKE       CEO
        10       7949 PRINCE     CFO
        10       7959 QUEEN      CIO
        10       7869 JACK       PRESIDENT
        40

19 rows selected.

SQL>


 
David Fitzjarrell
Primary author, "Oracle Exadata Survival Guide"




On Friday, March 14, 2014 12:04 PM, William Threlfall <William.Threlfall_at_albertahealthservices.ca> wrote:
 
Another update:
 
I pulled the subqueries out and did them as a separate query, as
follows:
 
SELECT PT.PAT_ID, SV.RECIPIENT_CMV_IGG, SV.RECIPIENT_EBV
FROM   PAT_TRANSPLANT PT,
OTTRSDF.SDF_SEROLOGY_VERIFICATION_TO SV,
       DOCUMENTS DOCS,
TX_DOCUMENTS TXD, LAST_TRANSPLANT_VIEW LTV
WHERE  PT.PAT_ID       
= DOCS.PAT_ID
AND   
PT.PAT_ID        = LTV.PAT_ID
AND    PT.TRANSPLANT_ID = TXD.TRANSPLANT_ID
AND    PT.DATE_OF_TRANS = LTV.LAST_TRANS_DATE
AND    DOCS.DOC_ID     
= TXD.DOC_ID
AND    DOCS.DOC_ID     
= SV.DOC_ID
AND    DOCS.DOC_KIND_ID = 10108
AND   PT.PAT_ID IN (<list of patient ID’s>)
 
This executed fine in less than 5 minutes.
But then I had to combine these results with the other results,
in Excel.
That’s when I discovered that the above query, for some reason
that I’m not understanding right now, doesn’t include pat_id’s
in the results set where the CMV and EBV are (null) (null).
Even if those patients don’t have any 10108 document or
any SV record, it should still return <patient_id> (null) (null), right?
 
I swear Oracle is trying to drive me insane.  LOL
 
Cheers, - Bill.
 
 
From:TJ Kiernan [mailto:tkiernan_at_pti-nps.com] 
Sent: Friday, March 14, 2014 10:42 AM
To: William Threlfall; Kim Berg Hansen
Cc: Oracle-L_at_freelists.org; TJ Kiernan
Subject: RE: ridiculously time-consuming subqueries
 
If you’re expecting a
large number of records back from a query, scalar subqueries are not your
friend, especially version 10, as there’ no scalar subquery caching, if I
recall correctly (this means that the subquery is executed once per row, so
more rows = more time spent).  You’ll probably be well served to
rewrite the scalar subqueries as inline views or factored subqueries.  
 
Your other potential problem
is what I’m guessing is a view (LTV = LAST_TRANSPLANT_VIEW). 
A view that is (I’m further guessing) running some pseudo-analytical
subqueries is probably not going to get merged into your main query.
 I’d be concerned that the scalar subqueries are being executed for
records that will eventually be filtered out.  An examination of the
execution plan would probably tell  you more about this.
 
As far as learning resources
go, Tom Kyte’s book is great if you’re learning about designing
applications, but for running reports, I’d suggest Pro Oracle SQL by
Karen Morton et.al.  It assumes that you know the basics of writing SQL
(which appears to be the case for you), and digs in to many of the more
advanced features & extensions that Oracle offers with lots of examples.  
 
HTH,
T. J.
 
 
From:oracle-l-bounce_at_freelists.org
[mailto:oracle-l-bounce_at_freelists.org] On Behalf Of William Threlfall
Sent: Friday, March 14, 2014 10:10 AM
To: Kim Berg Hansen
Cc: Oracle-L_at_freelists.org
Subject: RE: ridiculously time-consuming subqueries
 
Kim, 
  
Yes, of course, that was it exactly.
*sigh* 
Now I’m running it again, but
now it is taking forever to run again.  *sigh* 
  
Do you have any suggestions as to
general methodology for how I can get query results in a better way?  Any
recommendations for books or online resources to better understand how to do
the type of querying that I need to do on a regular basis? 
  
The requests I get almost always
require one row of one-value-per-patient results, which is why I am doing them
as several scalar subqueries.  
However, the results I want are
almost always buried behind several layers of one-to-many table relationships
(different ones for each subquery, of course), which is why the scalar subquery
method is extremely inefficient, since many complex joins are involved for each
patient over and over again. 
I am constantly struggling to avoid
that “single-row subquery returns multiple rows” error. 
  
I got the book ‘Expert
Oracle’ by Tom Kyte, as suggested by someone else on this list, but only
the chapter on Analytic Functions is of any use to me, and even that chapter is
not very in-depth.  It doesn’t even talk about the
“KEEP” and “FIRST” keywords, for example, and
doesn’t really explain very many of the analytic functions very well. 
  
The official Oracle SQL Reference
Manual is not helpful at all in understanding how to use various features,
never mind how BEST to use them. 
  
I am a smart guy, but Oracle is
kicking my a$$ so far.  I need to find a better way to get the results I
want. 
  
Cheers, - Bill. 
  
  
  
From:Kim Berg Hansen [mailto:kibeha_at_gmail.com] 
Sent: Friday, March 14, 2014 1:50 AM
To: William Threlfall
Cc: Oracle-L_at_freelists.org
Subject: Re: ridiculously time-consuming subqueries 
  
Bill, 
  
Your modified subquery I notice
you have added PATIENT_DEMO PD to the FROM list. 
In your original post it is not
present in the subquery FROM list, so it looks like PD is supposed to be from
the main query, right? 
  
So your new modified subquery does
not actually contain any reference to tables from the main query anymore, as
far as I can tell. 
It is not correlated anymore - it
evaluates the max expression on all patients that have doc_kind_id =
10108 and returns that same max expression for all rows in the main
query. 
I think you get POSITIVE on all
rows... 
I think you have been a little too
quick in a cut-and-paste? ;-) 
  
  
On a side note - this is nonsense: 
  
ORDER BY PT.DATE_OF_TRANS     DESC NULLS
LAST, 
                  
SV.RECIPIENT_CMV_IGG DESC NULLS LAST, 
                  ROWNUM ASC 
  
ROWNUM is not numbering rows in a
table - it is numbering the output rows in whatever order they happen to
be retrieved. 
In this case the output rows will be just one row - the MAX expression. 
That ROWNUM is always 1 in this
case and makes no sense to put in that ORDER BY expression. 
  
  
  
Regards 
  
  
Kim Berg Hansen 
  
http://dspsd.blogspot.com 
kibeha_at_gmail.com 
_at_kibeha 
  
  
  
On Thu, Mar 13, 2014 at 11:16 PM,
William Threlfall <William.Threlfall_at_albertahealthservices.ca>
wrote: 
Update for this issue: 
To answer the question, it is Oracle 10g EE (10.2.0.3.0) 
The original explain plans were extremely complex and
time-consuming.  
  
I modified the first subquery to use an analytic function, as
follows: 
  
  (SELECT MAX(SV.RECIPIENT_CMV_IGG) 
          KEEP
(DENSE_RANK FIRST 
          ORDER BY
PT.DATE_OF_TRANS     DESC NULLS LAST, 
                  
SV.RECIPIENT_CMV_IGG DESC NULLS LAST, 
                  
ROWNUM ASC) 
    FROM PATIENT_DEMO PD, PAT_TRANSPLANT PT,
DOCUMENTS DOCS, 
         TX_DOCUMENTS
TXD, OTTRSDF.SDF_SEROLOGY_VERIFICATION_TO SV 
    WHERE PD.PAT_ID = PT.PAT_ID 
    AND   DOCS.DOC_ID = TXD.DOC_ID 
    AND   DOCS.DOC_ID = SV.DOC_ID 
    AND   PT.TRANSPLANT_ID =
TXD.TRANSPLANT_ID 
    AND   DOCS.DOC_KIND_ID = 10108 
   ) AS CMV, 
  
This now runs in a reasonable amount of time. 
However, it is giving me incorrect results. 
For the last patient returned, it is returning
“POSITIVE”, but for that patient there is only one transplant (so
no multiples there), one document 10108 (so no multiples there), and one SV
record (so no multiples there), and the only CMV result in the database for
that patient in the SV table is “NEGATIVE”. 
I’m having trouble even imagining where the result
“POSITIVE” is coming from. 
  
Cheers, - Bill. 
 

________________________________
 
This message and any attached
documents are only for the use of the intended recipient(s), are confidential
and may contain privileged information. Any unauthorized review, use, retransmission,
or other disclosure is strictly prohibited. If you have received this message
in error, please notify the sender immediately, and then delete the original
message. Thank you.
--
http://www.freelists.org/webpage/oracle-l
Received on Fri Mar 14 2014 - 20:14:59 CET

Original text of this message