Re: Query taking too long

From: Dave Wotton <Dave.Wotton_at_dwotton.nospam.clara.co.uk>
Date: Sat, 06 Feb 1999 12:24:50 GMT
Message-ID: <mwWu2.187$cQ5.205@nnrp2.clara.net>

Lisa Lewis wrote in message <79fmjk$8s$1_at_usenet01.srv.cis.pitt.edu>... [juggled around and edited a bit :-) ]
>
> SELECT ....
> FROM Mars.patient p, Mars.patientnkin nr, Mars.Visit v,
> Mars.patientnkin ec
> WHERE p.acctid = nr.acctid (+)
> AND p.source = nr.source (+)
> AND nr.contactrole (+) = 'NR'
> AND p.acctid = ec.acctid (+)
> AND p.source = ec.source (+)
> AND ec.contactrole (+) = 'EC'
> AND p.acctid = v.acctid
> AND p.source = v.source
> AND v.updatedate = ( SELECT MAX(v2.updatedate)
> FROM visit v2
> WHERE v2.acctid = v.acctid
> AND v2.source = v.source );
>
>explain plan:
>
> FILTER
> NESTED LOOPS OUTER 7727 1
> NESTED LOOPS OUTER 7725 1
> NESTED LOOPS 7721 2
> TABLE ACCESS FULL VISIT 7715 3
> TABLE ACCESS BY INDEX ROWID PATIENT 2 455567
> INDEX UNIQUE SCAN PATIENT_PK_ACCTID 1 455567
> TABLE ACCESS BY INDEX ROWID PATIENTNKIN 2 105151
> INDEX UNIQUE SCAN PATIENTNKIN_PK 1 105151
> TABLE ACCESS BY INDEX ROWID PATIENTNKIN 2 105151
> INDEX UNIQUE SCAN PATIENTNKIN_PK 1 105151
> SORT AGGREGATE 1
> TABLE ACCESS BY INDEX ROWID VISIT 6 3
> INDEX RANGE SCAN VISIT_PK 3 3
>
> A little background: The patient table has about 500,000 rows.
> The visit table has about 2,000,000 rows and the patientnkin
> has about 450,000 rows.

I'm going to develop some of Jonathan Lewis's suggestions. I've never used the PUSH-SUBQ hint ( is it O8? ) but agree that it would help to get this subquery evaluated at the earliest opportunity ( if possible ). At the moment, the optimiser is creating a logical table from the join of VISIT, PATIENT, and PATIENTNKIN ( twice ) and filtering the results of this join through the subquery. As there are 2 million rows in the VISIT table, this means looking up 8 million rows to create the join ( each VISIT row joins to at least one PATIENT and 2 PATIENTNKIN rows ) and filtering the resulting join of at least 2 million rows through the subquery.

First thought: adjust your query to drive from the PATIENT table, not the VISIT table. At the moment, for each row of VISIT, the optimiser reads a row of PATIENT. This means that the PATIENT rows are being re-read multiple times. The number of row lookups from this join is 4 million ( 2 million VISITS + 2 million PATIENT lookups ). If you drive from PATIENT, each PATIENT row will only be read once, so the number of row lookups from the modified join will be 2,500,000 ( 500,000 PATIENT reads + 2 million VISIT lookups ). Ok, it probably won't make any difference to the physical I/O, but it will make a difference to the logical I/O. Logical I/O is not imaginary I/O, it still has a CPU and elapsed time cost, and with the figures you're dealing with it will be noticeable. Saving = 1,500,000 logical I/Os.

This is the point where you want to try to get the subquery to execute, to filter out most of those joined rows before looking up next-of-kin for the remainder. At the moment the query looks up both next-of-kin rows for each of the 2 million VISIT rows, before it discards most of them using the subquery. It probably discards 3,500,000 rows: there can only be one "most recent" visit for each PATIENT, so only 500,000 rows make it through the subquery. Potential saving: 7 million lookups. ( Each visit attempts to join to one of each type of next of kin )

Putting this together, try writing the query as follows:

SELECT /*+ PUSH-SUBQ */....
FROM Mars.patientnkin nr, Mars.patientnkin ec,

Mars.Visit v, Mars.patient p

    WHERE p.acctid + 0 = v.acctid
    AND   p.source + 0 = v.source
    AND   p.acctid + 0 = nr.acctid (+)
    AND   p.source + 0 = nr.source (+)

    AND nr.contactrole (+) = 'NR'
    AND p.acctid + 0 = ec.acctid (+)
    AND p.source + 0 = ec.source (+)
    AND ec.contactrole (+) = 'EC'
    AND v.updatedate = ( SELECT MAX(v2.updatedate)

                           FROM visit v2
                           WHERE v2.acctid = v.acctid
                           AND v2.source = v.source );

( Notes: not sure of the syntax of the PUSH-SUBQ hint, check the manual for this. I've re-arranged the order of the tables in the FROM clause and added + 0 to some of the WHERE conditions to try to force the optimiser to drive from the PATIENT table. This can also be achieved by hints, but I'm not sure how good the optimiser is at combining multiple hints )

Taking this approach further ( but somewhat contradicting the advice to execute the subquery as early as possible ), is to make sure that, once you're driving from the PATIENT table, to join to the next of kin tables BEFORE joining to the VISIT table. This ensures that next-of-kin rows are only looked up for each PATIENT, not for each VISIT. The following re-arrangement of your query should do it....

SELECT /*+ PUSH-SUBQ */....
FROM Mars.Visit v, Mars.patientnkin nr, Mars.patientnkin ec,

         Mars.patient p
    WHERE p.acctid + 0 = nr.acctid (+)
    AND p.source + 0 = nr.source (+)
    AND nr.contactrole (+) = 'NR'
    AND p.acctid + 0 = ec.acctid (+)
    AND p.source + 0 = ec.source (+)
    AND ec.contactrole (+) = 'EC'

    AND   p.acctid + 0 = v.acctid
    AND   p.source + 0 = v.source
    AND   v.updatedate = ( SELECT MAX(v2.updatedate)
                           FROM visit v2
                           WHERE v2.acctid = v.acctid
                           AND v2.source = v.source );

Check the access path using explain plan. Replace the PUSH-SUBQ hint with a RULE hint if it still doesn't optimise the way you intend.

What can you do if PUSH-SUBQ doesn't work, or your version of Oracle doesn't support it?

You could try this ...

CREATE VIEW MOST_RECENT_VISIT AS

       SELECT ....
         FROM VISIT V
        WHERE updatedate = ( SELECT MAX(v2.updatedate)
                               FROM visit v2
                              WHERE v2.acctid = v.acctid
                                AND v2.source = v.source );

    CREATE VIEW PATIENT_DETAILS AS
       SELECT .....
         FROM Mars.patientnkin nr, Mars.patientnkin ec,
              Mars.patient p
        WHERE p.acctid + 0 = nr.acctid (+)
          AND p.source + 0 = nr.source (+)
          AND nr.contactrole (+) = 'NR'
          AND p.acctid + 0 = ec.acctid (+)
          AND p.source + 0 = ec.source (+)
          AND ec.contactrole (+) = 'EC';

and then rewrite your query as ...

SELECT /*+ USE_MERGE( V, P ) */ ......

      FROM MOST_RECENT_VISIT V,
           PATIENT_DETAILS P
     WHERE p.acctid = v.acctid
       AND p.source = v.source;

again, I'm not certain of the hint syntax ( I'm writing this whole reply from home without access to a database or documentation! ). The hint might not even be necessary - this query might use MERGE JOIN anyway. The result of this should be that the subquery will be evaluated separately to the patient and next of kin details, and the two result sets merged. As a result, the next-of-kin details will only be looked up for each PATIENT, not for each VISIT, so should yield a big improvement.

You could also investigate the PARALLEL hints with this query structure to get the optimiser to evaluate both halves of the query in parallel, so producing even more improvement.

Other thoughts: Are you able to modify your application? Think how easy ( and quick! ) your query would be if you had a MOST_RECENT_FLAG column on your VISIT table which had the values Y|N. All it would require is a small change to the application ( better, an ON-INSERT trigger ) which set this flag to Y for the newly inserted row and simultaneously reset the flag to 'N' for the existing VISIT row for this patient. If the column were then indexed ( combine it with the (ACCTID,SOURCE) index ), you could quickly find the most recent visit for each patient. I suspect this requirement is quite common in your application and so could potentially yield performance improvements elsewhere too.

Something else you could do is persuade your DBA to occasionally physically sort your tables into (ACCTID,SOURCE) order. This would result in much reduced physical I/O and hence greater performance. The reason for this is that (eg.) when your query retrieves all the VISITs for a given PATIENT, they'll all be in the same block, or adjacent blocks. At the moment, the VISITs for a given PATIENT are scattered all over the VISIT table, so will require multiple physical block reads. ( OK, your DB_BLOCK_BUFFER cache will reduce this, but the quantities of data you're dealing with means you must either have a very large buffer cache, or consider these options. )

By applying some ( or maybe you'll have to apply all ) of these ideas, you should be able to reduce your query time to 1-2 hours or even less. Be ambitious! - Let me know how you get on.

Dave.
--
Remove "nospam" from my address to reply by email Received on Sat Feb 06 1999 - 06:24:50 CST

Re: Query taking too long - please help