Re: Query taking too long

From: Lisa Lewis <lewis_at_med.pitt.edu>
Date: Mon, 8 Feb 1999 09:19:22 -0500
Message-ID: <79mrul$jpd$1@usenet01.srv.cis.pitt.edu>

An update:

So far I tried Jonathan Lewis's suggestion to rewrite the query using a non-correlated subquery:

SELECT ....
FROM

            Mars.patient p,
            Mars.patientnkin nr,
            Mars.Visit v,
            Mars.patientnkin ec,
            (
                         SELECT v2.acctid, v2.source,MAX(v2.updatedate)
                          FROM visit v2
                        GROUP BY v2.acctid, v2.source
                        ) v3

    WHERE p.acctid = nr.acctid (+)
    AND p.source = nr.source (+)
    AND nr.contactrole (+) = 'NR'
    AND p.acctid = ec.acctid (+)
    AND p.source = ec.source (+)
    AND ec.contactrole (+) = 'EC'

    AND   p.acctid = v.acctid
    AND   p.source = v.source
   AND   v.acctid = v3.acctid

AND v.source = v3.source
AND v.updatedate = v3.updatedate
;

This improved performance tremendously...down to 30 minutes.

I also wanted to mention that some of you suggested that I have a flag to indicate that latest visit in the visit table and use that flag as part of the query to extract only where that flag is set and to keep the flag updated possibly via triggers. This could be part of my solution but I chose to go an alternative route for various reasons ( I did not explain all the issues in my original post only enough to get my problem stated). There are actually 2 parts to what I am trying to do, which are (1) perform an initial extraction and (2) extract only demographic information on new patients or patients whose demographic information has changed since the last incremental extraction (performed nightly). What I did was to perform the initial extraction ( with the helpful suggestion of Jonathan and others ). My solution for the incremental extractions was to keep a dynamic table that just holds the PK of those patients who are new or whose demographics have changed. I then link in that table with the rest of my query. There should be only 1,000 rows or so in this table at any one time. At the end of each day the incremental extraction query is performed and afterward the dynamic table that keeps track of updates is truncated. I could have done the same sort of thing with a flag in the patient table but I chose not to because I thought perhaps an update that resets all the flags after the incremental extraction would be a more costly operation than truncating the dynamic table ( I have run into problems before with updating very large tables). Also, since updates can continue to occur when the incremental extraction is taking place, I added a field to the temporary table, called PROCESSING, which indicates that these rows are the ones that the extraction is being executed for.

Thanks again for everybody's help. I will (time permitting) try some of the other suggestions as well. This was a great learning experience for me and I appreciate all the help!

Lisa Lewis wrote in message <79epvu$qnb$1_at_usenet01.srv.cis.pitt.edu>...
>Hi all,
>
>I am hoping someone can give me suggestions on how to get this query to run
>faster. This query is part of a pro*c program.
>
> EXEC SQL DECLARE c_adtdemo CURSOR FOR
> SELECT p.mrn, p.source, p.lname, p.fname, p.mname, p.dob, p.sex,
p.race,
> p.addr1, p.addr2, p.city, p.state, p.zip, p.country, p.hphone,
> p.bphone, p.acctid, p.ssn, v.domain,
> nr.lname, nr.fname, nr.addr1, nr.addr2,
> nr.city, nr.state, nr.zip, nr.country, nr.hphone,
> ec.lname, ec.fname, ec.addr1, ec.addr2,
> ec.city, ec.state, ec.zip, ec.country, ec.hphone
> FROM Mars.patient p, Mars.patientnkin nr, Mars.Visit v,
Mars.patientnkin
>ec
> WHERE p.acctid = nr.acctid (+)
> AND p.source = nr.source (+)
> AND nr.contactrole (+) = 'NR'
> AND p.acctid = ec.acctid (+)
> AND p.source = ec.source (+)
> AND ec.contactrole (+) = 'EC'
> AND p.acctid = v.acctid
> AND p.source = v.source
> AND v.updatedate = ( SELECT MAX(v2.updatedate)
> FROM visit v2
> WHERE v2.acctid = v.acctid
> AND v2.source = v.source );
>
>Note: the purpose of this query is to extract patient demographics along
>with data about the patients latest visit and some next of kin information.
>A little background: The patient table has about 500,000 rows. The visit
>table has about 2,000,000 rows and the patientnkin has about 450,000 rows.
>The pk for all of the tables has acctid and source as the first two
elements
>of the pk. There is no indexes defined on ec.contactrole. Note: This is
a
>decision support system with no users currently so there is no activity
>going on that should slow this down.
>
>I am obviously most concerned about the correlated subquery. What is
>strange is that the other day I ran this query against the tables described
>above and it took about 5 hours and produced about 460,000 rows. The next
>day I ran this query, based on the progress it was making, estimations
would
>put it at about 7 days to complete. The only thing I did between the two
>runs is to re-analyze some tables so that the statistics could be most up
to
>date. ( I thought that this would help but it seemed to have the opposite
>effect). I am now in the process of deleting some of my statistics to see
>what the impact of that would be. I know that I can create a temporary
>table to instead of using the correlated subquery, I will try that as a
>last resort. The strange thing is that this query did once run to
>completion in 5 hours.
>
>Please respond via e-mail.
>I would appreciate all help!!!
>
>Thanks
>Lisa
>
>
Received on Mon Feb 08 1999 - 08:19:22 CST

Re: Query taking too long - please help