RE: batch process runs slower and slower over time

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 7 Jun 2012 15:54:46 -0400
Message-ID: <039201cd44e7$6451f7c0$2cf5e740$_at_rsiz.com>



I've read this thread with interest. Most of the suggestions have focused on what your PL/SQL could be doing that would degrade over time, or what "leak"-type bugs there might be and how you could diagnose them.

That is all good stuff.

I do note, however, that you actually asked "does anyone know how to tackle this sort of problem... :-?"

YES. The operational solution, to get this off your plate as a production issue while you proceed with scientific investigation to your heart's content, is to observe that you process about 1800 customers in the first hour. If the operational set-up to run the loop is relatively cheap, I'd have the process bail out and resubmit itself every 1000 customers until a submission finds no work to do.
If the operational set-up to run the loop is expensive enough to care about, make that chunk 1800 instead.
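A rough sketch of that bail-and-resubmit shape, assuming a pending_customers view and a process_customer procedure (both placeholder names for whatever your package actually exposes) and assuming it is safe to commit per customer:

    create or replace procedure run_customer_batch is
      l_done pls_integer := 0;
    begin
      for c in (select customer_id
                  from pending_customers
                 order by customer_id)
      loop
        process_customer(c.customer_id);  -- your existing per-customer logic
        commit;
        l_done := l_done + 1;
        exit when l_done >= 1000;         -- bail out at the chunk size
      end loop;

      if l_done > 0 then
        -- resubmit ourselves; the next run picks up whatever is still pending,
        -- and a run that finds no work simply does not resubmit
        dbms_scheduler.create_job(
          job_name   => 'CUSTOMER_BATCH_' || to_char(systimestamp, 'HH24MISSFF'),
          job_type   => 'STORED_PROCEDURE',
          job_action => 'RUN_CUSTOMER_BATCH',
          enabled    => true);
      end if;
    end;
    /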
If there are no update dependency conflicts, you might consider a pre-job query to establish rough customer range conditions (by id number or an alpha range) so that you could run 10 copies of this in parallel (not parallel degree, parallel jobs), each job doing roughly 10% of the work at the "fast" rate. That gets the entire job done in about 33 minutes. (Well, that's if there is no de-scaling effect; still, I'd expect the whole thing to finish in less than an hour that way.)
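For the pre-job range query, something along these lines would do (assuming a numeric customer_id on a customers table; adjust names to your schema). Each bucket's lo/hi become the bounds one of the ten jobs works on:

    select bucket,
           min(customer_id) as lo_id,
           max(customer_id) as hi_id,
           count(*)         as cust_count
      from (select customer_id,
                   ntile(10) over (order by customer_id) as bucket
              from customers)
     group by bucket
     order by bucket;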

This will NOT diagnose what the underlying failure in the technology stack is, but it changes the operation to something that will probably work.

Even if there are concurrency issues per customer so you cannot do the disjoint customer ranges in parallel, running smaller chunks head to tail should achieve your objective.

Regards,

mwf

-----Original Message-----

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Ls Cheng
Sent: Thursday, May 17, 2012 2:49 PM
To: Oracle Mailinglist
Subject: batch process runs slower and slower over time

Hi all
My developers have a pretty complex batch process (written entirely in PL/SQL packages and functions) running in a 10.2.0.5 RAC database (Solaris 10). This process runs in around 12 hours, and we have noticed that it gets slower and slower over time.

This process treats customer information for roughly 10000 customers. During the first hour it takes around 2 seconds per customer, after 1 hour it takes 3 seconds, after 2 hours 4 seconds, and so on. The funny thing is that if we abort the process and restart it, the elapsed time per customer drops back to 2 seconds, but after 1 hour it starts increasing again. I have added debug information: after each customer is treated we capture v$sesstat and v$session_event to see which statistics or wait events increase when the elapsed time goes up. The only statistic which increases over time is "CPU used by this session"; basically the process is burning CPU, and the rest of the statistics do not vary.
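The per-customer capture is roughly of this shape (batch_stat_log here is just an illustrative name for a simple logging table):

    -- snapshot all session statistics for the current session after each customer;
    -- consecutive rows per customer_id are then diffed to see what grew
    insert into batch_stat_log (customer_id, captured_at, stat_name, stat_value)
    select :customer_id, systimestamp, sn.name, st.value
      from v$mystat st
      join v$statname sn on sn.statistic# = st.statistic#;
    commit;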

It is very puzzling and I cannot find more information to debug, does anyone know how to tackle this sort of problem... :-?

TIA

P.S.: I ran dtrace (http://www.brendangregg.com/DTrace/procsystime) during the last test run, but the only call that showed substantial variation between a fast execution and a slow execution was pollsys.

--

http://www.freelists.org/webpage/oracle-l
