Re: How to read parent session data - without forcing a commit in the parent

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Wed, 22 Oct 2008 10:59:28 -0700 (PDT)
Message-ID: <e36763f0-bebf-48aa-afd8-f0920a50f88c@l76g2000hse.googlegroups.com>


On Oct 22, 1:26 pm, nis..._at_gmail.com wrote:
> On 21 Oct, 19:04, joel garry <joel-ga..._at_home.com> wrote:
>
>
>
> > On Oct 21, 6:26 am, nis..._at_gmail.com wrote:
>
> > > Hi Folks,
>
> > > I have an interesting problem. Basically, I have an oracle job that I
> > > need to speed up as we are exceeding our allocated time window. The
> > > oracle job consists of a large number of small changes across multiple
> > > 'source' tables and then one of my functions is called to try and
> > > process this source data in the fastest possible time - in order to
> > > create a single huge table from the uncommitted data. If there is an
> > > error in my processing, I simply raise an exception that causes a
> > > rollback of the source data. So far so good.
>
> > > Now here's the problem. The 'huge' table that I am building consists
> > > of 2 distinct sets of SELECT queries on the uncomitted source data
> > > tables. There is no problem in doing this serially - other than it
> > > takes us 4 hours which is too long. To speed things up, I wanted to
> > > run the 2 distinct SELECT queries in parallel - and therein lies the
> > > problem. I cannot get parallel SELECT functions to run without them
> > > forcing the source data to become committed. Is there any way to get
> > > around this?? Below is a cut down version of the top level function
> > > that I used to kick off the parallel functions that build the huge
> > > table from the source tables. It works - but we cannot rollback
> > > because the parallel functions force the source data to become
> > > comitted!!! is there no way for a child session to have access to its
> > > parents uncomitted data?
>
> > > Thanks folks..
> > > Nis.
>
> > > PROCEDURE testParallel
> > >   IS
>
> > >     p_ba varchar2(100) :='CSR';
> > >     p_no number :=100;
>
> > >     v_command1 varchar2(200):= 'nis_test_parallel.func1(' || '''' ||
> > > '''' || p_ba || '''' || '''' || ',' || p_no || ')';
> > >     v_command2 varchar2(200):= 'nis_test_parallel.func2(' || '''' ||
> > > '''' || p_ba || '''' || '''' || ',' || p_no || ')';
>
> > >   BEGIN
>
> > >     -- When this function is called, then at this point we have got
> > > uncomitted data in tables. These tables are
> > >     -- need to be read by
>
> > >     -- The start_background_job command uses the
> > > dbms_scheduler.create_job command to kick off the thread
> > >     start_background_job( 'BUILD_TABLE_PART1',
> > >                           v_command1,
> > >                           'Background Job to build part1 of the
> > > results table');
>
> > >     start_background_job( 'BUILD_TABLE_PART2',
> > >                           v_command2,
> > >                           'Background Job to build part2 of the
> > > results table');
>
> > >   END testParallel;
>
> > Have you looked to see _why_ it is so slow?  The hot button phrase
> > "select uncommitted data" is often a code for "not understanding how
> > oracle does read consistency."  It may be other things are happening
> > to that data, or maybe even other data not related to it.  Since that
> > leads more gently into what Sybrand says, I don't even want to suggest
> > other possibilities, like spitting the data into another transaction
> > that... I'd say go ask Tom, as well as being more specific about the
> > code you are using.  There's a reason most performance problems turn
> > out to be the code.
>
> > jg
> > --
> > @home.com is bogus.http://forums.oracle.com/ECVtest.html
>
> Hi jg,
>
> The speed of the SELECT operations themselves is not actually slow -
> but there's other work that is going on in the functions both before
> and after the select statement, so the totality of the individual
> operations in each function is slow (2hrs per function so 4 hours in
> total if run serially). Since the operations in each function are not
> dependent on each other - being only dependent on the common
> uncommitted 'source' data, then it looked like a perfect candidate for
> parallel operations.In the past I've worked on multi-threaded C
> programs where a child thread is just forked off from the parent and
> each individual thread can see the current state of data in their
> common ancestor - as well as having their own data stack. Looks like
> Oracle does things differently. I've looked at other options such as
> using the PARALLEL clause and compiler hints (i.e. nologging, append)
> when running each query but that won't have much effect as the other
> 'non select' operations in each function are taking time and the
> collective total is high. I don't like the idea of trying to use
> flashback queries to restore the database if something goes wrong -
> seems long winded and possibly leaves more room for problems down the
> line. The system I'm working on is a data warehouse, and basically the
> folks in charge of the data can't risk data being committed to the
> source tables unless my operations on them are also successful. Hope
> that explains the background a bit. Thanks for all the input!

Is that "other work" your procedures carry out computationally intensive? If so, did you consider native compilation of your PL/SQL? It may dramatically increase the speed of non-SQL work, especially if most of it is number crunching.

Also, since your two processes work on the same source data and feed the same target table, why don't you combine them? I mean, select once, process as necessary, insert once. This may not have big impact on performance if the processing consumes most of the run time, but at least you will definitely half the amount of work Oracle has to perform fetching and writing your data. Did you profile the processes to find out where the most time is spent? Is it that extra processing you do or is it fetching the source data or maybe writing out the results (and possibly doing some heavy index housekeeping/space management/temp allocation behind the scene?)

Too many open questions here for us to come up with a sensible advice that doesn't involve some exotic approaches at rolling back committed work. :) For example, did you consider creating another instance, replicating the source data to it and processing it there knowing it's dedicated to just that and you can safely scrap the data and restart if anything goes wrong (with license costs of running that second instance set aside?) Can this process be made incremental rather than recurring (that is, identify and work only on new/changed data instead of full refresh every night?) Finally, is the result of this process of any value to the business or is it just waste load as per Cary Millsap's definition and nobody will actually use it or take note when it becomes stale or even goes for good?

Regards,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com

p.s. Indeed, Oracle is doing things very differently from a multithreaded  program (even though it is multi-threaded itself, at least on Windows, though Oracle still refers to these threads as "processes.") This is because all threads in a multi-threaded program work with single data set that cannot be legitimately changed by some external process, only by this process and its children themselves (and then you need to use semaphores, mutexes and other tricks to synchronize these threads and do not allow one of them to overwrite the work of the others or read the data currently being changed.) In Oracle each process is working independently from any other processes that might run concurrently, and threading inside these processes is not currently allowed (well, maybe in release 42, because technically it is possible, and even practically possible if you resort to Java. Not sure if Oracle VM will schedule these threads to different cores though, so it may well be multiple threads actually executed serially on single core with extra time spent switching between them thus being even less efficient then true serial execution...) Received on Wed Oct 22 2008 - 12:59:28 CDT

Original text of this message