RE: Parallel Distributed CTAS

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 4 Feb 2011 13:16:19 -0500
Message-ID: <355DE797ACAB4DF3A9C7D92DA5A3713F_at_rsiz.com>



Exactly. So if you did have two or more distinct "pipes" (yes, I believe he was referring to networks) unbonded between the remote hosts, and if, despite your understanding about the network not being an issue, it actually was, then separate sessions each fielding half the source data (again presuming there is a reasonable way to write that query) then you might improve the speed with two (or more) non-parallel sessions.  

IF network is actually the bottleneck, then it might be even worth it to dump half the data into the real destination, half into another "fake" destination and follow up with a local CTAS (which would be a simple select * in bulk) from "fake" to final for that half. Of course if you have partitioning and the pieces can usefully be dumped directly into sensible (sub)partitions you avoid that as well.  

Oracle does a damn fine job of PX, but there has to be some overhead for coordination. If the job is big enough to look at eliminating that overhead and looking shows that running separate sessions does not require more effort in figuring out how to do it than the savings possible then you win. Most of the time (Greg, I hope I'm saving you from having to add this back in), perhaps even the vast preponderance of the time if Oracle's PX engineering built in the product can't come pretty doggone close to the theoretical maximum throughput, then you have some resource obstacle to parallelism that is gating Oracle's performance and not the other way around.  

If you DO have two "pipes" available, then Oracle's using only one within a single session is an exception. Of course you might be able to remove that obstacle by "bonding" the "pipes." Your mileage may vary. As far as I know there is no way other than "bonding" network resources so they appear to Oracle to be a single resource to utilize multiple "pipes" within an Oracle session in a way that can be exploited by PX. [Again, I think Greg said that same thing a little differently and much shorter.]  

The exception to the effort expenditure is when the cost of time is different for preparation versus execution. For example if you have months to prepare for a move and an uncomfortably small window for the production move, then putting in a lot of effort to do better for your specific case than Oracle has engineered to handle the general case might be worthwhile.  

That's where I come in. (wink)  

If you don't have 2 pipes, then what is making you think PX would speed this up anyway? Is the source data query complex and slow? Oh, and does the query involve some grouping or ordering that means it has to do the extra work of putting the pieces all back together before it can start spilling the results across the pipe?  

mwf


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Greg Rahn
Sent: Friday, February 04, 2011 11:36 AM To: Laimutis.Nedzinskas_at_seb.lt
Cc: oracle-l_at_freelists.org
Subject: Re: Parallel Distributed CTAS  

What happens here is that you get a parallel select on the remote db, a parallel insert on the local db, but the communication between the two nodes over the db link is done via the QC so its 1:1 (or serial).  

On Fri, Feb 4, 2011 at 4:23 AM, <Laimutis.Nedzinskas_at_seb.lt> wrote:

What is a pipe ? You mean network ?
My understanding is that network is not an issue.

It is how oracle handles it. Execution plans do differ for remote and local, that's true:  

-- 
Regards,
Greg Rahn
http://structureddata.org



--
http://www.freelists.org/webpage/oracle-l
Received on Fri Feb 04 2011 - 12:16:19 CST

Original text of this message