Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Why does Oracle buffer the result from the select for a parallel INSERT INTO SELECT FROM?

Re: Why does Oracle buffer the result from the select for a parallel INSERT INTO SELECT FROM?

From: Jaap W. van Dijk <j.w.vandijk.removethis_at_hetnet.nl>
Date: Wed, 29 Nov 2006 19:27:35 GMT
Message-ID: <456dd563.1377218@news.hetnet.nl>


On 27 Nov 2006 17:11:58 -0800, "joel garry" <joel-garry_at_home.com> wrote:

>
>Jaap W. van Dijk wrote:
>> On Wed, 22 Nov 2006 14:14:00 GMT, j.w.vandijk.removethis_at_hetnet.nl
>> (Jaap W. van Dijk) wrote:
>>
>> >On 22 Nov 2006 03:54:40 -0800, "sybrandb" <sybrandb_at_gmail.com> wrote:
>> >
>> >>
>> >>
>> >>On Nov 22, 11:23 am, j.w.vandijk.removet..._at_hetnet.nl (Jaap W. van
>> >>Dijk) wrote:
>> >>> Hi,
>> >>>
>> >>> I'm on Oracle 9.2.0.5, Open VMS.
>> >>>
>> >>> I perform an
>> >>>
>> >>> INSERT INTO target
>> >>> SELECT * FROM source
>> >>>
>> >>> If I do this noparallel, the result from the select is immediately
>> >>> inserted into the target.
>> >>>
>> >>> If I do this in parallel, the result of the selects from a slave set
>> >>> are first buffered in TEMP. After this is done, a slave set starts to
>> >>> insert the contents from the buffers into the target.
>> >>>
>> >>> Why this timeconsuming buffering? Can anything be done?
>> >>>
>> >>> While I am writing this, I wonder: I let Oracle decide which degree of
>> >>> parallellization to use. Maybe buffering only takes places if the
>> >>> degree differs between SELECT and INSERT? (I should test myself, but
>> >>> don't have the opportunity right now).
>> >>>
>> >>> Regards, Jaap.
>> >>
>> >>IMHO parallelization is pretty useless if the affected tables aren't
>> >>striped over several disks.
>> >>I remember Tom Kyte stating somewhere you should use parallelization
>> >>only if you can't get rid of full table scans at all.
>> >>Also parallelization always will use parallel server slaves and a query
>> >>coordinator (the original session), so I bet this is what you are
>> >>looking at right now: data is retrieved by slaves and inserted by the
>> >>coordinator.
>> >>And yes, this is a 'feature', while you seem to qualify it as a bug.
>> >>
>> >>--
>> >>Sybrand Bakker
>> >>Senior Oracle DBA
>> >>
>> >
>> >Maybe the coordinator wants to coordinate between select and insert ,
>> >but it doesn't insert the records itself: that is also done by a slave
>> >set.
>> >
>> >I took a look at the Concept Manual. There inter-operation
>> >parallellism is discussed, where data flows directly from one slave
>> >set to the next (fig 18-3 on page 18-7). What is pictured there - two
>> >slave sets doing a select and a sort - could IMO as easily have been
>> >two slave sets doing a select and an insert, without any buffering. As
>> >I said before: maybe the optimizer needs to know beforehand that the
>> >degree of select and insert is the same, so when I have the
>> >opportunity, I perform a test.
>> >
>> >Regarding the parallel gain: selecting also takes an amount of CPU,
>> >and parallellizing that improves the select noticably, even if
>> >everything is read from the same disk, especially if the select is
>> >part of the transformation step of the load of a datawarehouse, with
>> >lots of complex case statements in the select list, consuming lots of
>> >CPU.
>> >
>> >Regards, Jaap.
>>
>> After some more reading and thinking: I thought that data got PIPED
>> through the slave set processes, so the second slave set consumes
>> immediately what the first set produces. This certainly is what figure
>> 18-3 in the Concept manual is suggesting. But instead produce from the
>> first slave set is buffered in totality first, and then this is
>> processed by the second slave set. A similar difference as between the
>> ordinary table function and the pipelined table function.
>>
>> I don't see any functional impossibilities, so I wonder why Oracle
>> didn't implement the pipelined version, which would be much faster.
>>
>> Regards,
>>
>> Jaap.
>
>Silly as it sounds, you can sometimes roll your own parallel by simply
>having multiple sessions like
>
>insert into target select from source where somevalue < constant
>insert into target select from source where somevalue >= constant
>
>Degree of silliness is proportional to actual needs. If you don't have
>to put things back together in a particular order you can pipeline
>easier. See
>http://download-west.oracle.com/docs/cd/B19306_01/server.102/b14223/usingpe.htm#CACCBEJC
>for the considerations (and read the whole chapter, it may clarify your
>question).
>
>jg
>--
>@home.com is bogus.
>http://www.signonsandiego.com/uniontrib/20061124/news_1m24baby.html
>

Hi Joel,

I tried your 'silly solution', but we insert /*+ append */, and then Oracle takes out a lock on the table, that cannot be shared among sessions. One could get around this I suppose, by dividing the table in multiple partitions, based on a dummy column, letting every process insert into a different partition, but this is getting very contrived.

Jaap. Received on Wed Nov 29 2006 - 13:27:35 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US