Re: Why does Oracle buffer the result from the select for a parallel INSERT INTO SELECT FROM?

Home -> Community -> Usenet -> c.d.o.server -> Re: Why does Oracle buffer the result from the select for a parallel INSERT INTO SELECT FROM?

From: Jaap W. van Dijk <j.w.vandijk.removethis_at_hetnet.nl>
Date: Sat, 25 Nov 2006 16:49:57 GMT
Message-ID: <4568724d.2651984@news.hetnet.nl>

On Wed, 22 Nov 2006 14:14:00 GMT, j.w.vandijk.removethis_at_hetnet.nl (Jaap W. van Dijk) wrote:

>On 22 Nov 2006 03:54:40 -0800, "sybrandb" <sybrandb_at_gmail.com> wrote:
>
>>
>>
>>On Nov 22, 11:23 am, j.w.vandijk.removet..._at_hetnet.nl (Jaap W. van
>>Dijk) wrote:
>>> Hi,
>>>
>>> I'm on Oracle 9.2.0.5, Open VMS.
>>>
>>> I perform an
>>>
>>> INSERT INTO target
>>> SELECT * FROM source
>>>
>>> If I do this noparallel, the result from the select is immediately
>>> inserted into the target.
>>>
>>> If I do this in parallel, the result of the selects from a slave set
>>> are first buffered in TEMP. After this is done, a slave set starts to
>>> insert the contents from the buffers into the target.
>>>
>>> Why this timeconsuming buffering? Can anything be done?
>>>
>>> While I am writing this, I wonder: I let Oracle decide which degree of
>>> parallellization to use. Maybe buffering only takes places if the
>>> degree differs between SELECT and INSERT? (I should test myself, but
>>> don't have the opportunity right now).
>>>
>>> Regards, Jaap.
>>
>>IMHO parallelization is pretty useless if the affected tables aren't
>>striped over several disks.
>>I remember Tom Kyte stating somewhere you should use parallelization
>>only if you can't get rid of full table scans at all.
>>Also parallelization always will use parallel server slaves and a query
>>coordinator (the original session), so I bet this is what you are
>>looking at right now: data is retrieved by slaves and inserted by the
>>coordinator.
>>And yes, this is a 'feature', while you seem to qualify it as a bug.
>>
>>--
>>Sybrand Bakker
>>Senior Oracle DBA
>>
>
>Maybe the coordinator wants to coordinate between select and insert ,
>but it doesn't insert the records itself: that is also done by a slave
>set.
>
>I took a look at the Concept Manual. There inter-operation
>parallellism is discussed, where data flows directly from one slave
>set to the next (fig 18-3 on page 18-7). What is pictured there - two
>slave sets doing a select and a sort - could IMO as easily have been
>two slave sets doing a select and an insert, without any buffering. As
>I said before: maybe the optimizer needs to know beforehand that the
>degree of select and insert is the same, so when I have the
>opportunity, I perform a test.
>
>Regarding the parallel gain: selecting also takes an amount of CPU,
>and parallellizing that improves the select noticably, even if
>everything is read from the same disk, especially if the select is
>part of the transformation step of the load of a datawarehouse, with
>lots of complex case statements in the select list, consuming lots of
>CPU.
>
>Regards, Jaap.

After some more reading and thinking: I thought that data got PIPED through the slave set processes, so the second slave set consumes immediately what the first set produces. This certainly is what figure 18-3 in the Concept manual is suggesting. But instead produce from the first slave set is buffered in totality first, and then this is processed by the second slave set. A similar difference as between the ordinary table function and the pipelined table function.

I don't see any functional impossibilities, so I wonder why Oracle didn't implement the pipelined version, which would be much faster.

Regards,

Jaap. Received on Sat Nov 25 2006 - 10:49:57 CST