Re: Sql*load with named piped giving no performance increase. What's wrong?

From: Billy Verreynne <vslabs_at_onwe.co.za>
Date: Thu, 2 Aug 2001 13:15:09 +0200
Message-ID: <9kbcvm$bpd$1@ctb-nnrp1.saix.net>

"daven" <davenport_at_hotmail.com> wrote

> we have a unix shell script that does 2 processes, a cut command of a file
> (takes out a chuck of a file) and then a execution of sql*loader program that
> loads the result of that cut.
>
> With out a named pipe process takes ~20 minutes, the cut runs first then the
> sqlloader program starts after cut finishes
>
> we create a named pipe and then have the cut program feed the sqlloader
> program we notice that the sqlloader program starts up right away. however
> the total time is still ~20 are we doing anything worng?

Nope, nothing wrong from the sounds of it. In essence you are doing parallel processing. But there are various factors to consider.

Let's say the data formatting process takes 2 minutes and the loading process takes 18 minutes. This means a 20 minutes elapse for running process 1 and then process 2 sequentially.

If you run process 1 and process 2 in parallel, you are in fact forcing process 1 slower. It needs to provide data output at the input rate of process.

You are not speeding up process 2 in anyway either. In ideal circumstances, doing process 1 and 2 in parallel, you have shaved of 2 minutes of processing time. Down from 20 minutes to 18 minutes. Not a significant increase in performance.

If process 1 takes 10 minutes and process 2 takes 15 minutes for example, the 25 minutes for running these in serial, will be cut down to 15 minutes when run in parallel. So as you can see, the execution times of each individual process are important.

In your case, I think the cut process's time contributes very little to the overall time - the loading of the data is the bottleneck in this case. Running these two in parallel thus results in no real performance increase.

To describe a scenario where making use of pipes have significantly reduced runtime. FTP'ing a 4GB EBCDIC file from a mainframe over a 10Mbit ethernet. Then converting that data to ASCII using dd. Then loading that data in Oracle. Each of these 3 processes are slow and processes every single byte (read and write) from a 4GB data chunk. Run them in parallel and you have more than a 2.7 times increase in performance. FTP into a pipe. Run dd to read from the FTP pipe and write it to another pipe. Run SQL*Loader to read from the dd's ASCII pipe and load the data into Oracle. Ain't Unix great. :-)

--
Billy

Received on Thu Aug 02 2001 - 06:15:09 CDT