RE: Fasted Way to Unload Data - Straight SQL query vs. PL/SQL and dbms_output vs. PL/SQL and utl_file

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Mon, 4 Oct 2010 07:00:09 -0400
Message-ID: <B067254A39DF46C492F21433272289F3_at_rsiz.com>



Have you tried the concatenation into a single column in the select?  

As in

   select dummy||'|'||dummy from dual;  

?  

Presuming the requirement for the vertical bar (aka "pipe") is that pipe is forbidden in the output data and is used for parsing later in the process downstream from you, this should give you the minimum output length per row and does not require changing colsep. I'm quite surprised changing colsep changes the performance, by the way.  

Is your client present directly on the server?  

Have you set pagesize 0 so you just get the datarows out with no heading?

Are you redirecting the output to a file, or are you spooling?

Is termout on or off?  

Regards,  

mwf  


From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Mark Strickland
Sent: Monday, October 04, 2010 3:00 AM
To: oracle-l_at_freelists.org
Subject: Re: Fasted Way to Unload Data - Straight SQL query vs. PL/SQL and dbms_output vs. PL/SQL and utl_file  

Followup: I put on my cowboy hat and decided to quickly upgrade to 11gR2 just to see if there would be a performance improvement. Indeed, the improvement is about 3x. It seems that 11gR2 SQL*Plus is faster than 10gR2 at processing Unicode data. Here are the scenarios I tested with 10g:  

Scenario                          Duration     Writes/sec Output File Size

=============================== ============= =========== ================

1) As-Is                          24:01 mins      850-KB      1,054,839-KB

2) nunset NLS_LANG                 5:34 mins     4400-KB        948,144-KB

3) set colsep '|'               1:14:54 hours    4700-KB     17,600,222-KB

4) set colsep '|'

   with explicit column lengths 47:27 mins 1050-KB 2,680,108-KB

5) set arraysize 20000 25:15 mins 805-KB 1,054,839-KB

As you can see, handling Unicode data is actually where the bottleneck is. Concatenation of pipes is also a bottleneck, but the increase in throughput by setting colsep was offset by the increase in the size of the output.

So, I'm satisfied with the 3x performance improvement of 11gR2.

One note about third-party commercial/free share/shareware Oracle unload tools, they would have to be executable in Cygwin. I tried one tool and it wouldn't run in Cygwin and I didn't have time to figure out why.

Mark

On Wed, Sep 29, 2010 at 9:49 AM, Mark Strickland <strickland.mark_at_gmail.com> wrote:

Oracle 10gR2 on Windows 2008R2 with Cygwin and Bash Scripts, 4 CPUs and 4-GB RAM I've written a horrible set of ETL bash scripts for a customer project and the final script does the "transforms". It's a set of 14 queries that each join 10-12 tables and write the output in pipe-delimited format. Lines are up to 1000 bytes wide. I've set DOP on the tables to 16 in order to get more PGA for hash joins. The queries themselves seems to run reasonably fast (I've tuned them as well as I can, I think) but the output is the problem. I have to concatentate '|' between the fields because the next step in the process requires that. Straight SQL output can write about 775-KB per second to disk. I got a PL/SQL procedure from Laurent Schneider's blog that uses dbms_output and, once it starts writing, it writes about 3.5-MB per second, a vast improvement. However between the time when the query heavy lifting "appears" to have completed (buffer gets stop increasing, TEMP tablespace writes drop to nothing, USERS tablespace reads drop to nothing) and when the output starts writing, several hours pass. I'm assuming that it is buffering all the output (about 8-Gb) before it starts writing. I see a lot of writes to the Windows pagefile, but little or no writes to the TEMP tablespace. There certainly isn't enough memory to buffer 8-GB. I will also try the utl_file approach. So, I'm looking for at least anecdotal evidence for the best approach here. I'm under a very tight deadline and am doing as much of my own testing as I can get done, but I'm hoping that others in the list will know right off how to approach this the best way. Thanks in advance.

Mark Strickland
Seattle, WA
'  

--
http://www.freelists.org/webpage/oracle-l
Received on Mon Oct 04 2010 - 06:00:09 CDT

Original text of this message