Home » RDBMS Server » Server Utilities » SQL Loader parallel sessions (Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production)
SQL Loader parallel sessions [message #659930] Sun, 05 February 2017 16:54 Go to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
Hi Everyone,

I have requirement where I have to load 30-40 millions rows from a file into the database.I am supposed to use SQL Loader and the best I can think of is implementing parallel and direct path loads. Now, I am trying the skip and load rows options to create multiple parallel SQL Loader sessions. For eg:

Sql loader 1 : load first 1000000 rows
Sql loader 1 : skip = 1000000 load 10000000
...... and so on

I have the below parameters for show parallel:

NAME                                               TYPE        VALUE                                                                                                
-------------------------------------------------- ----------- ---------------------------------------------------------------------------------------------------- 
fast_start_parallel_rollback                       string      LOW                                                                                                  
parallel_adaptive_multi_user                       boolean     FALSE                                                                                                
parallel_automatic_tuning                          boolean     FALSE                                                                                                
parallel_degree_level                              integer     100                                                                                                  
parallel_degree_limit                              string      CPU                                                                                                  
parallel_degree_policy                             string      AUTO                                                                                                 
parallel_execution_message_size                    integer     16384                                                                                                
parallel_force_local                               boolean     FALSE                                                                                                
parallel_instance_group                            string                                                                                                           
parallel_io_cap_enabled                            boolean     FALSE                                                                                                
parallel_max_servers                               integer     128                                                                                                  
parallel_min_percent                               integer     0                                                                                                    
parallel_min_servers                               integer     24                                                                                                   
parallel_min_time_threshold                        string      AUTO                                                                                                 
parallel_server                                    boolean     FALSE                                                                                                
parallel_server_instances                          integer     1                                                                                                    
parallel_servers_target                            integer     48                                                                                                   
parallel_threads_per_cpu                           integer     1                                                                                                    
recovery_parallelism                               integer     4                                                                                                    


Currently I am loading data for the tables in chunk sizes of 2000000. So it takes around 10-12 parallel sessions and works fine. But I am just wondering if I can still reduce the chunks to say 100k and then run parallel sessions? From the above parameters is it that I can run at least 128 parallel sessions in a given instance?
I need an optimum number and the optimum parallel which would work magic. Could someone advise.
Just to add, I already have no indexes, no constraints etc on the tables and it is a one time activity and the target tables are all empty.

Regards
Re: SQL Loader parallel sessions [message #659931 is a reply to message #659930] Sun, 05 February 2017 17:19 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
>I can run at least 128 parallel
Explain in detail how one accesses a single file with 128 parallel processes.
You might have a serious bottleneck pulling data from a single file.
Re: SQL Loader parallel sessions [message #659934 is a reply to message #659931] Sun, 05 February 2017 19:01 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
This process is part of a migration project and I am extracting huge files from the source database and placing them in unix machines which also hold the oracle server client. Then I make connections to the oracle DB and load the files.
And Sorry..my bad!!! I don't exactly want 128 parallel sessions. I just need advise on the number of parallel sessions I can initiate? For eg, my current file has 30 million records. I am currently 10 parallel SQL Loader process each loading 3 million data. It runs ok but I am wondering is it feasible if I create more parallel sessions with lesser chunk of rows and considering my current setup for parallel parameters, will it hold maybe 20-30 parallel processes? There are cases where my file size is nearly 100 million so I would definitely need parallel processes and I need to take considerations for the bottleneck also
Re: SQL Loader parallel sessions [message #659941 is a reply to message #659934] Mon, 06 February 2017 00:18 Go to previous messageGo to next message
Littlefoot
Messages: 21239
Registered: June 2005
Location: Croatia, Europe
Senior Member
Account Moderator
What is the source database? Oracle? MS SQL Server? DB2? ...? Maybe you could migrate data in a different manner (such as a database link). I don't have experience in that (so I don't know whether it would be faster or not; I hope so!), I'm just asking.
Re: SQL Loader parallel sessions [message #659944 is a reply to message #659930] Mon, 06 February 2017 01:05 Go to previous messageGo to next message
Michel Cadot
Messages: 65201
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

1/ What about PARALLEL option of SQL*Loader
2/ What is the maximum throughput of your disk(s) and IO subsystem?
3/ How many CPU cores have you?
4/ What is the concurrent workload?
...

Re: SQL Loader parallel sessions [message #659961 is a reply to message #659944] Mon, 06 February 2017 10:04 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
@LittleFoot : Its a sybase database. Also, I am not sure if we can create a db link between oracle and sybase.Will check that.
@Michel : Here are my comments:

1. I am using direct=true parallel=true and multithreading=true while invoking concurrent sql loaders. Do i need to do more for the parallel option?
2. I have to check with the DBA.
3. I did query v$osstat and the NUM_CPUS has value 48 and NUM_CPU_CORES 24. So i think its 24.
4. I hope you meant to ask if there would be concurrent sessions running on the db at that time. If yes then just wanted to inform that this process would be a weekend task where no process would be triggered except the above.

[Updated on: Mon, 06 February 2017 10:22]

Report message to a moderator

Re: SQL Loader parallel sessions [message #659963 is a reply to message #659961] Mon, 06 February 2017 10:23 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
PERL (scripting language) can inter-operate against both DBs concurrently.
It can read a row from Sybase & then INSERT a row into Oracle.
Re: SQL Loader parallel sessions [message #659964 is a reply to message #659961] Mon, 06 February 2017 10:36 Go to previous messageGo to next message
John Watson
Messages: 7150
Registered: January 2010
Location: Global Village
Senior Member
abhi_orcl wrote on Mon, 06 February 2017 16:04

<snip>
1. I am using direct=true parallel=true and multithreading=true while invoking concurrent sql loaders. Do i need to do more for the parallel option?
<snip>
.
Yes, you do need more. First, you must include DIRECT=TRUE and second you must have multiple input data files and third you must launch one sqlldr session for each file.
Re: SQL Loader parallel sessions [message #659965 is a reply to message #659964] Mon, 06 February 2017 10:43 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
@John : yes. I am doing all the mentioned ones and I do get parallel sessions. Its just that I want to get an optimum value for the loads for the scenarios mentioned above.
@BlackSwan: Perl sounds promising. Will try to get more onto that.
Re: SQL Loader parallel sessions [message #659966 is a reply to message #659965] Mon, 06 February 2017 10:45 Go to previous messageGo to next message
John Watson
Messages: 7150
Registered: January 2010
Location: Global Village
Senior Member
Oh, right. Well, why not just run a few tests? Much quicker than days of speculation.
Re: SQL Loader parallel sessions [message #660035 is a reply to message #659966] Tue, 07 February 2017 15:36 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
@John: OK so here is what I tried:
Tablesize is around 2GB
total rows in table : 18431559
The datafile size is around 2-2.1GB
Dropped all indexes and constraints are all disabled all indexes. And finally tried by using the skip and load technique initially with:
2 million :- Time taken was around 195 seconds.
1 million :- Time taken was around 240 seconds.

I tried even with raising the load & skip to 4 million but that took around 5 minutes to load.
Using the below parameters:

multithreading=true direct=true parallel=true skip=<value as stated above> load=<value as stated above> readsize=20000000 streamsize=10048576 columnarrayrows=10000

I tried by changing the readsize and streamszie and columnarrayrows but no help. If i remove readsize,streamsize and columnarrays it takes nearly 190-195 seconds.
Is there anything I can do more to speed up this process. Any advise on the parameters would be appreciated.
Re: SQL Loader parallel sessions [message #660039 is a reply to message #660035] Tue, 07 February 2017 16:00 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
Is bottleneck pulling data out of the file?
Is bottleneck pushing data into the table?

It appears you can't make it faster than just over 3 minutes.

Is input file sitting on same disk volume as Oracle DB files?

What is duration when you eliminate all the"fancy" options; like multithreading, parallel, skip, load, etc.?

What does EXPLAIN PLAN reveal?
Re: SQL Loader parallel sessions [message #660041 is a reply to message #660039] Tue, 07 February 2017 16:10 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
@BlackSwan: Thanks. I am not sure how to confirm if the bottleneck is reading the huge data file or loading the data into the table. Will look into this aspect and update. Also, input file doesn't sit on DB disk files. We have a separate unix machine with Oracle client which connects to a DB server residing on another unix server and then crate the input file on the client machine.
As stated above, if I eliminate the fancy options, it remains at the same time +-10 seconds.
Finally, I could not get the explain plan stuff you mentioned? How do I get an explain plan on a SQL Loader?
Re: SQL Loader parallel sessions [message #660042 is a reply to message #660041] Tue, 07 February 2017 16:18 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
The SERIAL network link between client & DB Server could be bottleneck.
ELIMINATE all of Oracle completely.
What is duration to do simple OS network file copy between client & DB Server system?

Remember, you can't push a string.
Re: SQL Loader parallel sessions [message #660043 is a reply to message #660042] Tue, 07 February 2017 16:32 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
" We have a separate unix machine with Oracle client which connects to a DB server residing on another unix server and then crate the input file on the client machine." - This is wrong. Actually we have the input file is being extracted from Sybase database which resides on the same machine as the Oracle client. Once the input file is ready, we use the SQL Loader to connect to the DB server which resides on another unix machine. So basically, it seems, I can't get rid of the client server logic here as the source input files are being extracted from a different DB server.
"What is duration to do simple OS network file copy between client & DB Server system?" - Sorry for being naive but does this mean FTP/SFTP etc or using external tables concepts?
Re: SQL Loader parallel sessions [message #660045 is a reply to message #660043] Tue, 07 February 2017 18:06 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
> Sorry for being naive but does this mean FTP/SFTP
I want a measure of how long it takes to copy the content of the Sybase text file to the DB Sever.
You can use FTP, SFTP, scp, or any other utility that moves the file content from the client to the DB Server.
If this transfer takes around 180 - 190 seconds, then the network transfer is the bottleneck & NOTHING you do with SQLLDR can reduce this duration.
Is the NIC saturated while the transfer occurs?
What is the maximum actual throughput that you can measure between these 2 systems?

BTW - you can measure the maximum rate you can pull data out of the Sybase text file by making a script with content below & then invoking it

date
cp sybase_text_file /dev/null
date

What is the duration measured by the difference between the 2 timestamps generated by the script?
You'll NEVER get data loaded faster than this duration.
Re: SQL Loader parallel sessions [message #660055 is a reply to message #660045] Wed, 08 February 2017 01:47 Go to previous messageGo to next message
s234blog
Messages: 5
Registered: February 2017
Junior Member
Check the core usage on from where you run SQL*Loader.
- are the sessions using almost fully one core each ? Are there more idle cores available ? then increasing parallelism should help.
- are the sessions using almost fully one core each ? Are there no more idle cores available ? Then this is the limiting factor - get more cores ...
- are the cores only partially used ?
This is the interesting case. Because then the bottle neck could be somewhere else.
1) Database server ? What is the top wait event in awr ?
2) network ? what bandwidth do you have ? 1 GbE can be pretty quickly exhausted - like by 4 parallel sessions
3) access to local file - although relatively unlikely, check IO.

Chris
Re: SQL Loader parallel sessions [message #660057 is a reply to message #660055] Wed, 08 February 2017 02:15 Go to previous messageGo to next message
Michel Cadot
Messages: 65201
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Quote:
2) network ? what bandwidth do you have ?
Why network? The one who does such load across network is a fool.

Quote:
3) access to local file - although relatively unlikely, check IO.
Why unlikely? And most systems I have seen IO are the bottleneck, not CPU.


Re: SQL Loader parallel sessions [message #660059 is a reply to message #660057] Wed, 08 February 2017 02:52 Go to previous messageGo to next message
s234blog
Messages: 5
Registered: February 2017
Junior Member
On 2), yes I in principle agree, but there may be situations where you have to load over the network. And if the author has say 10GbE I doubt network is his bottleneck.
On 3) - simply from experience. See here for example [...] - up to 40 SQL*Loader sessions and the bottleneck stays client CPU, not client IO, network or database (of course it depends on the environment).

Best,
Chris



[Edit MC: remove spam for blog]

[Updated on: Wed, 08 February 2017 02:55] by Moderator

Report message to a moderator

Re: SQL Loader parallel sessions [message #660060 is a reply to message #660059] Wed, 08 February 2017 02:57 Go to previous messageGo to next message
Michel Cadot
Messages: 65201
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator

Your example is for Exadata X6 and is irrelevant to this topic.

Tell me which case you encountered that required big load across network, I'd be interested in knowing this.

[Updated on: Wed, 08 February 2017 03:00]

Report message to a moderator

Re: SQL Loader parallel sessions [message #660061 is a reply to message #660035] Wed, 08 February 2017 02:58 Go to previous messageGo to next message
John Watson
Messages: 7150
Registered: January 2010
Location: Global Village
Senior Member
Have you set the tables to NOLOGGING ? That can help quite a lot with direct loads. It is also worth checking whether you are in fact getting direct load. There are various circumstances (I cannot enumerate them) that cause SQL Loader to revert to conventional path load when you don't expect it.
Re: SQL Loader parallel sessions [message #660062 is a reply to message #660060] Wed, 08 February 2017 03:16 Go to previous messageGo to next message
s234blog
Messages: 5
Registered: February 2017
Junior Member
Well - on the Exadata - yes, the server was Exadata, but the client (in part of the tests) a poor simple VM ... still CPU bottleneck and not local io. Even when having the file on an NFS share - io not the bottleneck.

Case where one needs to load huge over network - "needs" is always relative, but the hosting provider for your database may not allow any access via ftp or mounts to the machine hosting the database (in particular in a multitenant setup) - so you have to stick with SQL*net. And with 10 GbE you can load like 5TB per hour - this is even for an Exadata reaching the limits.
But coming back to the topic - key is that the author tries to find the bottleneck as described.

Best,
Chris
Re: SQL Loader parallel sessions [message #660184 is a reply to message #660062] Fri, 10 February 2017 22:44 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
Ok Guyzz.. It is really all messed up for me now Confused . To start with here are few of my findings which I could find till now:
The Sql loader logs state that the direct path with parallel option has been used - "Path used: Direct - with parallel option."
But as suggested I did try to query the explain plan being used and it says:

SQL_ID 6xdtp2w0zdjsq
--------------------
INSERT /*+ SYS_DL_CURSOR */ INTO "TABLE_X" 
("ACCT_ID","INSTR_ID","I_TRAN","D_TRD","I_REG","I_EXTL_TRAN","C_TX_CDE",
"A_TX_COST_TRD","D_SET","A_TRD_PRC","Q_TRD","A_TRD_DOLLAR","C_TRD_TRAN_T
YPE","A_BKR_COMSN","BKR_ID","A_ACCR_INT_AMT","A_REAL_GAIN_LOSS","S_ROW_U
PD","N_SYB_DB_USER","I_ACCT_ID") VALUES (trim (:KLLUBV_1),trim 
(:KLLUBV_2),trim (:KLLUBV_3),TO_TIMESTAMP(:KLLUBV_4, 'mon dd yyyy 
HH:mi:ss:ff3 AM'),trim (:KLLUBV_5),trim (:KLLUBV_6),trim 
(:KLLUBV_7),trim (:KLLUBV_8),TO_TIMESTAMP(:KLLUBV_9, 'mon dd yyyy 
HH:mi:ss:ff3 AM'),trim (:KLLUBV_10),trim (:KLLUBV_11),trim 
(:KLLUBV_12),trim (:KLLUBV_13),trim (:KLLUBV_14),trim (:KLLUBV_15),trim 
(:KLLUBV_16),trim (:KLLUBV_17),TO_TIMESTAMP(:KLLUBV_18, 'mon dd yyyy 
HH:mi:ss:ff3 AM'),trim (:KLLUBV_19),trim (:KLLUBV_20))
 
 
-------------------------------------------------------------
| Id  | Operation                | Name             | Cost  |
-------------------------------------------------------------
|   0 | INSERT STATEMENT         |                  |     1 |
|   1 |  LOAD TABLE CONVENTIONAL | TABLE_X|       |
-------------------------------------------------------------
 
Note
-----
   - cpu costing is off (consider enabling it)
   - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold
But the wait in v$session shows direct path load.
I am really not able to understand this parallel concept now. Is my Sql Loader really using the direct with parallel option as the logs and the explain plan contradict each other? Is it because I am trying to initiate 10-20 parallel sessions. Also, I can see in session tables that at any point of time, I can't see more than 2-3 insert statements related to my Sql Loader running. Seems like OS CPU issue (Linux OS has cpu as 2)???
Also, for other tables, I did try to load the tables after moving the files to the DB server using FTP and it did run pretty fast. So definitely there is some network issue. But the above is really killing me now. Pleaseeeee suggest

[Updated on: Fri, 10 February 2017 23:17]

Report message to a moderator

Re: SQL Loader parallel sessions [message #660185 is a reply to message #660184] Fri, 10 February 2017 23:22 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
>Pleaseeeee suggest
Trust what YOU can measure & reproduce.
We can't change your reality.
Re: SQL Loader parallel sessions [message #660186 is a reply to message #660185] Fri, 10 February 2017 23:29 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
@BlackSwan : Thanks for the earlier suggestion you gave me. When I tried to transfer the file though SFTP to the DB server it took 185-190 seconds. So, there seems to be a network issue. And moreover in the wait events in v$session, it shows "SQL*Net message from client" and keeps on waiting on that.
But, the question largely remains for me is the contradiction I see above between the Sql Loader logs and the explain plan and whether really Direct parallel load is being implemented.
Re: SQL Loader parallel sessions [message #660187 is a reply to message #660186] Fri, 10 February 2017 23:54 Go to previous messageGo to next message
abhi_orcl
Messages: 39
Registered: December 2016
Member
I wish I could create a new thread for this but again I try to restate the scenario:
1. The target table has not indexes, constraints are disabled and the table is set to nologging.
2. The data file is a bit huge (2-3 GB) and resides in a different server (this has only the Oracle client) than the Oracle DB server.
3. The parallel parameters are stated in the initial post that I have mentioned. Just to mention the Linux OS has just 2 CPUs.
4. I invoke Sql loader using direct and parallel methods using skip and load methodology in the sql loader commands and try to create 10-20 parallel Sql Loader sessions as background process in the unix shell scripts.
5. While the process is running, I don't see more than 2-4 Sql Loader sessions active in the DB though in the UNIX OS all background sessions are active.
6. The SQL loader logs say it is using direct with parallel option but the explain plan shows conventional path and DOP as 1.

So, is my sql loader really using direct path insert. If yes then what does the explain plan actually imply?
Or, are ALL the processes getting serialized or only the ones which could not be paralleled because there are no resources to initiate the parallel threads? In this case, what exactly is lacking? The NUM_CPUS shows 24, CPU_COUNT parameter shows 12 and the actual CPU in the OS is 2. Parallel parameters are pretty good in numbers I suppose but still the process is lagging.
I would be greatful if someone can help me on this.
Re: SQL Loader parallel sessions [message #660188 is a reply to message #660187] Sat, 11 February 2017 01:32 Go to previous messageGo to next message
s234blog
Messages: 5
Registered: February 2017
Junior Member
Ok, you have done good tests and they show that you are limited by the network, in particular as you can load much faster locally.Did you find out what is your bandwidth ?
Regarding the direct and parallel . The only place where there is parallel is on the client side -the different loaders you run in parallel. There is not and there is not supposed to be any parallelism at the server side. All the parallel parameters at the db side don't matter.
It's using direct. The SYS_DL_CURSOR shows.i admin the " conventional path" is a bit misleading".
So all looks like your limitation is bandwidth - ask your network guys. Is guess 1 GbE ?
Chris
Re: SQL Loader parallel sessions [message #660189 is a reply to message #660188] Sat, 11 February 2017 01:35 Go to previous messageGo to next message
s234blog
Messages: 5
Registered: February 2017
Junior Member
And sorry to complete the picture can you still tell us how many cores you have at the client side ( where you run the loader ) and how busy they are during load ? (top)
Re: SQL Loader parallel sessions [message #660192 is a reply to message #660189] Sat, 11 February 2017 10:59 Go to previous messageGo to next message
BlackSwan
Messages: 25716
Registered: January 2009
Location: SoCal
Senior Member
>I am really not able to understand this parallel concept now.
Are you unwilling or incapable to actually Read The Fine Manual yourself?

https://docs.oracle.com/database/121/SUTIL/GUID-3081A258-0C23-40B0-8487-9C7A0D248E23.htm#SUTIL921

You can easily test "parallel" processing at OS level with what you have now.

1) make a copy of the raw input file on the client
2) use "split" command to produce 4 equal sized files.
3) create script file that contains 4 sftp command; where the first 3 are invoked into the background & the 4th sftp is done in foreground.
4) measure total elapsed time for all four sftp to complete & post it back here.
Re: SQL Loader parallel sessions [message #664673 is a reply to message #659930] Thu, 27 July 2017 02:36 Go to previous message
maranchi83
Messages: 1
Registered: November 2010
Location: Kolkata
Junior Member

All of it depends on following,

1. N/W Bandwidth
2. Average ROW Size
3. How fast read is possible at Source.
4. How fast is your Target (In case of Exa its best Smile )

I am using Named Pipe Approach to do the Data Migration from DB2 to Oracle, along with some other performance improvement mechanism like Partitioning the Target Tables and Loading the partitions using multiple SQLLDR sessions. With 1 GBPS N/W bandwidth, i can migrate 100 M rows having Avg Row size as 50-60 bytes in 6-8 Minutes. Number of parallel SQLLDR sessions are 10.

No Index, Key on the Target Table but Table is in LOGGING mode.

There is No Intermediate File getting created.
Previous Topic: issue wit impdp
Next Topic: datapump export import
Goto Forum:
  


Current Time: Tue Nov 21 20:02:31 CST 2017

Total time taken to generate the page: 0.09171 seconds