Re: Performance of Multiple Inserts

From: Thomas J Kyte <tkyte_at_us.oracle.com>
Date: 1995/08/11
Message-ID: <40fvkj$lvr_at_inet-nntp-gw-1.us.oracle.com>


Will Kooiman, kooiman_at_interserv.com wrote:

>> droberts_at_seas.gwu.edu (David Roberts) writes:
>> A friend of mine has written a simple Oracle program that he thinks
>> should run in a couple of hours, that now appears to need 48 days to
>> complete! I offered to ask for advice here....
>>
>> The program is a simple load utility, loading a lot of payroll data into
>> a database. The program is generating test data, so it just does
>> inserts. At present, it COMMITs after every insert. In an effort to
>> speed things up, the program was run simultaneously from four clients
>> accessing the same server, and there was about 3 x speedup, but we are
>> still talking about 16 days to build our test database!
>>
>> A couple of questions--
>>
>> 1. I suspected that the program might run faster if it inserted about a
>> whole pageful of data and then COMMITted, rather than issuing a COMMIT
>> after each INSERT. What do you think?

You should use array inserts in Pro*C, especially in a networked environment. Some pseudo-code:

EXEC SQL BEGIN DECLARE SECTION;
    int  n;                        /* number of rows in this batch */
    int  empno[100];               /* host arrays: one slot per row */
    char ename[100][30];
EXEC SQL END DECLARE SECTION;

    int i;

    /* fill the host arrays with up to 100 rows */
    for( i = 0; i < n; i++ )
    {
        empno[i] = ....;
        strcpy( ename[i], "fdafda" );
    }

    /* one round trip inserts all n rows */
    EXEC SQL FOR :n INSERT INTO T VALUES ( :empno, :ename );
    EXEC SQL COMMIT WORK;

This way, 100 rows (or fewer) = one network message, not 1 row = 1 message. The PL/SQL below does more or less the same, but would try to update and then insert. If you are just loading, using the above insert, without PL/SQL, would be faster (only one statement ever gets executed to insert 100 rows, and all 100 rows get sent over in a single call). You will see orders of magnitude speedup by doing this.

Batching by blocksize would have little effect, and it would be very difficult for you to determine how many rows fit on a block anyway. See below for how to limit contention for new blocks.

>>
>> 2. Would that same strategy help for the multiple-client case? Are the
>> multiple clients each going to work with the same page of data, hence
>> have to wait on each other, or will Oracle give each of them a separate page?
>>

For doing N concurrent inserts into a table, it is advisable to look at the FREELISTS parameter on CREATE TABLE. In general, people won't wait for each other to get new blocks; only when the table needs to get a free block will they wait on one another (and ***only as long as it takes to get the block***, ****not***** for the whole transaction). By setting up multiple freelists, you remove this small wait altogether. If you want 5 loaders going, set FREELISTS to 5. Zero contention for any resource at that point.
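For example, a minimal sketch (the table name and columns are made up; the point is the FREELISTS storage parameter, one per concurrent loader):

    CREATE TABLE payroll_load
    ( empno  NUMBER,
      ename  VARCHAR2(30)
    )
    STORAGE ( FREELISTS 5 );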

>> Thanks
>>
>> Dave Roberts
>>
 

>Here are a couple of ideas:
 

>If you are inserting only (i.e. not updating rows), use SQL*Loader, preferably with the
>direct-path option. I know of a site that loaded 60 million rows in 45 minutes using this
>option.

Agreed most wholeheartedly. The above speed is neither abnormal nor exceptional. With the direct path loader, you will get massive amounts of data into the database very fast. Look at parallel direct path on SMP machines (even single-CPU machines will benefit from parallel direct path loading, since such loads are typically bound by IO contention, not CPU).
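A sketch of the invocation (the account, control file, and data file names are placeholders, and on some platforms/releases the executable is named sqlload rather than sqlldr; a parallel direct path load needs one session per input file and an APPEND in the control file):

    $ # single-session direct path load
    $ sqlldr userid=scott/tiger control=payroll.ctl data=payroll1.dat direct=true

    $ # parallel direct path: one concurrent session per input file
    $ sqlldr userid=scott/tiger control=payroll.ctl data=payroll1.dat direct=true parallel=true &
    $ sqlldr userid=scott/tiger control=payroll.ctl data=payroll2.dat direct=true parallel=true &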

Also, the conventional path loader does ARRAY inserts as coded above. You have complete control over the batch size (you control the array size and how often commits are performed). SQL*Loader can load virtually any input file and generates a really nice little report after it's done to show you what it did. Also, it can handle 'bad' records (records rejected by the database) without stopping: it writes them to a bad file, where you can clean them up and reload them.
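A minimal control file sketch (names made up; ROWS on the command line sets the array size, i.e. rows per commit, and BADFILE collects the rejected records):

    -- payroll.ctl
    LOAD DATA
    INFILE  'payroll.dat'
    BADFILE 'payroll.bad'
    APPEND
    INTO TABLE t
    FIELDS TERMINATED BY ','
    ( empno, ename )

    $ sqlldr userid=scott/tiger control=payroll.ctl rows=100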

Using the conventional path loader, you can achieve hundreds of rows per second into heavily indexed tables that are concurrently being read and updated by other processes.

>If you need to update the rows if they exist, otherwise insert, use Pro*C with PL/SQL
>and array processing. Here's some pseudocode:
 

>while( not done )
>{ read 500 records
 

> exec sql execute
> begin
> for i = 1 .. 500 loop
> update table
> set col1 = :col1(i),
> col2 = :col2(i),
> .
> .
> where pk_col = :pk_col(i);
 

> if sql%rowcount = 0 then
> insert into table
> (col1, col2, ...)
> values
> (:col1(i), :col2(i), ...);
> end if;
> end loop;
 

> commit;
> end;
> end-exec;
>}
 

>Array processing helps performance a lot. I've seen it reduce execution time in
>a networked environment by 2-3 times.

Or even more....
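To make the pattern concrete, here is a fleshed-out sketch in Pro*C (the emp table, column names, and batch size of 100 are placeholders):

    EXEC SQL BEGIN DECLARE SECTION;
        int   n;                       /* rows read in this batch */
        int   empno[100];
        char  ename[100][30];
    EXEC SQL END DECLARE SECTION;

        /* ... fill empno[0..n-1] and ename[0..n-1] from the input ... */

        /* one round trip per batch: update, insert if not found */
        EXEC SQL EXECUTE
            begin
                for i in 1 .. :n loop
                    update emp
                       set ename = :ename(i)
                     where empno = :empno(i);

                    if sql%rowcount = 0 then
                        insert into emp ( empno, ename )
                        values ( :empno(i), :ename(i) );
                    end if;
                end loop;
                commit;
            end;
        END-EXEC;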

>By all means, do not commit after each insert. You should see a significant improvement
>by "batching" commits.

Very true

>You can drop the indexes, load the data, and recreate the indexes, but this
>will only improve performance a little. If you drop the indexes, it's always possible
>they might not get recreated. I'd trade peace of mind and simplicity for a couple of percent
>of performance any day.

>A BIG mistake to avoid is delete/insert. I've seen a lot of code delete the row they want
>to insert (in case it already exists), and then insert the row. This will hurt your performance
>a lot. If you have existing data in the table, it's much faster to update the row, and then insert
>if not found like in the example above.
 

>Another thing to check is indexes. If you are doing an update...where, make sure you are
>doing indexed searches on the update.
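For example, a quick sketch of checking this with EXPLAIN PLAN (names made up; assumes a PLAN_TABLE created with utlxplan.sql):

    CREATE UNIQUE INDEX emp_pk ON emp ( empno );

    EXPLAIN PLAN FOR
        UPDATE emp SET ename = 'X' WHERE empno = 1234;

    SELECT operation, options, object_name
      FROM plan_table;

You want to see an index access (e.g. INDEX UNIQUE SCAN on EMP_PK) in the plan, not TABLE ACCESS FULL.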

>Hope this all makes sense.
 

>If you have any questions, feel free to email me.
 

>Will Kooiman,
>Computer Systems Authority.

Thomas Kyte
tkyte_at_us.oracle.com
Oracle Government
