Re: Inserting with billion of rows faster

From: Lok P <loknath.73_at_gmail.com>
Date: Sun, 28 Mar 2021 11:29:46 +0530
Message-ID: <CAKna9Vb=rCiPvpg7kJhg1gbh-HDjpEeXxZjPPx8zD7i6fDkuYQ_at_mail.gmail.com>



Below is the sample bulk insert we were trying; it was running slowly and thus ended up committing only partial data to the target. So what went wrong, and what else can be done to make it faster? With the normal '/*+ APPEND parallel(16) */' hint we used to load ~40-50 million rows a day into this table in ~30 minutes, without any issue and with all four indexes in place. But now, with the bulk collect approach using append_values, it is taking ~30 minutes for just 1 million rows. Or is it that the load previously ran in parallel and now runs in a single thread, and that alone explains the difference? And if so, would simply putting a parallel(16) hint along with 'append_values' in the query below give us the same throughput as the daily run?
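For comparison, the daily load described above is a single set-based, parallel, direct-path statement, which has a very different execution profile from a serial FORALL loop. A minimal sketch of that pattern, assuming the elided column lists and predicates are filled in (txn_pan is the target table named in the block below):

-- Sketch only, assuming the elided column lists/predicates are filled in.
-- Parallel DML must be enabled for the INSERT itself to run in parallel.
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ APPEND PARALLEL(16) */ INTO txn_pan (....)
SELECT /*+ PARALLEL(16) */ ..........
  FROM p, t, b
 WHERE ......;

COMMIT;   -- direct-path-loaded rows are readable only after the commit

As far as I understand, FORALL with APPEND_VALUES still executes serially in the calling session, so adding a PARALLEL hint to the VALUES-style insert would not by itself reproduce the parallel throughput of an INSERT ... SELECT like the one above.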

DECLARE
   c_limit     PLS_INTEGER := 1000000;
   l_done      BOOLEAN;
   l_cnt       NUMBER := 1;
   l_processed NUMBER := 0;

   CURSOR mycur IS
      SELECT ..........
        FROM p, t, b
       WHERE ......;

   TYPE myARRAY IS TABLE OF mycur%ROWTYPE;

   tab_data myARRAY;
BEGIN
   dbms_application_info.set_module('Insert ', 'Starting...');

   OPEN mycur;

   LOOP
      dbms_application_info.set_action('Processing '||l_cnt||' thru '||(l_cnt + c_limit - 1)||' ('||l_processed||')');

      FETCH mycur BULK COLLECT INTO tab_data LIMIT c_limit;
      l_done := mycur%NOTFOUND;

      -- APPEND_VALUES gives a direct-path insert per batch; note that
      -- NOLOGGING is not a valid hint and is silently ignored here.
      FORALL indx IN 1 .. tab_data.COUNT
         INSERT /*+ append_values nologging */
           INTO txn_pan (....)
         VALUES (tab_data (indx).c1,
                 tab_data (indx).c2,
                 tab_data (indx).c3,
                 tab_data (indx).c4,
                 tab_data (indx).c5,
                 tab_data (indx).c6.....);

      -- record the batch count before committing
      l_processed := l_processed + SQL%ROWCOUNT;

      COMMIT;

      EXIT WHEN (l_done);

      l_cnt := l_cnt + c_limit;
   END LOOP;

   CLOSE mycur;

   dbms_application_info.set_action('Processed '||l_processed||' Inserts');
END;
/
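One quick way to confirm whether those FORALL batches are actually going direct path (a sketch, not something from the original post): a segment you have just loaded via direct path cannot be read in the same transaction, so querying the table before the COMMIT should raise ORA-12838.

-- Sketch: run one FORALL ... INSERT /*+ APPEND_VALUES */ batch and,
-- BEFORE the COMMIT, try to read the table back.  If the batch really
-- went direct path, this SELECT fails with
-- ORA-12838: cannot read/modify an object after modifying it in parallel.
DECLARE
   l_chk NUMBER;
BEGIN
   -- ... one FORALL ... INSERT /*+ APPEND_VALUES */ ... batch here ...
   SELECT COUNT(*) INTO l_chk FROM txn_pan;   -- expect ORA-12838 if direct path
END;
/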

On Sun, Mar 28, 2021 at 11:14 AM Lok P <loknath.73_at_gmail.com> wrote:

>
> The delete we are using now to remove the duplicates from the target is
> shown below, but it is taking a lot of time. Is there a faster way to
> achieve this?
>
> DECLARE
>    type rowidArray is table of rowid index by binary_integer;
>
>    type emRec is record
>       (rowid rowidArray);
>
>    l_record     emRec;
>
>    l_array_size number := 10000;
>    l_done       boolean;
>    l_cnt        number := 1;
>    l_processed  number := 0;
>
>    cursor c is
>       select rp.rowid
>         from dp, rp
>        where .....................;
> BEGIN
>    dbms_application_info.set_module('Deletes ', 'Starting...');
>
>    open c;
>
>    loop
>       dbms_application_info.set_action('Processing '||l_cnt||' thru '||(l_cnt+l_array_size-1)||' ('||l_processed||')');
>
>       fetch c bulk collect into l_record.rowid LIMIT l_array_size;
>       l_done := c%notfound;
>
>       forall i in 1 .. l_record.rowid.count
>          delete from ...........
>           where rowid = l_record.rowid(i);
>
>       l_processed := l_processed + SQL%ROWCOUNT;
>
>       commit;
>
>       exit when (l_done);
>
>       l_cnt := l_cnt + l_array_size;
>    end loop;
>
>    close c;
>
>    dbms_application_info.set_action('Processed '||l_processed||' deletes FROM .......');
> end;
> /
>
> On Sun, Mar 28, 2021 at 11:07 AM Lok P <loknath.73_at_gmail.com> wrote:
>
>> Hi Listers, we have a process that loads ~50 million rows daily (using
>> direct-path insert, i.e. INSERT with the APPEND hint) into a target table
>> that holds ~20 billion rows (~1TB in size) and is list-hash composite
>> partitioned: ~2 list partitions and ~2048 hash subpartitions. It has 4
>> indexes, one of which is a composite primary key comprising 5 columns.
>> That data load job had been failing unnoticed for ~15-20 days, so roughly
>> a billion rows accumulated, and the load process is now failing with
>> ORA-01628 (max # extents 32765 reached for rollback segment).
>>
>> So we thought of running it with a bulk collect method (APPEND_VALUES
>> hint) and committing in chunks of ~1 million. But with that approach the
>> data load was significantly slower, inserting ~1 million rows in
>> ~40 minutes. We could not understand the reason, but the wait events all
>> showed "cell single block physical read" and the object waited on was the
>> target table, meaning it was the INSERT that was struggling. So we have
>> now ended up with partial data loaded to the target, around ~40-50
>> million rows, and as the table has a primary key we have to delete those
>> ~40-50 million rows from the target table and then reload them.
>>
>> I was thinking we could do the delete with a bulk method in chunks of
>> 1 million, but again that would run in a single thread and be
>> significantly slow. So what is the best way to delete those ~40-50
>> million rows from the target table (which holds ~20 billion rows in
>> total)?
>>
>> I was also thinking we could make the indexes unusable, perform the
>> delete, and then do the data load (which would generate almost zero UNDO
>> in the absence of indexes), but it seems the DELETE would need the PK
>> index to fetch the rowids, so we cannot get rid of the PK index. So what
>> is the best approach for the delete and data load here without breaking
>> data consistency in the target table?
>>
>> In short, what is the best/fastest approach for the delete and the data
>> load in this situation?
>>
>> Regards
>> Lok
>>
>
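On the delete question quoted above, a minimal sketch of a set-based, parallel alternative to the rowid-by-rowid loop, assuming parallel DML is acceptable on this system and that the join from the quoted cursor can identify the partially loaded rows (the predicate and the dp table are placeholders carried over from the elided original; txn_pan is the target table from the insert block):

-- Sketch only: one parallel DELETE instead of fetching rowids and
-- deleting them 10,000 at a time.  The subquery predicate is a
-- placeholder for whatever identifies the ~40-50 million rows to remove.
ALTER SESSION ENABLE PARALLEL DML;

DELETE /*+ PARALLEL(16) */
  FROM txn_pan t
 WHERE EXISTS (SELECT 1
                 FROM dp
                WHERE .....................);

COMMIT;

A single statement like this performs all ~40-50 million deletes in one transaction, so it needs undo to match; if that is a concern, DBMS_PARALLEL_EXECUTE can split the same DELETE into rowid-range chunks that commit independently while still running several chunks at once.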

--
http://www.freelists.org/webpage/oracle-l
Received on Sun Mar 28 2021 - 07:59:46 CEST
