RE: Big Update on Busy Table

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Fri, 18 Jul 2014 17:37:23 -0400
Message-ID: <034401cfa2d0$76f52bb0$64df8310$_at_rsiz.com>



IF you do implement this on a busy OLTP system, be very careful to set parallel_level only as high as still leaves plenty of headroom for interactive use.  

However, since this now exists it probably is worthwhile even if you use parallel_level 1, because it does the chunking for you. So sorry I forgot they built this and sent you off on the old way.  

Job and JL, thanks for the reminder. 11.2+, right? (Old habits die hard; I've been generating chunks since 5, and chunks by block with disappearing indexes since 6.)  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Job Miller (Redacted sender "jobmiller_at_yahoo.com" for DMARC)
Sent: Friday, July 18, 2014 2:52 PM
To: dmarc-noreply_at_freelists.org; ORACLE-L
Subject: Re: Big Update on Busy Table  

Charlotte,  

That's what dbms_parallel_execute is for:  

This package enables the user to incrementally update table data in parallel, in two high-level steps:

  1. Group sets of rows in the table into smaller sized chunks.
  2. Run a user specified statement on these chunks in parallel, and commit when finished processing each chunk.

This package introduces the notion of a parallel execution task. This task groups the various steps associated with the parallel execution of a PL/SQL block, which is typically updating table data.  

Examples

The following examples run on the Human Resources (HR) schema of the Oracle Database Sample Schemas. They require that the HR schema be created with the CREATE JOB system privilege.

Chunk by ROWID

This example shows the most common usage of this package. After calling the RUN_TASK Procedure (<http://docs.oracle.com/cd/E11882_01/appdev.112/e40758/d_parallel_ex.htm#CHDIBHHB>), it checks for errors and re-runs in the case of error.

DECLARE
  l_sql_stmt VARCHAR2(1000);
  l_try NUMBER;
  l_status NUMBER;
BEGIN

  -- Create the TASK
  DBMS_PARALLEL_EXECUTE.CREATE_TASK ('mytask');

  -- Chunk the table by ROWID
  DBMS_PARALLEL_EXECUTE.CREATE_CHUNKS_BY_ROWID('mytask', 'HR', 'EMPLOYEES', true, 100);

  -- Execute the DML in parallel
  l_sql_stmt := 'update /*+ ROWID (dda) */ EMPLOYEES e
      SET e.salary = e.salary + 10
      WHERE rowid BETWEEN :start_id AND :end_id';
  DBMS_PARALLEL_EXECUTE.RUN_TASK('mytask', l_sql_stmt, DBMS_SQL.NATIVE,
                                 parallel_level => 10);

  -- If there is an error, RESUME it for at most 2 times.
  l_try := 0;
  l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  WHILE (l_try < 2 AND l_status != DBMS_PARALLEL_EXECUTE.FINISHED)
  LOOP
    l_try := l_try + 1;
    DBMS_PARALLEL_EXECUTE.RESUME_TASK('mytask');
    l_status := DBMS_PARALLEL_EXECUTE.TASK_STATUS('mytask');
  END LOOP;

  -- Done with processing; drop the task
  DBMS_PARALLEL_EXECUTE.DROP_TASK('mytask');

END;
/  


From: Charlotte Hammond <dmarc-noreply_at_freelists.org>
To: ORACLE-L <oracle-l_at_freelists.org>
Sent: Thursday, July 17, 2014 4:36 PM
Subject: RE: Big Update on Busy Table  

Everyone - thanks for your suggestions!  

No, unfortunately I can't get any downtime on this table. It's very busy 24x7 - running the update as a single statement would take well over an hour and lock many rows. The update is about 400 million rows of a 2 billion row table.  

Mark - I'm doing something similar to what you suggest, although my driving table uses ROWIDs rather than blocks. I'm updating 20,000 rows each time and setting a flag against those ROWIDs when each chunk completes. One blessing is that I don't need to care what's already in the column I'm updating, so it doesn't matter if it changes before or after I run each individual chunk transaction, or that the total update is spread over many hours.  
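A minimal sketch of that general pattern (DRIVER, BIG_TABLE and the STATUS column are illustrative names only, not the actual objects or predicate):

-- Hypothetical driving table: one row per target ROWID plus a "done" flag.
CREATE TABLE driver AS
  SELECT t.rowid AS rid, 'N' AS done
  FROM   big_table t
  WHERE  t.status = 'OLD';

-- Process up to 20,000 rows per transaction, flag them, commit, repeat.
DECLARE
  TYPE t_rids IS TABLE OF ROWID;
  l_rids t_rids;
BEGIN
  LOOP
    SELECT rid BULK COLLECT INTO l_rids
    FROM   driver
    WHERE  done = 'N'
    AND    ROWNUM <= 20000;

    EXIT WHEN l_rids.COUNT = 0;

    FORALL i IN 1 .. l_rids.COUNT
      UPDATE big_table SET status = 'NEW' WHERE rowid = l_rids(i);

    FORALL i IN 1 .. l_rids.COUNT
      UPDATE driver SET done = 'Y' WHERE rid = l_rids(i);

    COMMIT;
  END LOOP;
END;
/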

Your suggestion of using blocks sounds better, as that way I can localize the disk I/O for each chunk, although the way I've selected the ROWIDs kind of does that anyway (but more by chance than design). I'll take your suggestions on board and look at improving this to squeeze a bit more throughput.  
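For the block-locality point, a small sketch (against the same hypothetical DRIVER table) of pulling the file and block number out of each ROWID so chunks can be picked in physical order:

SELECT d.rid,
       DBMS_ROWID.ROWID_RELATIVE_FNO(d.rid) AS rel_fno,   -- datafile number
       DBMS_ROWID.ROWID_BLOCK_NUMBER(d.rid) AS block_no   -- block within that file
FROM   driver d
WHERE  d.done = 'N'
ORDER  BY rel_fno, block_no;

Walking the driving rows in this order means each chunk's updates touch a contiguous run of blocks rather than blocks scattered all over the table.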

I'm also looking to parallelize this by having a few processes each with its own driving table with a subset of the overall rows/blocks to be updated.  
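One illustrative way to carve the driving rows into independent subsets for that (again hypothetical names; 4 buckets purely as an example):

ALTER TABLE driver ADD (bucket NUMBER);

-- Hash on the block number so all rows of a given block land in one bucket,
-- which keeps the parallel sessions from colliding on the same blocks.
UPDATE driver d
SET    d.bucket = MOD(DBMS_ROWID.ROWID_BLOCK_NUMBER(d.rid), 4);
COMMIT;

Each session then adds "AND bucket = :n" to its chunking query and works only its own share.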

Thanks!

Charlotte      

RE: Big Update on Busy Table

  • From: "Mark W. Farnham" <mwf_at_xxxxxxxx>
  • To: <fuzzy.graybeard_at_xxxxxxxxx>, <oracle-l_at_xxxxxxxxxxxxx>
  • Date: Thu, 17 Jul 2014 05:37:21 -0400

Hans identified a key question in 2) below.      

IF there is an easy way to identify chunks remaining to be updated and especially IF there is an efficient way to group rows in the same database block together, then doing so in chunks of at least 1,000 differs from slow-by-slow by 3 orders of magnitude. IF the driving select is cheap, this should be sufficient.      

IF there is a lot of work in the select to identify the rows that need to be updated, but you can rely on the OLTP portion of the job not updating this column in a way that means you should no longer be updating it in your batch job, then creating an interim table containing the rowids to be updated, indexed by a nullable key holding the block of each row to be updated, is a pretty nifty way to do this. Since only you will be updating this list table, your first (expensive) select then operates only as a plain select against the OLTP table.      

What you deposit in the (at this point unindexed) "list of rows to be updated" table is the rowid, two columns containing the block id extracted from the rowid (one initially null), and, if variable, the new value for the column for each row, however you functionally determine that. Let's call this the ACTION table with columns RID, ALL_BLOCKS, BLOCK_ACTION, [NEW_VALUE].

Then create single-column indexes on ACTION.ALL_BLOCKS and ACTION.BLOCK_ACTION. At this point it helps to know the minimum, maximum, and average number of rows per block to be updated.  
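A sketch of that setup under made-up names (BIG_TABLE, SOME_COL, and the driving predicate are placeholders for whatever actually identifies the rows and computes the new value):

-- Load the ACTION table unindexed, as described above.
CREATE TABLE action AS
  SELECT t.rowid                                AS rid,
         DBMS_ROWID.ROWID_BLOCK_NUMBER(t.rowid) AS all_blocks,
         CAST(NULL AS NUMBER)                   AS block_action,
         t.some_col * 1.1                       AS new_value
  FROM   big_table t
  WHERE  t.needs_update = 'Y';

-- Only after the load:
CREATE INDEX action_all_blocks_ix   ON action (all_blocks);
CREATE INDEX action_block_action_ix ON action (block_action);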

If you don't mind the commit sizes being a bit variable, just update the stats and use the average. For a useful total number of rows, update BLOCK_ACTION to the ALL_BLOCKS value and set ALL_BLOCKS to NULL where ALL_BLOCKS is not null. Then run your actual update where rowid is in (select RID from ACTION where BLOCK_ACTION is not null), update ACTION setting BLOCK_ACTION to null where BLOCK_ACTION is not null, and COMMIT. Rinse and repeat until every ALL_BLOCKS is null. [IF you are updating to a constant NEW_VALUE, just leave that column out of this whole thing and use the constant.]      
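One way that loop might look, continuing the hypothetical ACTION/BIG_TABLE names (the 10,000 stands in for whatever "useful total number of rows" the stats suggest):

DECLARE
  l_marked PLS_INTEGER;
BEGIN
  LOOP
    -- Mark one batch: moving the block id sideways makes the row "disappear"
    -- from the ALL_BLOCKS index and "appear" in the BLOCK_ACTION index.
    UPDATE action
    SET    block_action = all_blocks,
           all_blocks   = NULL
    WHERE  all_blocks IS NOT NULL
    AND    ROWNUM <= 10000;

    l_marked := SQL%ROWCOUNT;
    EXIT WHEN l_marked = 0;

    -- Apply the real change to the marked batch of rows.
    UPDATE big_table t
    SET    t.some_col = (SELECT a.new_value
                         FROM   action a
                         WHERE  a.rid = t.rowid
                         AND    a.block_action IS NOT NULL)
    WHERE  t.rowid IN (SELECT rid FROM action WHERE block_action IS NOT NULL);

    -- Clear the marks so the batch drops out of both indexes, then commit.
    UPDATE action
    SET    block_action = NULL
    WHERE  block_action IS NOT NULL;

    COMMIT;
  END LOOP;
END;
/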

I've suspended disbelief that you cannot find a useful window to do this as a monolith and/or that a monolith would just be too doggone big. (IF you can find an off-hours window and the frequency of rows needing to be updated is not horrible, the CTAS solution already mentioned is likely good. IF you are only updating several million rows of a 10 billion row table, that is probably not the case unless partition pruning comes into play.)      

Block-oriented, disappearing-index batch nibblers are one of the design answers to avoiding slow-by-slow or excessively sized monolith operations. Remember that the general-case limitation is that no one else can be allowed to change the required new_value, or whether a row should be updated, until all the selected rows are complete. They CAN change anything else about the row. Often (but not always) when a batch operation needs to be done on a column these criteria are met.      

Slow-by-slow (TKYTE) has badness about it. Sooner or later you do need to do chunks of set operations in reasonable sizes. What is reasonable changes over time.      

mwf      

From: oracle-l-bounce_at_xxxxxxxxxxxxx [mailto:oracle-l-bounce_at_xxxxxxxxxxxxx] On Behalf Of Hans Forbrich
Sent: Wednesday, July 16, 2014 5:36 PM
To: oracle-l_at_xxxxxxxxxxxxx
Subject: Re: Big Update on Busy Table      

A couple of thoughts:  

  1. Is it really going to take too long to just get it over and done with on the live table? Perhaps an off-hours one-shot transaction? I've occasionally rethought my bulk operations and realized that the overall hit might not be 'that bad', but your situation is, of course, unknown to me.
  2. How are you planning on getting those 'few thousand rows at a time'? And how are you planning on identifying those which have been done and are not to be repeated?
  3. Is this something that can be handled through PL/SQL's bulk operations? Perhaps with a "SELECT FOR UPDATE OF"? (http://morganslibrary.com/reference/plsql/array_processing.html#apbc and other areas of Morgan's Library; a minimal sketch follows below)
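For point 3, a hedged sketch of that array-processing pattern (made-up table and column names; note that committing while a FOR UPDATE cursor is still open raises ORA-01002, so this version holds its locks until the end, which is exactly the trade-off being discussed in this thread):

DECLARE
  CURSOR c IS
    SELECT rowid AS rid
    FROM   big_table
    WHERE  status = 'OLD'
    FOR UPDATE OF status;          -- locks every matching row at OPEN time

  TYPE t_rids IS TABLE OF ROWID;
  l_rids t_rids;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_rids LIMIT 5000;
    EXIT WHEN l_rids.COUNT = 0;

    FORALL i IN 1 .. l_rids.COUNT
      UPDATE big_table SET status = 'NEW' WHERE rowid = l_rids(i);
  END LOOP;
  CLOSE c;
  COMMIT;   -- single commit: see the FOR UPDATE caveat above
END;
/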

/Hans    

On 16/07/2014 3:07 PM, Charlotte Hammond wrote:  

Hi All,      

I need to run an update of one column of millions of rows in a busy OLTP table. To prevent the transaction from blocking other sessions for too long, I'm planning to break it into updates of a few thousand rows at a time with a commit in between. This will prevent any one row being locked for more than about a second. However, it is tending towards slow-by-slow processing: is there a better way of updating this table without creating long-lived TX locks?      
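A bare-bones version of that plan looks something like this (illustrative names only); the catch, which the rest of the thread addresses, is that every pass has to re-drive the WHERE clause against the big table, which is what pushes it towards slow-by-slow:

BEGIN
  LOOP
    -- Relies on the update flipping the predicate so each pass finds fewer rows.
    UPDATE big_table
    SET    status = 'NEW'
    WHERE  status = 'OLD'
    AND    ROWNUM <= 5000;

    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
END;
/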

Thanks!  

Charlotte        

--
http://www.freelists.org/webpage/oracle-l