RE: Big Update on Busy Table

From: Mark W. Farnham <mwf_at_rsiz.com>
Date: Thu, 17 Jul 2014 05:37:21 -0400
Message-ID: <05a301cfa1a2$b657b950$23072bf0$_at_rsiz.com>



Hans identified a key question in 2) below.  

IF there is an easy way to identify chunks remaining to be updated and especially IF there is an efficient way to group rows in the same database block together, then doing so in chunks of at least 1,000 differs from slow-by-slow by 3 orders of magnitude. IF the driving select is cheap, this should be sufficient.  

IF the select that identifies the rows needing updates is expensive, but you can rely on the OLTP portion of the workload not changing this column in a way that would disqualify rows from your batch update, then a pretty nifty approach is to create an interim table containing the rowids to be updated, indexed by a nullable key holding the block of each row. Since only you will be updating this list table, the expensive select runs only once against the OLTP table.  

Into the (at this point unindexed) "list of rows to be updated" table you deposit the rowid, two columns for the block id extracted from the rowid (one populated, one initially NULL), and, if it varies by row, the new value for the column, however you functionally determine that. Let's call this the ACTION table, with columns RID, ALL_BLOCKS, BLOCK_ACTION, [NEW_VALUE]. Then create single-column indexes on ACTION.ALL_BLOCKS and ACTION.BLOCK_ACTION. At this point it helps to know the minimum, maximum, and average number of rows per block to be updated.
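A minimal sketch of building that ACTION table follows. BIG_TABLE, STATUS, and the 'OLD' predicate are placeholders for whatever identifies your rows; the ACTION columns follow the naming above. Strictly speaking, the block number alone can repeat across datafiles, so for a multi-file table you might fold in DBMS_ROWID.ROWID_RELATIVE_FNO as well:

```sql
-- Hypothetical names: BIG_TABLE / STATUS stand in for the real OLTP table.
CREATE TABLE action AS
SELECT t.rowid                                AS rid,
       DBMS_ROWID.ROWID_BLOCK_NUMBER(t.rowid) AS all_blocks,
       CAST(NULL AS NUMBER)                   AS block_action,
       'NEW'                                  AS new_value  -- omit if constant
FROM   big_table t
WHERE  t.status = 'OLD';     -- however you determine rows to fix

CREATE INDEX action_ab_ix ON action (all_blocks);
CREATE INDEX action_ba_ix ON action (block_action);
```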

If you don't mind the commit batches being a bit variable in size, just update the stats and use the average. For a useful total number of rows, set BLOCK_ACTION to the ALL_BLOCKS value and ALL_BLOCKS to NULL where ALL_BLOCKS is not null. Then run your actual update WHERE rowid IN (SELECT RID FROM ACTION WHERE BLOCK_ACTION IS NOT NULL), update ACTION setting BLOCK_ACTION back to NULL where it is not null, and COMMIT. Rinse and repeat until ALL_BLOCKS is NULL for every row. [IF you are updating to a constant NEW_VALUE, just leave that column out of this whole thing and use the constant.]  
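One pass of that rinse-and-repeat could be sketched as below, assuming the ACTION table above. The 20-blocks-per-pass figure is a placeholder you would size from the average rows per block; flagging whole blocks keeps same-block rows in the same commit batch:

```sql
BEGIN
  LOOP
    -- Flag a batch of whole blocks (same-block rows stay together).
    UPDATE action
    SET    block_action = all_blocks,
           all_blocks   = NULL
    WHERE  all_blocks IN
           (SELECT all_blocks
            FROM   (SELECT DISTINCT all_blocks FROM action
                    WHERE  all_blocks IS NOT NULL)
            WHERE  ROWNUM <= 20);
    EXIT WHEN SQL%ROWCOUNT = 0;

    -- The actual update, against only the flagged rows.
    UPDATE big_table
    SET    status = 'NEW'
    WHERE  rowid IN (SELECT rid FROM action
                     WHERE  block_action IS NOT NULL);

    -- Mark the batch done.
    UPDATE action
    SET    block_action = NULL
    WHERE  block_action IS NOT NULL;

    COMMIT;
  END LOOP;
END;
/
```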

I've suspended disbelief that you cannot find a useful window to do this as a monolith and/or that a monolith would just be too doggone big. (IF you can find an off-hours window and the rows needing update do not churn at a horrible frequency, the CTAS solution already mentioned is likely good. IF you are only updating several million rows of a 10-billion-row table, that is probably not the case unless partition pruning comes into play.)  
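For reference, the CTAS solution mentioned upthread is roughly the following sketch (names are placeholders, and it assumes a quiet window: indexes, constraints, grants, and triggers must be recreated before the swap):

```sql
CREATE TABLE big_table_new
NOLOGGING
AS
SELECT t.pk_id,
       CASE WHEN t.status = 'OLD' THEN 'NEW' ELSE t.status END AS status,
       t.other_col                       -- remaining columns carried through
FROM   big_table t;

-- after rebuilding indexes/constraints on big_table_new:
-- ALTER TABLE big_table     RENAME TO big_table_old;
-- ALTER TABLE big_table_new RENAME TO big_table;
```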

Block-oriented, disappearing-index batch nibblers are one of the design answers to avoiding slow-by-slow or excessively sized monolith operations. Remember the general-case limitation: no one else can be allowed to change the required NEW_VALUE, or whether a row should be updated, until all the selected rows are complete. They CAN change anything else about the row. Often (but not always) these criteria are met when a batch operation needs to be done on a column.  

Slow-by-slow (TKYTE) has badness about it. Sooner or later you do need to do set operations in reasonably sized chunks. What is reasonable changes over time.  

mwf  

From: oracle-l-bounce_at_freelists.org [mailto:oracle-l-bounce_at_freelists.org] On Behalf Of Hans Forbrich
Sent: Wednesday, July 16, 2014 5:36 PM
To: oracle-l_at_freelists.org
Subject: Re: Big Update on Busy Table  

A couple of thoughts:

  1. Is it really going to take too long to just get it over and done with on the live table? Perhaps an off-hours one-shot transaction? I've occasionally rethought my bulk operations and realized that the overall hit might not be 'that bad', but your situation is, of course, unknown to me.
  2. How are you planning on getting those 'few thousand rows at a time'? And how are you planning on identifying those which have been done and are not to be repeated?
  3. Is this something that can be handled through PL/SQL's bulk operations? Perhaps with a "SELECT FOR UPDATE OF"? (http://morganslibrary.com/reference/plsql/array_processing.html#apbc and other areas of Morgan's Library)
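One way to combine points 2 and 3 above is a BULK COLLECT with a row limit plus FORALL, re-selecting the not-yet-done rows each pass rather than holding a FOR UPDATE cursor open across commits (which would raise ORA-01002, fetch out of sequence). A sketch with placeholder names, assuming the update itself changes the predicate column so finished rows drop out of the select:

```sql
DECLARE
  TYPE rid_t IS TABLE OF ROWID;
  rids rid_t;
BEGIN
  LOOP
    -- Grab the next chunk of undone rows.
    SELECT rowid BULK COLLECT INTO rids
    FROM   big_table
    WHERE  status = 'OLD'
    AND    ROWNUM <= 5000;
    EXIT WHEN rids.COUNT = 0;

    -- Apply the change to just this chunk.
    FORALL i IN 1 .. rids.COUNT
      UPDATE big_table
      SET    status = 'NEW'
      WHERE  rowid = rids(i);

    COMMIT;
  END LOOP;
END;
/
```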

/Hans

On 16/07/2014 3:07 PM, Charlotte Hammond wrote:

Hi All,  

I need to run an update of one column of millions of rows in a busy OLTP table. To prevent the transaction from blocking other sessions for too long, I'm planning to break it into updates of a few thousand rows at a time with a commit in between. This will prevent any one row from being locked for more than about a second. However, it is tending towards slow-by-slow processing: is there a better way to update this table without creating long-lived TX locks?  
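[The chunk-and-commit approach described could be sketched like this, with placeholder names, assuming the update changes the column the WHERE clause tests so completed rows are not reselected:]

```sql
BEGIN
  LOOP
    UPDATE big_table
    SET    status = 'NEW'
    WHERE  status = 'OLD'
    AND    ROWNUM <= 5000;     -- a few thousand rows per transaction
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;                    -- release the TX lock between chunks
  END LOOP;
END;
/
```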

Thanks!

Charlotte  

--
http://www.freelists.org/webpage/oracle-l
Received on Thu Jul 17 2014 - 11:37:21 CEST
