Re: perform expensive calculation on a number of rows without locking all rows until all complete

From: Martin <martin.j.evans_at_gmail.com>
Date: Thu, 29 Sep 2011 10:11:24 -0700 (PDT)
Message-ID: <d6493b0e-68d6-44a2-9366-2fdba17a3ee1_at_i33g2000yqm.googlegroups.com>

On Sep 29, 5:26�pm, joel garry <joel-ga..._at_home.com> wrote:
> On Sep 29, 1:18�am, Martin <martin.j.ev..._at_gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> > On Sep 28, 8:54�pm, Andreas Leitgeb <a..._at_gamma.logic.tuwien.ac.at>
> > wrote:
>
> > > Martin <martin.j.ev..._at_gmail.com> wrote:
> > > > I have a table entry which contains a column called recalc (value 0 or
> > > > 1). Sometimes code sets the recalc column to 1 indicating to a job
> > > > which runs at regular intervals it should update another table history
> > > > with expensive calculations for the entry. When the calculation is
> > > > complete (or when the rows with entry.recalc=1 are selected so we know
> > > > which entries to perform the calculation on) the entry.recalc column
> > > > needs to be reset to 0. As the calculation is expensive I do not want
> > > > to keep the rows in the entry table locked whilst the calculation is
> > > > performed on each entry although keeping one entry locked whilst its
> > > > calculation is performed is not so bad. Additionally, other code might
> > > > want to set entry.recalc to 1 at any time.
>
> > > Some idea:
> > > let the job run something like the following pseudocode:
> > > � �loop
> > > � � � update ... set recalc=0 where recalc=1 � AND ROWNUM=1 � returning id
> > > � � � commit
> > > � � � if none found then exit from loop
> > > � � � perform longish task for id
> > > � �end loop
>
> > > It will then find only one row, process it, then search anew
> > > for another such row. �From your description, it seems like
> > > the longish task would run long enough, that the cost of redoing
> > > the select for recalc=1 from scratch each time doesn't really
> > > matter all that much. (you do have an index on recalc, don't you?)
>
> > Interesting idea and thanks for answering but the problem is that
> > whilst the calculations are taking place per entry other processes
> > will be setting the recalc=1 in other entries so there is a
> > possibility my job would never finish. I could count how many recalc=1
> > rows there are first (and only loop at most that many times) then
> > follow your suggestion but I've no way of ensuring that if more rows
> > are set reclac=1 whilst I'm looping then I might keep missing ones at
> > the end. I guess I could fix that by changing recalc from an integer
> > to a timestamp and order by oldest first but then other processes
> > might reset the recalc column to a newer timestamp. To prevent that I
> > could change the code which sets recalc to only set it if it is null.
>
> > So, maybe there is something to look in to here.
>
> > If any one else has any ideas I'd appreciate it.
>
> > Thanks.
>
> > Martin
>
> I think if multiple recalcs can happen on a row, you need some kind of
> queuing. �Perhaps add one for each calculation, setting to zero when
> the last one is done. �I hesitate to say more, else we'll be writing a
> full-blown transaction processor.
>
> jg

Nice idea, why didn't I think of that.

If I increment recalc each time it needs a recalc then do:

select entry_id, recalc from entry where recalc > 0 into a global temp table
update entry set recalc = recalc - (select recalc from global temp table)
loop through global temp table doing expensive processing

Thanks

Martin Received on Thu Sep 29 2011 - 12:11:24 CDT

This message: [ Message body ]
Next message: geos: "sqlplus variables"
Previous message: Martin: "Re: perform expensive calculation on a number of rows without locking all rows until all complete"
Maybe in reply to: Martin: "perform expensive calculation on a number of rows without locking all rows until all complete"

Contemporary messages sorted: [ by date ] [ by thread ] [ by subject ] [ by author ]

Original text of this message