Re: perform expensive calculation on a number of rows without locking all rows until all complete

From: joel garry <joel-garry_at_home.com>
Date: Thu, 29 Sep 2011 09:26:06 -0700 (PDT)
Message-ID: <885c8748-66b3-467d-9495-95e2b41cd681_at_fx14g2000vbb.googlegroups.com>



On Sep 29, 1:18 am, Martin <martin.j.ev..._at_gmail.com> wrote:
> On Sep 28, 8:54 pm, Andreas Leitgeb <a..._at_gamma.logic.tuwien.ac.at>
> wrote:
>
>
>
> > Martin <martin.j.ev..._at_gmail.com> wrote:
> > > I have a table, entry, which contains a column called recalc
> > > (value 0 or 1). Sometimes code sets the recalc column to 1,
> > > indicating to a job which runs at regular intervals that it
> > > should update another table, history, with expensive
> > > calculations for that entry. When the calculation is complete
> > > (or when the rows with entry.recalc=1 are selected, so we know
> > > which entries to perform the calculation on) the entry.recalc
> > > column needs to be reset to 0. As the calculation is expensive
> > > I do not want to keep the rows in the entry table locked whilst
> > > the calculation is performed on each entry, although keeping
> > > one entry locked whilst its calculation is performed is not so
> > > bad. Additionally, other code might want to set entry.recalc to
> > > 1 at any time.
>
> > Some idea:
> > let the job run something like the following pseudocode:
> >    loop
> >       update ... set recalc=0 where recalc=1   AND ROWNUM=1   returning id
> >       commit
> >       if none found then exit from loop
> >       perform longish task for id
> >    end loop
>
> > It will then find only one row, process it, then search anew
> > for another such row.  From your description, it seems like
> > the longish task would run long enough that the cost of redoing
> > the select for recalc=1 from scratch each time doesn't really
> > matter all that much. (You do have an index on recalc, don't you?)
>
> Interesting idea, and thanks for answering, but the problem is that
> whilst the calculations are taking place per entry, other processes
> will be setting recalc=1 in other entries, so there is a possibility
> my job would never finish. I could count how many recalc=1 rows
> there are first (and loop at most that many times) and then follow
> your suggestion, but if more rows are set recalc=1 whilst I'm
> looping I might keep missing the ones at the end. I guess I could
> fix that by changing recalc from an integer to a timestamp and
> ordering by oldest first, but then other processes might reset the
> recalc column to a newer timestamp. To prevent that I could change
> the code which sets recalc to only set it if it is null.
>
> So, maybe there is something to look into here.
>
> If anyone else has any ideas, I'd appreciate it.
>
> Thanks.
>
> Martin

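For what it's worth, Andreas' pseudocode written out as PL/SQL might look roughly like the sketch below. entry, recalc and id are taken from your description, do_expensive_calc is just a stand-in for whatever populates history, and the cap on rows handled per run is one way to stop rows flagged while you loop from keeping the job going forever:

  DECLARE
    v_id   entry.id%TYPE;
    v_done PLS_INTEGER := 0;
  BEGIN
    LOOP
      -- claim exactly one pending row and note which one it was
      UPDATE entry
         SET recalc = 0
       WHERE recalc = 1
         AND ROWNUM = 1
      RETURNING id INTO v_id;

      EXIT WHEN SQL%ROWCOUNT = 0;  -- nothing pending
      COMMIT;                      -- row unlocked before the long task

      do_expensive_calc(v_id);     -- stand-in: does the work, writes history
      COMMIT;

      v_done := v_done + 1;
      EXIT WHEN v_done >= 1000;    -- bound one run; rows flagged while
                                   -- looping wait for the next run
    END LOOP;
  END;
  /
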
I think if multiple recalcs can happen on a row, you need some kind of queuing. Perhaps make recalc a count, add one for each calculation requested, and set it back to zero when the last one is done. I hesitate to say more, else we'll be writing a full-blown transaction processor.

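A minimal sketch of that counting idea, assuming recalc is widened from a 0/1 flag to a count of pending requests, and again using the hypothetical do_expensive_calc. The job notes the count it saw when it picked the row, does the calculation with nothing locked, then subtracts only that amount, so requests that arrive mid-calculation leave the count above zero and the row gets picked up again:

  -- requesting code increments a counter instead of setting a flag
  UPDATE entry SET recalc = recalc + 1 WHERE id = :entry_id;

  -- the job: snapshot the count, calculate, subtract only what it saw
  DECLARE
    v_id      entry.id%TYPE;
    v_pending entry.recalc%TYPE;
  BEGIN
    BEGIN
      SELECT id, recalc
        INTO v_id, v_pending
        FROM entry
       WHERE recalc > 0
         AND ROWNUM = 1;
    EXCEPTION
      WHEN NO_DATA_FOUND THEN RETURN;  -- nothing pending
    END;

    do_expensive_calc(v_id);            -- stand-in, as above

    UPDATE entry
       SET recalc = recalc - v_pending  -- increments made meanwhile survive
     WHERE id = v_id;
    COMMIT;
  END;
  /

If more than one copy of the job can run at once you'd also want the pick step to claim the row (SELECT ... FOR UPDATE SKIP LOCKED, or flip it back to a single claiming UPDATE as above), but now we really are writing that transaction processor.
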
jg

--
_at_home.com is bogus.
Give a man a fish, you've fed him for a day.
Teach a man to fish, and you will have misleading database
aggregation.  http://www.signonsandiego.com/news/2011/sep/26/two-popular-socal-bass-fisheries-falling-apart/
Received on Thu Sep 29 2011 - 11:26:06 CDT
