Re: perform expensive calculation on a number of rows without locking all rows until all complete

From: Martin <martin.j.evans_at_gmail.com>
Date: Thu, 29 Sep 2011 01:18:40 -0700 (PDT)
Message-ID: <60ebade9-4ba5-4ba9-aca9-cf322db3b585_at_de2g2000vbb.googlegroups.com>



On Sep 28, 8:54 pm, Andreas Leitgeb <a..._at_gamma.logic.tuwien.ac.at> wrote:
> Martin <martin.j.ev..._at_gmail.com> wrote:
> > I have a table entry which contains a column called recalc (value 0 or
> > 1). Sometimes code sets the recalc column to 1 indicating to a job
> > which runs at regular intervals it should update another table history
> > with expensive calculations for the entry. When the calculation is
> > complete (or when the rows with entry.recalc=1 are selected so we know
> > which entries to perform the calculation on) the entry.recalc column
> > needs to be reset to 0. As the calculation is expensive I do not want
> > to keep the rows in the entry table locked whilst the calculation is
> > performed on each entry although keeping one entry locked whilst its
> > calculation is performed is not so bad. Additionally, other code might
> > want to set entry.recalc to 1 at any time.
>
> Some idea:
> let the job run something like the following pseudocode:
>    loop
>       update ... set recalc=0 where recalc=1   AND ROWNUM=1   returning id
>       commit
>       if none found then exit from loop
>       perform longish task for id
>    end loop
>
> It will then find only one row, process it, then search anew
> for another such row.  From your description, it seems like
> the longish task would run long enough, that the cost of redoing
> the select for recalc=1 from scratch each time doesn't really
> matter all that much. (you do have an index on recalc, don't you?)

Interesting idea and thanks for answering but the problem is that whilst the calculations are taking place per entry other processes will be setting the recalc=1 in other entries so there is a possibility my job would never finish. I could count how many recalc=1 rows there are first (and only loop at most that many times) then follow your suggestion but I've no way of ensuring that if more rows are set reclac=1 whilst I'm looping then I might keep missing ones at the end. I guess I could fix that by changing recalc from an integer to a timestamp and order by oldest first but then other processes might reset the recalc column to a newer timestamp. To prevent that I could change the code which sets recalc to only set it if it is null.

So, maybe there is something to look in to here.

If any one else has any ideas I'd appreciate it.

Thanks.

Martin Received on Thu Sep 29 2011 - 03:18:40 CDT

Original text of this message