Re: perform expensive calculation on a number of rows without locking all rows until all complete
From: Martin <martin.j.evans_at_gmail.com>
Date: Thu, 29 Sep 2011 10:11:24 -0700 (PDT)
Message-ID: <d6493b0e-68d6-44a2-9366-2fdba17a3ee1_at_i33g2000yqm.googlegroups.com>
On Sep 29, 5:26 pm, joel garry <joel-ga..._at_home.com> wrote:
> On Sep 29, 1:18 am, Martin <martin.j.ev..._at_gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> > On Sep 28, 8:54 pm, Andreas Leitgeb <a..._at_gamma.logic.tuwien.ac.at>
> > wrote:
>
> > > Martin <martin.j.ev..._at_gmail.com> wrote:
> > > > I have a table entry which contains a column called recalc (value 0 or
> > > > 1). Sometimes code sets the recalc column to 1 indicating to a job
> > > > which runs at regular intervals it should update another table history
> > > > with expensive calculations for the entry. When the calculation is
> > > > complete (or when the rows with entry.recalc=1 are selected so we know
> > > > which entries to perform the calculation on) the entry.recalc column
> > > > needs to be reset to 0. As the calculation is expensive I do not want
> > > > to keep the rows in the entry table locked whilst the calculation is
> > > > performed on each entry although keeping one entry locked whilst its
> > > > calculation is performed is not so bad. Additionally, other code might
> > > > want to set entry.recalc to 1 at any time.
>
> > > Some idea:
> > > let the job run something like the following pseudocode:
> > > loop
> > > update ... set recalc=0 where recalc=1 AND ROWNUM=1 returning id
> > > commit
> > > if none found then exit from loop
> > > perform longish task for id
> > > end loop
>
> > > It will then find only one row, process it, then search anew
> > > for another such row. From your description, it seems like
> > > the longish task would run long enough, that the cost of redoing
> > > the select for recalc=1 from scratch each time doesn't really
> > > matter all that much. (you do have an index on recalc, don't you?)
>
> > Interesting idea and thanks for answering but the problem is that
> > whilst the calculations are taking place per entry other processes
> > will be setting the recalc=1 in other entries so there is a
> > possibility my job would never finish. I could count how many recalc=1
> > rows there are first (and only loop at most that many times) then
> > follow your suggestion but I've no way of ensuring that if more rows
> > are set reclac=1 whilst I'm looping then I might keep missing ones at
> > the end. I guess I could fix that by changing recalc from an integer
> > to a timestamp and order by oldest first but then other processes
> > might reset the recalc column to a newer timestamp. To prevent that I
> > could change the code which sets recalc to only set it if it is null.
>
> > So, maybe there is something to look in to here.
>
> > If any one else has any ideas I'd appreciate it.
>
> > Thanks.
>
> > Martin
>
> I think if multiple recalcs can happen on a row, you need some kind of
> queuing. Perhaps add one for each calculation, setting to zero when
> the last one is done. I hesitate to say more, else we'll be writing a
> full-blown transaction processor.
>
> jg
Date: Thu, 29 Sep 2011 10:11:24 -0700 (PDT)
Message-ID: <d6493b0e-68d6-44a2-9366-2fdba17a3ee1_at_i33g2000yqm.googlegroups.com>
On Sep 29, 5:26 pm, joel garry <joel-ga..._at_home.com> wrote:
> On Sep 29, 1:18 am, Martin <martin.j.ev..._at_gmail.com> wrote:
>
>
>
>
>
>
>
>
>
> > On Sep 28, 8:54 pm, Andreas Leitgeb <a..._at_gamma.logic.tuwien.ac.at>
> > wrote:
>
> > > Martin <martin.j.ev..._at_gmail.com> wrote:
> > > > I have a table entry which contains a column called recalc (value 0 or
> > > > 1). Sometimes code sets the recalc column to 1 indicating to a job
> > > > which runs at regular intervals it should update another table history
> > > > with expensive calculations for the entry. When the calculation is
> > > > complete (or when the rows with entry.recalc=1 are selected so we know
> > > > which entries to perform the calculation on) the entry.recalc column
> > > > needs to be reset to 0. As the calculation is expensive I do not want
> > > > to keep the rows in the entry table locked whilst the calculation is
> > > > performed on each entry although keeping one entry locked whilst its
> > > > calculation is performed is not so bad. Additionally, other code might
> > > > want to set entry.recalc to 1 at any time.
>
> > > Some idea:
> > > let the job run something like the following pseudocode:
> > > loop
> > > update ... set recalc=0 where recalc=1 AND ROWNUM=1 returning id
> > > commit
> > > if none found then exit from loop
> > > perform longish task for id
> > > end loop
>
> > > It will then find only one row, process it, then search anew
> > > for another such row. From your description, it seems like
> > > the longish task would run long enough, that the cost of redoing
> > > the select for recalc=1 from scratch each time doesn't really
> > > matter all that much. (you do have an index on recalc, don't you?)
>
> > Interesting idea and thanks for answering but the problem is that
> > whilst the calculations are taking place per entry other processes
> > will be setting the recalc=1 in other entries so there is a
> > possibility my job would never finish. I could count how many recalc=1
> > rows there are first (and only loop at most that many times) then
> > follow your suggestion but I've no way of ensuring that if more rows
> > are set reclac=1 whilst I'm looping then I might keep missing ones at
> > the end. I guess I could fix that by changing recalc from an integer
> > to a timestamp and order by oldest first but then other processes
> > might reset the recalc column to a newer timestamp. To prevent that I
> > could change the code which sets recalc to only set it if it is null.
>
> > So, maybe there is something to look in to here.
>
> > If any one else has any ideas I'd appreciate it.
>
> > Thanks.
>
> > Martin
>
> I think if multiple recalcs can happen on a row, you need some kind of
> queuing. Perhaps add one for each calculation, setting to zero when
> the last one is done. I hesitate to say more, else we'll be writing a
> full-blown transaction processor.
>
> jg
Nice idea, why didn't I think of that.
If I increment recalc each time it needs a recalc then do:
select entry_id, recalc from entry where recalc > 0 into a global temp
table
update entry set recalc = recalc - (select recalc from global temp
table)
loop through global temp table doing expensive processing
Thanks
Martin Received on Thu Sep 29 2011 - 12:11:24 CDT