
Re: Managing transactions involving thousands of records

From: <xhoster_at_gmail.com>
Date: 04 Jan 2006 01:06:16 GMT
Message-ID: <20060103200616.145$G8@newsreader.com>


"jortizclaver" <jortizclaver_at_gmail.com> wrote:
> Hi,
>
> My application needs to upload several thousand records into a
> database.

Where are they before they are in the database?

> Each record insertion involves some validation process and
> more than one insert operation into the database. Each record is
> independent of the others, so if one insertion fails,

Why would an insertion fail? (I don't mean in general, I mean in your specific case.)

> process can go on
> with the rest of the records.

This statement indicates that the logical transaction is the individual record, at least for the database they are being stuck into. But what about where they are coming from? Is the logical transaction at that end each record or is it the entire batch?

>
> In this scenario, four different ways of implementing transaction
> management came to my mind:

<condensed>

> A. Single transaction
> B. Multiple transactions (one per record)
> C. Commit in blocks
> D. Using savepoints

> A. Process takes too long to complete, so a rollback at the end would
> mean restarting the whole process.

Even worse, it never gets done. Why restart if the same thing is just going to happen again, causing another rollback?

> Also, I suppose a commit of thousands of
> changes could be really painful for the database.

Nope. Holding the undo until the commit, and holding the locks until the commit, could be painful. Doing the commit would not be.

> B. The use of one transaction for each record is the easiest way to
> implement it, but it sounds pretty intensive for the database.

Yes, it will be intensive. Then again, using a transactional database in the first place is intensive, and often that intensity is worth it. Anyway, I wouldn't sell it short before actually trying it (or at least doing a back-of-the-envelope calculation). Provided, of course, that it is compatible with the logical transaction on the other end of the pipeline (i.e., where the records are coming from).
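
To make that concrete: a per-record-transaction loop in JDBC might look something like the sketch below. The table name, the columns, and the Rec holder are invented for illustration; the point is just that each record gets its own commit, and a failed record gets its own rollback without touching the rest.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class PerRecordLoad {

    // Hypothetical record shape; the real one depends on your data.
    record Rec(int id, String payload) {}

    static void load(Connection conn, List<Rec> records) throws SQLException {
        conn.setAutoCommit(false);  // take explicit control of transactions
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {
            for (Rec r : records) {
                try {
                    // validation and any secondary inserts for this record
                    // would also go here, inside the same try
                    ins.setInt(1, r.id());
                    ins.setString(2, r.payload());
                    ins.executeUpdate();
                    conn.commit();      // option B: one commit per record
                } catch (SQLException e) {
                    conn.rollback();    // discard only the failed record
                    System.err.println("skipped " + r.id() + ": " + e.getMessage());
                }
            }
        }
    }
}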

> C. This option allows me to commit every 100 records, so it seems
> like it improves the performance of the process. On the other hand,
> if a record fails, I'd reject other records that are valid for my system.

A is simply C with N set to 10,000 (or however many records you have) rather than 100. Whatever cure makes A workable would also cure C. (I suspect this is what you mean by a combination of C and D.)

>
> D. I perceive this last one, or at least a combination of C and D, as
> the best one, but I have some concerns about the performance of the
> savepoints feature.

Yes, savepoints will slow things down somewhat, but not nearly as much as constantly committing. Probably less than 5% for realistic data. I wouldn't worry about it.
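
For what it's worth, the C+D combination might look something like this sketch in JDBC (again, the table and record shape are invented): take a savepoint before each record, roll back to it when that record fails, and commit once per block of 100. Rolling back to the savepoint undoes only the failed record, so the earlier records in the block survive to the next commit.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Savepoint;
import java.util.List;

public class BlockSavepointLoad {

    record Rec(int id, String payload) {}  // hypothetical record shape

    static void load(Connection conn, List<Rec> records) throws SQLException {
        conn.setAutoCommit(false);
        int pending = 0;
        try (PreparedStatement ins = conn.prepareStatement(
                "INSERT INTO target_table (id, payload) VALUES (?, ?)")) {
            for (Rec r : records) {
                Savepoint sp = conn.setSavepoint();  // option D: mark before each record
                try {
                    ins.setInt(1, r.id());
                    ins.setString(2, r.payload());
                    ins.executeUpdate();
                    if (++pending == 100) {          // option C: commit in blocks of 100
                        conn.commit();
                        pending = 0;
                    }
                } catch (SQLException e) {
                    conn.rollback(sp);  // undo only the failed record;
                                        // earlier records in the block survive
                    System.err.println("skipped " + r.id() + ": " + e.getMessage());
                }
            }
            conn.commit();              // flush the final partial block
        }
    }
}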

Xho

-- 
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service                        $9.95/Month 30GB