
Re: Tips for migration of Records Scaling to 200 Million Rows

From: <seshan.parameswaran_at_gmail.com>
Date: Fri, 30 Nov 2007 06:49:59 -0800 (PST)
Message-ID: <ce8b3705-e202-4476-89aa-8c9987e99c69@f3g2000hsg.googlegroups.com>


On Nov 29, 6:13 pm, DA Morgan <damor..._at_psoug.org> wrote:
> seshan.parameswa..._at_gmail.com wrote:
> > Hello Group
> > Excuse the blast please. I am seeking an expert
> > opinion in Migrating data from an existing Oracle RDBMS to another.
> > The existing system is proprietory and can be accessed only via a JAVA
> > API Interface provided.
>
> If it is an Oracle system this is not true on its face. On what basis
> do you think this to be the case ... and in what version of Oracle?
> --
> Daniel A. Morgan
> Oracle Ace Director & Instructor
> University of Washington
> damor..._at_x.washington.edu (replace x with u to respond)
> Puget Sound Oracle Users Group www.psoug.org

Thank you all for the replies. I want to make the case clearer, so I am presenting the facts as a numbered list.

The Oracle version I would be using is 9i, not 10g, per the latest information. The client is still working on an upgrade to 10g, and completing it could change this approach considerably.

  1. The system from which the records are being migrated is proprietary. Although the data structures are known from published documentation, some encrypted information (hash keys) that is essential to the whole migration can be decrypted only through the vendor's Java API. Decompiling the API would violate the contract my client has with the vendor.
  2. I have used JDBC extensively in multithreaded Java environments, and my experience was not encouraging. I need to migrate at least 10 million records per day so that the migration can finish within a month. From my experience with multithreaded code, JDBC, and the database drivers, I am almost certain that scaling a migration of this size across one JVM (or a group of them) requires a great deal of memory; without it the feat is next to impossible, unless someone has a different story to share.
  3. Hence I was planning to rely on the database itself, using a utility such as Oracle Data Pump or even a plain SQL prompt, so that the Java layer (or any client layer other than SQL) is kept out of the load path. The multithreaded Java layer would only generate SQL files containing parallel insert statements, which SQL*Plus would then run against a NOLOGGING table to minimize redo/undo. (A rough sketch of what I mean follows this list.)
  4. In addition, I was planning to use partitioning to facilitate the operation. Since I would be migrating records spanning 30 years, once the load reaches, say, 25 million rows or even earlier (I am not sure of the watermark here and would welcome expert opinions on this as well), subsequent insertions could slow down drastically and hold up the migration. So I was planning to borrow the approach normally recommended for modifying a large table of 200 million rows: create a work table holding the records of the partition being modified, perform the changes there, then swap the work table back in place of the partition and rebuild the indexes as a scheduled job. However, I am not sure how this applies to the insertion process. (A sketch of the partition-exchange idea also follows this list.)
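
To make item 3 concrete, here is a rough sketch of the kind of statements I would have SQL*Plus run. All table and column names are made up, and it assumes the decrypted rows have already been landed in a staging table so the final insert can be direct-path (as far as I understand, NOLOGGING only skips redo for direct-path operations, not for row-by-row inserts):

    ALTER TABLE migr_records NOLOGGING;
    ALTER SESSION ENABLE PARALLEL DML;

    -- Direct-path, parallel insert of one year's worth of rows;
    -- with NOLOGGING this writes minimal redo/undo for the table data
    -- (any index maintenance is still logged).
    INSERT /*+ APPEND PARALLEL(migr_records, 4) */ INTO migr_records
      (record_id, record_date, payload, hash_key)
    SELECT record_id, record_date, payload, hash_key
    FROM   migr_staging
    WHERE  record_date >= TO_DATE('01-01-1978', 'DD-MM-YYYY')
      AND  record_date <  TO_DATE('01-01-1979', 'DD-MM-YYYY');

    COMMIT;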

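And for item 4, the partition-exchange idea I was describing looks roughly like this (again, all names and ranges are placeholders):

    -- Range-partition the target by year so each year's load is isolated.
    CREATE TABLE migr_records (
      record_id    NUMBER,
      record_date  DATE,
      payload      VARCHAR2(4000),
      hash_key     VARCHAR2(64)
    )
    PARTITION BY RANGE (record_date) (
      PARTITION p1978 VALUES LESS THAN (TO_DATE('01-01-1979','DD-MM-YYYY')),
      PARTITION p1979 VALUES LESS THAN (TO_DATE('01-01-1980','DD-MM-YYYY'))
      -- ... one partition per year up to the present
    );

    -- Load one year into a plain work table, index it, then swap it in;
    -- the exchange is a data-dictionary operation, not a copy of the rows.
    CREATE TABLE migr_work_1978 AS
      SELECT * FROM migr_records WHERE 1 = 0;

    -- (bulk-load migr_work_1978 here and build matching indexes on it)

    ALTER TABLE migr_records
      EXCHANGE PARTITION p1978 WITH TABLE migr_work_1978
      INCLUDING INDEXES WITHOUT VALIDATION;

My open question is whether loading each year into its own work table and exchanging it in this way is a sensible substitute for inserting directly into the partitioned table.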
I appreciate all your valuable input and hope to hear your thoughts.

Thanks in advance

Seshan
