Re: Merging 2 "almost" identical databases.

From: DA Morgan <damorgan_at_psoug.org>
Date: Tue, 30 Sep 2008 07:52:11 -0700
Message-ID: <1222786327.983030@bubbleator.drizzle.com>


Neomusashi wrote:
> On 29 sep, 14:38, DA Morgan <damor..._at_psoug.org> wrote:

>> Neomusashi wrote:
>>> Hi,
>>> So my work for the next few weeks is sorting out how to merge 2 almost
>>> identical databases used by 2 almost identical applications into one
>>> application with one database. These applications were developed for a
>>> client.
>>> Why 2 DB' s and 2 Applications? Well it’s a legacy decision. My
>>> predecessors decided it should be that way and now it ' s up to me to
>>> fix this. They needed small application/functional differences for
>>> other distribution channels and made a clone of the original app. .
>>> Now several years later they would like to merge the two apps into
>>> one, but of course they grew apart in functionality and structure. No
>>> big differences, but enough to make this a nice project ... .
>>> Short description:
>>> Each application runs on a sun solaris 10 server in a websphere 6.0
>>> application server (J2EE apps of course) and an orcale 10 database is
>>> used. Database queries are done using Stored Procedures. Both have 2
>>> schemas. One schema has the core data in about 20 tables and the other
>>> contains about 9 generic servers and about tables. The tables purposes
>>> don’ t really differ from each other, but columns,  DDL and data load
>>> do.
>>> What I need to come up with is a merging process we can deliver to our
>>> client so they can easily merge the two databases.  The process has
>>> two deliver a clear report.
>>> I was thinking of three possible ways to go:
>>> 1. Shell scripts with logging:
>>> pros: basic, quickest?
>>> Cons: Debugging, testing, complex, report/logging not so fancy/clear,
>>> inhouse expertise
>>> 2. Write a mini J2EE app:
>>> pros: easy to debug, easier to make?, better report/logging
>>> Cons: very slow to develop, work to throw away.
>>> 3. Write a java app:
>>> pros: easy to debug, easy to make, better report/logging
>>> cons: work to throw away, slow to develop
>>> I’ m feeling most for the 3th solution. How would you approach this?
>> I have a lot of questions from what you've written. Here they are in
>> no particular order.
>>
>> 1. After merger will it be necessary for the data to have some column
>> indicating its origin to keep track of its historical origin?
>>
>> 2. When you say "almost identical applications" are the changes to the
>> application only or also to the schemas? If the schemas to column
>> definitions such as names, data types, etc.?
>>
>> 3. Is it possible that primary key and other unique constraints may
>> be violated during a merger?
>>
>> 4. Given that reality is that very few databases have clean data
>> what are the chances of data in one or both of the schemas, at a
>> customer site, need cleansing?
>>
>> 5. Is there any chance that a customer has, on their own, made any
>> changes to the definitions in any manner such as adding indexes or
>> tables to support ad hoc reporting or internally developed functionality?
>> --
>> Daniel A. Morgan
>> Oracle Ace Director & Instructor
>> University of Washington
>> damor..._at_x.washington.edu (replace x with u to respond)
>> Puget Sound Oracle Users Groupwww.psoug.org

>
> Daniel,
>
> First let me tell you that I' m only working on the application for
> about 3 weeks. So I got a lot of catching up to do.
>
> 1. We have been thinking about adding some sort of flag indicator or a
> table containing a flag and foreign key link. We suspect the data on
> itself won' t be enough to tell wether the object' s are linked to one
> or the other application.
>
> 2. We are currently using Toad (www.toadsoft.com) to compare the
> schemas. Both applications use the same schema structures. Tables and
> columns show only minor differences. For what I' ve seen so far some
> tables from application A (lets give it a name) contain extra fields
> used by a component connected to a back office application/component.
> This component is also present in the second application, but it isn'
> t "used". So merging these tables will imply that the merged table
> will probably be a reflection of the table in application A with some
> empty columns for the records of application B.
> Datatypes, for so far I' ve seen, only show minor differences. 9/10
> cases it' s a size issue.
>
> 3. We are looking in to that. Again we hope Toad will give us a better
> view on that. For so far we saw, no constraint issues were found.
>
> 4. Indeed some cleansing will be needed. For example the tables
> containg user data. Our client gave one employee several application
> users in application A and only 1 in application B. Reason was
> business related. A user in application B is coupled to a working area
> and one employee could be responsible for several working area' s. In
> application B a working area isn' t used, so one user per employee.
> Our client used this "trick" in appliaction A so they wouldn' t need
> to provide resources for an application change... :) .
>
> 5. No. The application are maintained by us.
>
> I think this is interesting case study :) . I' m looking forward to
> how we are going to solve this.

Based on your answers it look straight forward.

Keep the tables from Application A modifying column definitions, only where necessary, to reflect the larger length. Add appropriate constraints so that bad data will fail, use DBMS_ERRLOG to create a table to hold constraint violations and then use the technique used here: http://www.psoug.org/reference/dbms_errlog.html.

Always run through first in a test environment before doing it in production and always have a fresh reliable backup, or flashback logs, before beginning.

-- 
Daniel A. Morgan
Oracle Ace Director & Instructor
University of Washington
damorgan_at_x.washington.edu (replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Tue Sep 30 2008 - 09:52:11 CDT

Original text of this message