Re: Data conversion between various schemas

From: Barry <barryw_at_databaseanswers.com>
Date: 23 Jul 2002 05:13:42 -0700
Message-ID: <55816d8a.0207230413.13e3d76a_at_posting.google.com>


An approach which I have used with some success in the past is :-
1) Define the schema conversion as specifications for metadata integration, using a suitable product.
2) Use a data integration tool, such as SeeBeyond, to retrieve, convert and load the data, using the specs from Step 1.

This is a rather expensive solution, suitable for enterprise-level budgets ($250K+).
If you do not have that kind of money available, a sensible alternative is to follow the same kind of methodology, use a small database to hold a Data Dictionary with mappings, and then either hand-code some SQL or use Embarcadero's product at about $25,000.
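
To give a rough idea of what I mean by a Data Dictionary with mappings, it can be as simple as one table recording source-to-target column mappings, which then drives the hand-coded SQL. The table and column names below are just invented for illustration :-

-- A minimal Data Dictionary table holding source-to-target mappings
CREATE TABLE schema_mapping (
    source_table    VARCHAR(128) NOT NULL,
    source_column   VARCHAR(128) NOT NULL,
    target_table    VARCHAR(128) NOT NULL,
    target_column   VARCHAR(128) NOT NULL,
    transform_rule  VARCHAR(255)           -- e.g. 'UPPER', 'CAST AS DATE'
);

-- One mapping entry: the legacy cust_id feeds the consolidated customer_num
INSERT INTO schema_mapping
VALUES ('legacy_orders', 'cust_id', 'customer', 'customer_num', NULL);

-- The conversion itself is then hand-coded (or generated) from those mappings
INSERT INTO customer (customer_num, customer_name)
SELECT cust_id, cust_name
FROM   legacy_orders;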

This page on my Database Answers Web Site has some useful links, including 5 products reviewed in a recent evaluation report of MetaData Integration products :-
http://www.databaseanswers.com/metadata_management.htm  

Barry Williams
Principal Consultant
Database Answers

Joe Novella <jnovella_at_voicenet.com> wrote in message news:<3D3474DD.D83EB468_at_voicenet.com>...
> Paul Vernon wrote:
>
> > > There are newer data profiling products which not only integrate
> > > multiple databases into a consolidated normalized model, they do so
> > > based on analyzing the data below the metadata. Through inferring
> > > dependencies and redundancies across different sources, the analyst
> > > can model the source data or map to a pre-defined fixed target.
> >
> > Indeed there are tools to help automate the normalisation process on
> > individual data sources (my step 2).
> > I'm not sure how well they work when attempting to integrate separate
> > models together however. This is more a matter of matching entity
> > predicates between sources rather than finding hidden functional
> > dependencies.
>
> Agreed. Though at least one of the data profiling tools on the market can
> infer what they call "cross-file dependencies."
>
> Other products do not infer cross-file dependencies. Rather, they compare
> value sets of individual pairs of attributes and provide overlap
> percentages. For example, identifying how many (and which) values in
> cust_id in one file exist in customer_num in another file and vice versa.
> Of course, it's only one step in the redundancy process. The analyst has
> to decide whether cust_id and customer_num really should be the same
> attribute in the model. If so, the analyst creates a "synonym" between the
> attributes. If not, the analyst moves to the next attribute pair.
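
For what it's worth, that kind of overlap check can also be hand-coded in plain SQL if you do not have a profiling tool to hand. Something along these lines, with invented table and column names, gives the overlap percentage in one direction :-

-- What percentage of distinct cust_id values in file_a also appear
-- as customer_num in file_b (run the reverse query for the other direction)
SELECT 100.0 * COUNT(DISTINCT a.cust_id)
       / (SELECT COUNT(DISTINCT cust_id) FROM file_a)  AS pct_overlap
FROM   file_a a
WHERE  a.cust_id IN (SELECT b.customer_num FROM file_b b);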
>
> Likewise, if there is little overlap between the two attributes, but there
> really should be a synonym between them, you should still be able to
> create the synonym. Of course, you have another problem on your hands.
> :-)
>
> Finally, other profiling tools can perform orphan analysis on sets of
> attributes. The purpose of redundancy profiling is to identify potential
> foreign keys and duplicate columns across files, tables, or systems, using
> the data itself, not the metadata or code.
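
Again, if needs be, a simple version of that orphan analysis is just an anti-join in hand-coded SQL (the names here are made up for the example) :-

-- Orphan analysis: cust_id values in the orders source with no matching
-- customer_num in the customer source
SELECT o.cust_id, COUNT(*) AS orphan_rows
FROM   legacy_orders o
LEFT JOIN customer c ON c.customer_num = o.cust_id
WHERE  c.customer_num IS NULL
GROUP BY o.cust_id;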
>
> Keep in mind, the data profiling tools do not make the normalization
> decisions. The analyst has to determine which functional dependencies to
> include in the model and which redundancies are true foreign keys (or
> denormalized attributes). The normalization decisions still need to be
> made by a user. The difference is that you're making these decisions based
> on the underlying data. Once these decisions are made, though, it's very
> easy for the profiling tool to normalize the sources.
>
> From personal experience, I've developed integrated models incorporating
> data sources of flat files, RDBMS tables, VSAM files, and IMS segments,
> using these tools. It still is a lengthy process, but it is manageable,
> iterative, and repeatable.
>
> Hope this helps.