Re: Data conversion between various schemas

From: Paul Vernon <paul.vernon_at_ukk.ibmm.comm>
Date: Tue, 16 Jul 2002 10:51:10 +0100
Message-ID: <ah0qor$10f2$1_at_sp15at20.hursley.ibm.com>


> I would like to ask, if there exist some papers with the topic of
general
> DB conversion according to different source and target schemas.

I don't know of any such work, but would be v interested in any leads people have.
Obviously books on data warehousing sort of cover this, but I find that most DW book don't have a lot of theory in them ;-)

>I need to create system that converts data between two different schemas.
>I have fixed target schema and various source schemas, that represent
>some subset (not always) of information that can be expressed by the
target
>schema.

>I know I can't guarantee non-lossy conversion, i.e. when there are
different
>subsets of attributes, non-compatible functional dependencies and so on.
>But has someone thought about some subset of operations that is
inevitable
>to implement to provide tool, that can be helpful in any case that should
arise
>during conversion process ?

Very briefly, my thoughts here are.

  1. Get each source system into 1NF
  2. Use Relational Algebra (or SQL if that is all you have to hand..) to normalise and constrain each source individually 2a. Add temporal attributes to you source models, then normalise and constrain again. This will let you store history in the DW.
  3. For each normalised, temporal source model, use Views (i.e. Relational Algebra again) to: 3a. Join to multiple 'lookup' tables to map source domains to common (target) domains 3b. Rename source attributes to common target names
  4. Select a common subset of attributes from each source model and UNION them all together 4a. Use aggregation over the source models if the data cannot be consolidated at the detail level

That's the basic outline. Life gets more complex when you get different sources stating contradictory facts.
Note that there is no information loss as the data from each source is stored in individual normalised source models.

Regards
Paul Vernon
Business Intelligence, IBM Global Services Received on Tue Jul 16 2002 - 11:51:10 CEST

Original text of this message