Re: Data conversion between various schemas
Date: Tue, 16 Jul 2002 19:32:40 GMT
Message-ID: <3D3474DD.D83EB468_at_voicenet.com>
Paul Vernon wrote:
> > There are newer data profiling products which not only integrate
Agreed, though at least one of the data profiling tools on the market can
infer what it calls "cross-file dependencies."
Other products do not infer cross-file dependencies. Rather, they compare
the value sets of individual pairs of attributes and report overlap
percentages: for example, how many (and which) values of cust_id in one
file also appear in customer_num in another file, and vice versa.
Of course, the overlap comparison is only one step in the redundancy
analysis. The analyst has to decide whether cust_id and customer_num
really should be the same attribute in the model. If so, the analyst
creates a "synonym" between the attributes; if not, the analyst moves on
to the next attribute pair.
Conversely, if there is little overlap between the two attributes but
there really should be a synonym between them, you should still be able
to create one. Of course, in that case you have another problem on your
hands.
:-)
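To make the overlap comparison concrete, here is a rough Python sketch of
the kind of value-set check I mean. The column names follow the cust_id /
customer_num example above, the sample data is made up, and none of this
is any particular vendor's implementation:

    def overlap_report(values_a, values_b):
        # Compare the distinct values of two attributes and report how much
        # of each value set appears in the other.
        set_a, set_b = set(values_a), set(values_b)
        common = set_a & set_b
        return {
            "pct_of_a_in_b": 100.0 * len(common) / len(set_a) if set_a else 0.0,
            "pct_of_b_in_a": 100.0 * len(common) / len(set_b) if set_b else 0.0,
            "common_values": common,
        }

    # Toy data standing in for the two files' columns.
    cust_id_values = ["C001", "C002", "C003", "C004"]
    customer_num_values = ["C002", "C003", "C005"]
    report = overlap_report(cust_id_values, customer_num_values)
    print(report["pct_of_a_in_b"], report["pct_of_b_in_a"])

High overlap in both directions suggests the two attributes may be the
same thing; high overlap in only one direction is more suggestive of a
foreign-key relationship.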
Finally, other profiling tools can perform orphan analysis on sets of
attributes. The purpose of redundancy profiling is to identify potential
foreign keys and duplicate columns across files, tables, or systems, using
the data itself, not the metadata or code.
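For what it's worth, orphan analysis is the same sort of set arithmetic
run in one direction. Another rough Python sketch, again with made-up
column names and data rather than any product's actual code:

    def orphans(child_values, parent_values):
        # Values of the candidate foreign-key column that have no match in
        # the candidate parent key, i.e. the "orphans".
        return set(child_values) - set(parent_values)

    # Toy example: orders whose cust_id has no matching customer record.
    order_cust_ids = ["C001", "C002", "C009"]
    customer_ids = ["C001", "C002", "C003"]
    print(orphans(order_cust_ids, customer_ids))    # {'C009'}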
Keep in mind that the data profiling tools do not make the normalization
decisions. The analyst has to determine which inferred functional
dependencies to include in the model and which redundancies are true
foreign keys (or denormalized attributes). The normalization decisions
still have to be made by a user; the difference is that you're making
them based on the underlying data. Once those decisions are made, though,
it's very easy for the profiling tool to normalize the sources.
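In case it helps, here is what "inferring a functional dependency from
the data" boils down to at its simplest. The real tools are far more
sophisticated than this, and the rows and column names below are made up
for illustration:

    def fd_holds(rows, lhs, rhs):
        # True if the dependency lhs -> rhs holds in the data, i.e. no lhs
        # value is associated with more than one distinct rhs value.
        seen = {}
        for row in rows:
            first_seen = seen.setdefault(row[lhs], row[rhs])
            if first_seen != row[rhs]:
                return False
        return True

    # Toy rows: cust_id -> cust_name holds, cust_name -> cust_id does not.
    rows = [
        {"cust_id": "C001", "cust_name": "Smith"},
        {"cust_id": "C002", "cust_name": "Smith"},
        {"cust_id": "C001", "cust_name": "Smith"},
    ]
    print(fd_holds(rows, "cust_id", "cust_name"))    # True
    print(fd_holds(rows, "cust_name", "cust_id"))    # False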
> > multiple databases into a
> > consolidated normalized model, they do so based on analyzing the data
> > below the metadata.
> > Through inferring dependencies and redundancies across different
> > sources, the analyst can model
> > the source data or map to a pre-defined fixed target.
>
> Indeed there are tools to help automate the normalisation process on
> individual data sources (my step 2).
> I'm not sure how well they work when attempting to integrate separate
> models together however. This is more a matter of matching entity
> predicates between sources rather than finding hidden functional
> dependences
Hope this helps.
--
Joseph Novella
Anthem Consulting, LLC
"Good ideas are not adopted automatically. They must be driven into
practice with courageous patience." - Admiral Hyman Rickover
