Re: Access to heterogeneous data sources - XML

From: Jim Kennedy <kennedy-downwithspammersfamily_at_attbi.net>
Date: Wed, 10 Mar 2004 15:08:31 GMT
Message-ID: <PbG3c.515065$I06.5824907_at_attbi_s01>


"Andrea M. Segovia" <andrea_at_mun.ca> wrote in message news:7589088f.0403100500.4303c01d_at_posting.google.com...
> "Jim Kennedy" <kennedy-downwithspammersfamily_at_attbi.net> wrote in message
news:<1av3c.92957$PR3.1575500_at_attbi_s03>...
> > "Andrea M. Segovia" <andrea_at_mun.ca> wrote in message
> > news:7589088f.0403080637.63c9f418_at_posting.google.com...
> > > Hello, everyone:
> > >
> > > We have a Oracle 3 tier application (using Oracle iAs to a back-end
> > > Oracle database on an Tru64 UNIX server) which also incorporates
> > > custom written analysis and mapping tools for fisheries-related data.
> > >
> > > There is interest in broadening the scope of this application to also
> > > gather data from other data sources. The problem is that the other
> > > data sources are not Oracle databases (some are MS-SQL Server, others
> > > are MS-Access databases). Moving data to an Oracle format is not a
> > > possibility, as some of these data sources are out of our
> > > organization's control. To complicate matters even further, some of
> > > the data sources are not local (ie. in remote locations in either
> > > internal or external networks from one end of the country to another).
> > >
> > > Is XML a possibility for this? What XML functionality is available
> > > through Oracle, MS-SQL server? How would I do this using XML?
> > >
> > > Regards,
> > > Andrea
> >
> > XML is basically a file format and that in and by itself isn't going to
help
> > you. Currently, it is a big buzz word. Concentrate on the business
problem
> > and what a solution to your problem would look like. This will help you
to
> > come up with solutions that would fit the problem. (instead of a hammer
and
> > everything is a nail.)
> >
> > You are basically talking about using an ETL tool. It is unlikely that
the
> > fishery data is stored in the same structures in all those different
> > databases. That is not the same table and table name with columns that
are
> > matching in data type and name. (and meanings) So for each data source
you
> > will need to know what the "transformation" is so you can read the data.
> > Major job. If the data sources are out of your control then it is
unlikely
> > you will be able to see them or connect to them.
> >
> > Your best bet is probably to treat your central system as a data
warehouse
> > and copy the data to your system periodically. How to do that? You
could
> > load a java SQLServer driver into Oracle and connect to all those
SQLServer
> > databases and grab the data.(the other problem is that things won't
always
> > be up or available to connect to.) You would still have to transform
the
> > data into some common schema etc. Again , a highly detailed task.
> > As for the MS Access databases, you could probably find some driver that
can
> > read those and use SQL to get at the data. Again you are going to have
a
> > lot of mapping to do.
> >
> > Jim
>
> Jim,
>
> Thanks for the input...I tend to agree with you that the only way to
> deal with this is to sketch out the business requirements at a high
> level and work from there. Unfortunately, my (somewhat) technically
> savvy clients don't agree!!! :-)
>
> Thinking more in terms of the future, which way would you head in
> order to deal with this type of problem? Obviously, in order to be
> able to deal with foreign data in real-time, you will need a
> standardized way of transferring the following information: meta data
> about the data itself (data dictionary type of information,
> information about the meaning/interpretation of the data), the data
> itself, and standardized database queries (such as done through ODBC,
> perhaps).....are there any bleeding-edge initiatives out there to
> support this type of functionality?
>
> Andrea [Quoted]
Well your "technically savvy" clients aren't that savvy or that technical. Sounds like they are good at throwing around tlas. (good if you are in marketing, not if you actually have to deliver something)

There is not simple solution. Just because you can retrieve the meta data doesn't mean that it is useful. Just because a table is called fish and a field is called color in two different systems doesn't mean you can merge one into another. The color field might be a character in one system with the color (red, yellow,...) the other system might be an id that relates to a color look up table. (or it might be a numeric triple RGB) So you actually have to know what each table, field, and relationships are in each system before you could even hope to "query them in real time". Sure I can read the data dictionary from Access, or MS SQLServer, or Oracle via ODBC, but it just doesn't tell me much other than the names of the tables, fields, and data types. It does not hold sufficient information to tell you what the tables are for or what the columns are for. What is the business need to query them in real time? Do the number of fish suddenly change? I think your "technically savvy" clients are rather technically naive.

Your big problem is how to extract and transform the information. One poster suggested web services. That's fine, it you can install a program on each remote site that knows how to read that specific database and present the data in some agreed upon method. (the program on each machine would be doing the ETL on demand, but each program would have to know the particular meaning of the local schema and translate that into the schema that you want.) So you still have your basic problem. The technology doesn't magically take that away.

Google the Internet for Fred Brooks No Silver Bullet or read his book The mythical Man Month. Very relevant in the development of software. Jim Received on Wed Mar 10 2004 - 16:08:31 CET

Original text of this message