Re: data warehousing discussion

From: David Donigian <donigian_dave_at_bah.com>
Date: 1995/11/22
Message-ID: <donigian_dave-2211951520570001_at_156.80.155.107>#1/1


In article <48afua$ala_at_independence.ecn.uoknor.edu>, nnaas_at_mailhost.ecn.uoknor.edu (Nadia Naas) wrote:

> William Inmon introduced the phrase "data warehouse" in 1990. He defined it
> as a managed database in which the data is:
>
> * subject-oriented
>
> * integrated
>
> * time-variant
>
> * nonvolatile
>
> This last characteristic of the DW defined by Inmon cannot always be applied;
> the DW isn't completely read-only.

I've only implemented one data warehouse, but I think Inmon is correct. The idea of the data warehouse is to hold aggregated/derived data from multiple sources. For example, a men's clothing retailer might want to track total monthly tie sales across all of its stores, each of which has a different POS system. That figure is derived/aggregated from every transaction, in every store, in which a tie was purchased. You may offer the capability to alter these monthly sales numbers, but such a change is not reflected in the underlying data, i.e., no tie-sale transactions will be created in the various stores' POS systems. There may be counter-examples, and I would like to hear them.
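To make the tie-sales example concrete, here is a minimal Python sketch of deriving the monthly figure from stores whose POS systems export data in different formats. The two record layouts, the category codes, and the functions from_pos_a, from_pos_b and monthly_tie_sales are hypothetical, made up for illustration; amounts are kept in integer cents to avoid floating-point rounding.

```python
from datetime import datetime
from collections import defaultdict

def from_pos_a(rec):
    """Hypothetical POS A: dicts with ISO dates, a 'TIE' category code, amounts in cents."""
    d = datetime.strptime(rec["date"], "%Y-%m-%d").date()
    return d, "tie" if rec["category"] == "TIE" else "other", rec["cents"]

def from_pos_b(rec):
    """Hypothetical POS B: (MM/DD/YYYY, category name, cents) tuples."""
    day, category, cents = rec
    d = datetime.strptime(day, "%m/%d/%Y").date()
    return d, "tie" if category == "neckwear" else "other", cents

def monthly_tie_sales(sources):
    """Derive total tie sales in cents per (year, month) across all stores.

    `sources` is a list of (records, normalizer) pairs. Source records are
    only read, never modified, so the result is purely derived data.
    """
    totals = defaultdict(int)
    for records, normalize in sources:
        for rec in records:
            d, category, cents = normalize(rec)
            if category == "tie":
                totals[(d.year, d.month)] += cents
    return dict(totals)
```

For example, a store A record {"date": "1995-11-03", "category": "TIE", "cents": 2500} and a store B record ("11/17/1995", "neckwear", 3000) both land in the (1995, 11) total. Editing the returned dictionary changes only the derived figure; no store's records are written to.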

> In addition, Inmon says:
> "The technology supporting backup and recovery, transaction and data integrity,
> and the detection and remedy of deadlock is quite complex and unnecessary for
> data warehouse processing."
> Backup and recovery seem to me to be important issues in data warehousing.

Again, because the data is derived, it can be regenerated at any time. You would probably still want to back it up, because recovery is probably faster than re-creation, but backup is not as critical as it is in an OLTP system.
>
> I would like to open this discussion with persons that are working in the
> warehousing field or who are interested in it, and to do so, please post your
> answers on this group.
>
> Regards,
> Nadia.

The major issue I encountered in implementation was reconciliation of disparate key values, e.g., how to match corporations across internal and external data sources that identify them differently: by DUNS number, by name, address and zip code, by internal ID, etc.
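One common approach, sketched here under assumptions, is to try a sequence of candidate match keys: the DUNS number when a record has one, falling back to a normalized name plus 5-digit zip code. The field names (duns, name, zip, internal_id), the suffix list, and the functions are hypothetical; real reconciliation usually also needs address normalization, fuzzy matching, and manual review of whatever is left over.

```python
import re

# Legal-form words dropped before comparing names (illustrative, not exhaustive).
SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co", "company", "llc", "ltd"}

def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and drop legal-form suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in SUFFIXES)

def candidate_keys(rec):
    """Match keys for a company record, most reliable first."""
    keys = []
    duns = (rec.get("duns") or "").replace("-", "")
    if duns:
        keys.append(("duns", duns))
    if rec.get("name") and rec.get("zip"):
        keys.append(("name_zip", normalize_name(rec["name"]), rec["zip"][:5]))
    return keys

def reconcile(internal, external):
    """Map each internal_id to the external record matched on its most reliable
    shared key, or None if no key matches."""
    index = {}
    for ext in external:
        for key in candidate_keys(ext):
            index.setdefault(key, ext)  # keep the first record seen for each key
    return {
        rec["internal_id"]: next((index[k] for k in candidate_keys(rec) if k in index), None)
        for rec in internal
    }
```

Internal records that come back as None are the ones that, in practice, end up in a queue for someone to resolve by hand.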

Other important issues are middleware technology to communicate with multiple DBMSs, how often to update the warehoused data (full refreshes can take a REALLY long time), and choosing the proper granularity (level of aggregation) for the data.
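On the refresh-frequency point, one alternative to full refreshes is an incremental load that applies only the source rows added since the last run. The sketch below assumes source transactions are insert-only and each carries a strictly increasing load sequence number (seq); the class MonthlySalesTable and its field names are hypothetical. If sources can update or delete rows, this shortcut will miss those changes, and a full refresh (or real change capture) is needed instead.

```python
from collections import defaultdict

class MonthlySalesTable:
    """Hypothetical derived table of total sales in cents per (year, month).

    Assumes insert-only sources where every transaction has a unique,
    strictly increasing positive integer 'seq'.
    """

    def __init__(self):
        self.totals = defaultdict(int)
        self.high_water = 0  # highest seq already applied (seqs start at 1)

    def full_refresh(self, transactions):
        """Discard all derived data and rebuild it from the complete source history."""
        self.totals.clear()
        self.high_water = 0
        return self.incremental_refresh(transactions)

    def incremental_refresh(self, transactions):
        """Apply only transactions newer than the high-water mark; return how many."""
        # A real loader would push this filter to the source (e.g. WHERE seq > :high_water)
        # rather than scanning every transaction in Python.
        new = [t for t in transactions if t["seq"] > self.high_water]
        for t in new:
            self.totals[(t["year"], t["month"])] += t["cents"]
        if new:
            self.high_water = max(t["seq"] for t in new)
        return len(new)
```

Because the data is derived, running full_refresh over the same transactions should reproduce the incrementally maintained totals, which makes an occasional full rebuild a cheap consistency check.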

David

David Donigian
Booz, Allen & Hamilton
donigian_dave_at_bah.com

Received on Wed Nov 22 1995 - 00:00:00 CET