Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Mailing Lists -> Oracle-L -> RE: Data warehouse book- need recommendation please
Hi all, having lurked for a while now I thought it was time to put my neck out, since my favorite issue of data warehousing was brought up. I'm relatively new at Oracle DBA'ing, only worked with Oracle the last 18 months, and that's been mostly as a developer-DBA, responsible for the data model and generally making sure that the database is happy and content. I for one am simply amazed at how much powerful stuff Oracle provides for data warehousing, what with external tables, merge, multi-table insert, partitioning, SQL for analysis, direct parallell insert and so on (WOW!). Why anyone wants to spend millions of dollars on reporting tools and ETL tools and what not when you get all that for "free" is beyond me. I haven't had much luck yet in convincing my co-working project managers, though :-(
Anyway, back to data warehousing. For the past 18 months I've been working on an enterprise data warehouse solution for a major retail bank in Norway. We went for a Bill Inmon style architecture, complete with Operational Data Store (ODS), Enterprise DataWarehouse (EDW), and Data Marts. Now, in the "Inmon universe" an ODS is a (close to) 3rd Normal Form OLTP database, EDW is a (somewhat) denormalized version of the ODS with an added time column to all the tables, and the data marts are "dimensional", ie like Kimball's marts.
The purpose of an ODS is primarily to integrate data from disparate sources. These sources will in general not be of very high quality so a lot (a LOT) of work needs to be done to "massage" the data so that it can fit in a heavily constrained database (as in lots of PKs, FKs, CHECKs, and did I mention NOT NULLs?). The value of forcing the data to conform is invaluable since it gives you a high-quality, fully integrated, high-performance data store, which can feed not only the EDW and the data marts but also other (previously unthought of) applications (round trip business intelligence).
The ODS only captures real-time data, e.g, one row per customer. An EDW on the other hand has one row per customer /per unit of time/ (in our case we have one row per customer for every time the customer data changed in the ODS, which isn't every time). This captures the historical changes in the data. By making this data non-dimensional (ie. denormalized version of the ODS) we have the flexibility to create new data marts very quicky.
The alternative is as others have pointed out, to go with a Kimball-style architecture. In that case you will load directly from the data source (via a staging area of course) into the data marts. The data integration then appears in the transformations from the source to the mart. Instead of relying on an EDW (as in Inmon's approach) to make the data conform to the same "dimensions" you do this explicitly in each transformation from source to target mart. (Hence the benefit of an EDW, since you only do it once). Another issue with Kimball's approach is that it becomes difficult to do advaned analysis across differeent data marts. In an EDW the data is all in one place, so that never becomes an issue.
Many people think that there's too much work in building two extra databases. But the thing is, building those two extra database don't cost that much extra, since you spend most of your time anyway analyzing the business rules. Additinally, actually building the database that implements and checks all those rules is extremely valuable since it helps you make sure all the data is loaded and aggregated correctly. In our project we often experienced that there was an error in our ETL jobs since the load exploded in our face with various violations. I don't even want to think about what would have happened if we didn't have constraints protecting us. That's the only way you can make sure the data is correct and validated. (And what's the point of building a data warehouse if you can't guarantee even that?)
Well, that was way more than I intended to write, back to lurking mode ...
Regards,
Eric Mortensen
Consultant, Objectware AS
eric.mortensen_at_objectware.no
--
Please see the official ORACLE-L FAQ: http://www.orafaq.net
--
Author: Eric Mortensen
INET: ericm_at_objectware.no
Fat City Network Services -- 858-538-5051 http://www.fatcity.com San Diego, California -- Mailing list and web hosting services ---------------------------------------------------------------------To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). Received on Wed May 14 2003 - 03:31:39 CDT