Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Data warehouse book- need recommendation please

RE: Data warehouse book- need recommendation please

From: <Saminathan_Seerangan_at_i2.com>
Date: Wed, 14 May 2003 21:28:34 -0800
Message-ID: <F001.0059939D.20030514212834@fatcity.com>


Dear Eric,

Thanks for your valuable response, really appreciated.

>Why anyone wants to spend millions of dollars on reporting tools and ETL
tools and what not when you get all that >for "free" is beyond me

Even i ahd the same question and got teh answer at asktom site.

http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:6749198829295

So what does an ETL tool, and in particular OWB, bring to a data warehousing project? Many good things.

  1. Automated code generation. OK, we're coders. We can do that. There's more to it than that. In a team environment you varying levels of expertise, techniques, and approaches, etc. OWB generates all this code (sql, plsql, tcl scripts, etc.) in a standardized way that so implementation and maintenance is consistent, and much easier to maintain, than trying to figure out how someone else coded the transformations. Some projects may have only semi technical staff and coding may be difficult for them. A non-issue with OWB. It is also quite common for development staff to continue to do things the way they know how rather than adopting the latest best practices, such as leveraging the 9i ETL functions such as external tables, merge processing, pipelined transformations, etc. To address one of the questions about OWB and the 9i ETL engine - the point is OWB generates code that takes advantage of database ETL features such as merge, multi table insert, external tables, etc. The OWB development group work VERY closely with the server technologies group to ensure optimal techniques are used. Its a very collaborative development environment. They are right up to date with the latest server features, leveraging all the things you raise in your comments, but implemented consistently in a standardized way.
  2. Utilities. In our custom development project we built a whole infrastructure around process management and control. I'm talking about auditing the load jobs, monitoring, reporting, tracking, etc. jobs and processes. This is part of what a tool brings to a project. It's one thing to code a transformation and another to log these and know when the jobs were run, how long they took, what the results were, how the exceptions were handled, etc. In a large production warehouse these become important.
  3. Documentation. Ever used Excel spreadsheets to document flows from source to target? What a pain. I have. Dreary work. Never maintained. OWB provides extensive reporting of all the metadata in the repository, including lineage tracking. Better yet, end users can drill from a workbook in Discoverer (end user ad hoc query tool) back to the source - target mapping to determine where the data came from. This is not developed for each workbook but come as a result of building the mappings in the tools and exploiting the tool integration.
  4. This brings me to inter-tool integration. ETL is not the end goal. It's getting information to end users. Without an ETL tool you start from scratch in reporting. OWB allow you to build your reporting End User metadata layer (EUL) as you develop ETL processes. In fact, updates to the EUL are now (OK, maybe in a month) real time. The saves a lot of effort building the business metadata reporting layer. This also extends to OLAP. OWB generates multidimensional OLAP Cubes in Oracle 9i, and provides the support for cube loading and OLAP metadata. Again, you're building from scratch there without the ETL tool. Same goes for Designer. We're really just scratching the surface here. The bottom line - Datawarehousing is a lot more than just ETL, and tools (those that are integrated that is), save a lot of development and maintenance time and effort down stream.

Maybe two or three years ago a lot of this was not there. Its come a long way and should now be a serious consideration for anyone building a data warehouse.

Thanks
Sami

"Eric Mortensen" <ericm_at_objectware.no>
Sent by: root_at_fatcity.com
05/14/03 02:01 PM
Please respond to ORACLE-L  

        To:     Multiple recipients of list ORACLE-L <ORACLE-L_at_fatcity.com>
        cc: 
        Subject:        RE: Data warehouse book- need recommendation please


Hi all, having lurked for a while now I thought it was time to put my neck out, since my favorite issue of data warehousing was brought up. I'm relatively new at Oracle DBA'ing, only worked with Oracle the last 18 months, and that's been mostly as a developer-DBA, responsible for the data model and generally making sure that the database is happy and content. I for one am simply amazed at how much powerful stuff Oracle provides for data warehousing, what with external tables, merge, multi-table insert, partitioning, SQL for analysis, direct parallell insert and so on (WOW!). Why anyone wants to spend millions of dollars on reporting tools and ETL tools and what not when you get all that for "free" is beyond me. I haven't had much luck yet in convincing my co-working project managers, though :-(  

Anyway, back to data warehousing. For the past 18 months I've been working on an enterprise data warehouse solution for a major retail bank in Norway. We went for a Bill Inmon style architecture, complete with Operational Data Store (ODS), Enterprise DataWarehouse (EDW), and Data Marts. Now, in the "Inmon universe" an ODS is a (close to) 3rd Normal Form OLTP database, EDW is a (somewhat) denormalized version of the ODS with an added time column to all the tables, and the data marts are "dimensional", ie like Kimball's marts.  

The purpose of an ODS is primarily to integrate data from disparate sources. These sources will in general not be of very high quality so a lot (a LOT) of work needs to be done to "massage" the data so that it can fit in a heavily constrained database (as in lots of PKs, FKs, CHECKs, and did I mention NOT NULLs?). The value of forcing the data to conform is invaluable since it gives you a high-quality, fully integrated, high-performance data store, which can feed not only the EDW and the data marts but also other (previously unthought of) applications (round trip business intelligence).  

The ODS only captures real-time data, e.g, one row per customer. An EDW on the other hand has one row per customer /per unit of time/ (in our case we have one row per customer for every time the customer data changed in the ODS, which isn't every time). This captures the historical changes in the data. By making this data non-dimensional (ie. denormalized version of the ODS) we have the flexibility to create new data marts very quicky.  

The alternative is as others have pointed out, to go with a Kimball-style architecture. In that case you will load directly from the data source (via a staging area of course) into the data marts. The data integration then appears in the transformations from the source to the mart. Instead of relying on an EDW (as in Inmon's approach) to make the data conform to the same "dimensions" you do this explicitly in each transformation from source to target mart. (Hence the benefit of an EDW, since you only do it once). Another issue with Kimball's approach is that it becomes difficult to do advaned analysis across differeent data marts. In an EDW the data is all in one place, so that never becomes an issue.  

Many people think that there's too much work in building two extra databases. But the thing is, building those two extra database don't cost that much extra, since you spend most of your time anyway analyzing the business rules. Additinally, actually building the database that implements and checks all those rules is extremely valuable since it helps you make sure all the data is loaded and aggregated correctly. In our project we often experienced that there was an error in our ETL jobs since the load exploded in our face with various violations. I don't even want to think about what would have happened if we didn't have constraints protecting us. That's the only way you can make sure the data is correct and validated. (And what's the point of building a data warehouse if you can't guarantee even that?)  

Well, that was way more than I intended to write, back to lurking mode ...  

Regards,
Eric Mortensen
Consultant, Objectware AS
eric.mortensen_at_objectware.no    

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: Eric Mortensen
  INET: ericm_at_objectware.no

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).




-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: 
  INET: Saminathan_Seerangan_at_i2.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message
to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).
Received on Thu May 15 2003 - 00:28:34 CDT

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US