Re: On the subject of Data Warehouses, Data Cubes & OLAP....

From: Jorge_Beteta <jbeteta_at_yahoo.com>
Date: 15 Oct 2003 15:39:02 -0700
Message-ID: <5140b91e.0310151439.596771a0_at_posting.google.com>


Very interesting discussion. I'm also looking for an answer to what a datamart really is. If somebody could take a look at my previous question,
"Am I correct of what a datamart is?"

http://groups.google.com/groups?dq=&hl=en&lr=&ie=UTF-8&group=comp.databases.olap&selm=5140b91e.0310141301.5c0920ac%40posting.google.com

I guess nobody did it :(

I think I have, in some ways, the same questions. I got an answer I would like to share:

"The reason the data mart is constructed differently than a
transactional system is because its usage is different.

Basically, it's an off-line or near-line system where data is dumped for the purpose of post-transaction processing. This could include mining for fraudulent or erroneous transactions, aggregate and statistical report processing, or sharing data with a class of users that are not co-located (or even part of the same organization) and don't require online data access. It's easy for these operations to take a lot of system resources.

The DBMS resources used can also be quite different between TP and analytical processing, and as such are tuned differently (i.e., procedure cache vs. data cache). The hardware itself (RAID 0 vs. RAID 5) may be set up differently if the database is read-only. Creating a separate system for these uses solves a lot of potential problems.

The data structures themselves are sometimes no longer the same. While one could dump the data in its source structure, it is often better to precalculate formulas and store computed aggregates. Management reports are often looking for sum-of-sums type data or cross-tabs.

RE 1. Users making queries w/o programmer assistance is more of a goal than an indicator of whether a system is a data mart. Any system can have an application or middleware running that pre-processes user queries and converts them to SQL. Ultimately, the query mechanism ends up being similar or the same for both. Sometimes the interface from the data mart is a dump into yet another system.

RE 2. I think that is essentially the case in AP systems. Some data is not transferred, some data is copied by rote, and some is computed.

RE Questions:
2) You could have a data mart that accesses another data mart, or a scientific system where the data wasn't transactional, but started off life as a mass of observations."
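
To make the point above about precalculating and storing aggregates a bit more concrete for myself, here is a minimal sketch in Python with the standard sqlite3 module. The table names (sales, sales_summary), the columns and the sample rows are all made up by me for illustration; nothing here comes from the person who answered. The idea is only that the mart keeps a precomputed summary, and a "sum of sums" report reads the summary instead of the transactional table.

```python
import sqlite3

def build_mart(conn):
    """Load a tiny 'transactional' table, then precompute a summary table."""
    cur = conn.cursor()
    # Hypothetical source table standing in for the transactional system.
    cur.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
    cur.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("east", "widget", 100),
            ("east", "gadget", 50),
            ("west", "widget", 70),
            ("west", "widget", 30),
            ("west", "gadget", 20),
        ],
    )
    # Hypothetical mart table: one precomputed row per (region, product).
    cur.execute(
        """
        CREATE TABLE sales_summary AS
        SELECT region, product, SUM(amount) AS total, COUNT(*) AS n
        FROM sales
        GROUP BY region, product
        """
    )
    conn.commit()

def region_totals(conn):
    """'Sum of sums': roll the stored per-product totals up to region level."""
    rows = conn.execute(
        "SELECT region, SUM(total) FROM sales_summary "
        "GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)

conn = sqlite3.connect(":memory:")
build_mart(conn)
print(region_totals(conn))
```

In a real setup the summary would presumably be refreshed by a periodic load job rather than built once, but that part depends on the system.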

"I also would clarify by indicating that a Data Mart is going to be a
more highly indexed subset than you would normally want your actual data itself to be.

The concept seems to have emerged from the n-tier design concept where you would ideally not want users all hitting your main data, but prefer they hit a nearly-realtime highly indexed precompiled set of views from an intermediate source whenever possible."
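
The "highly indexed subset" idea from this second answer can be sketched the same way. Again this is only my own illustration in Python with sqlite3: the tables orders and mart_orders, the index name idx_mart_customer and the data are assumptions, not anything from the original answer. Users query the indexed copy, not the main table.

```python
import sqlite3

def build_indexed_subset(conn):
    """Copy a subset of the main table into a mart table and index it."""
    cur = conn.cursor()
    # Hypothetical main table that users should not hit directly.
    cur.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT, status TEXT, amount INTEGER)"
    )
    cur.executemany(
        "INSERT INTO orders (customer, status, amount) VALUES (?, ?, ?)",
        [
            ("alice", "done", 40),
            ("alice", "pending", 10),
            ("bob", "done", 25),
            ("alice", "done", 5),
        ],
    )
    # The subset: completed orders only, without the status column.
    cur.execute(
        "CREATE TABLE mart_orders AS "
        "SELECT id, customer, amount FROM orders WHERE status = 'done'"
    )
    # Extra index that the read-mostly copy can afford.
    cur.execute("CREATE INDEX idx_mart_customer ON mart_orders (customer)")
    conn.commit()

def customer_total(conn, customer):
    """Total of completed orders for one customer, read from the mart copy."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM mart_orders WHERE customer = ?",
        (customer,),
    ).fetchone()
    return row[0]

def uses_index(conn):
    """Check the query plan text for the index name."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT amount FROM mart_orders WHERE customer = 'alice'"
    ).fetchall()
    # The last column of each plan row is the human-readable detail string.
    return any("idx_mart_customer" in str(row[-1]) for row in plan)

conn = sqlite3.connect(":memory:")
build_indexed_subset(conn)
print(customer_total(conn, "alice"), uses_index(conn))
```

Here the copy lives in the same database only to keep the sketch short; the answer above is talking about a separate intermediate source.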

Jorge