Oracle Data Warehousing FAQ
$Date: 04-Mar-2002 $
$Revision: 1.41 $
$Author: Frank Naudé $
Topics
Back to Oracle FAQ Index
What is a Data Warehouse?
A Data Warehouse is the "corporate memory". Academics will say it is
a subject oriented, point-in-time, inquiry only collection of operational
data.
Typical relational databases are designed for on-line transactional
processing (OLTP) and do not meet the requirements for effective on-line
analytical processing (OLAP). As a result, data warehouses are designed
differently than traditional relational databases.
Back to top of file
What is ETL/ How does Oracle support the ETL process?
ETL is the Data Warehouse acquisition processes of Extracting, Transforming
(or Transporting) and Loading (ETL) data from source systems into the data
warehouse.
Oracle supports the ETL process with their "Oracle Warehouse Builder"
product. Many new features in the Oracle9i database will also make ETL processing
easier. For example:
- New MERGE command (also called UPSERT, Insert and update information in one step);
- External Tables allows users to run SELECT statements on external data
files (with pipelining support).
Back to top of file
What is the difference between a data warehouse and a data mart?
This is a heavily debated issue. There are inherent similarities between the
basic constructs used to design a data warehouse and a data mart. In general
a Data Warehouse is used on an enterprise level, while Data Marts is used
on a business division/department level. A data mart only contains the required
subject specific data for local analysis.
Back to top of file
What is the difference between a W/H and an OLTP application?
Typical relational databases are designed for on-line transactional processing
(OLTP) and do not meet the requirements for effective on-line analytical
processing (OLAP). As a result, data warehouses are designed differently
than traditional relational databases.
Warehouses are Time Referenced, Subject-Oriented, Non-volatile (read only)
and Integrated.
OLTP databases are designed to maintain atomicity, consistency and integrity
(the "ACID" tests). Since a data warehouse is not updated, these constraints
are relaxed.
Back to top of file
What is the difference between OLAP, ROLAP, MOLAP and HOLAP?
ROLAP, MOLAP and HOLAP are specialized OLAP (Online Analytical Analysis)
applications.
ROLAP stands for Relational OLAP. Users see their data organized in cubes
with dimensions, but the data is really stored in a Relational Database
(RDBMS) like Oracle. The RDBMS will store data at a fine grain level, response
times are usually slow.
MOLAP stands for Multidimensional OLAP. Users see their data organized in
cubes with dimensions, but the data is store in a Multi-dimensional database
(MDBMS) like Oracle Express Server. In a MOLAP system lot of queries have a
finite answer and performance is usually critical and fast.
HOLAP stands for Hybrid OLAP, it is a combination of both worlds.
Seagate Software's Holos is an example HOLAP environment.
In a HOLAP system one will find queries on aggregated data as well as on
detailed data.
Back to top of file
What is the difference between an ODS and a W/H?
An ODS (Operational Data Store) is an integrated database of operational
data. Its sources include legacy systems and it contains current or near
term data. An ODS may contain 30 to 90 days of information.
A warehouse typically contains years of data (Time Referenced). Data
warehouses group data by subject rather than by activity (subject-oriented).
Other properties are: Non-volatile (read only) and Integrated.
Back to top of file
What Oracle tools can be used to design and build a W/H?
Data Warehouse Builder (or Oracle Data Mart builder), Oracle Designer,
Oracle Express, Express Objects, etc.
Back to top of file
When should one use an MD-database (multi-dimensional database) and not a relational one?
Data in a multi-dimensional database is stored as business people views it,
allowing them to slice and dice the data to answer business questions. When
designed correctly, an OLAP database will provide must faster response times
for analytical queries.
Normal relational databases store data in two-dimensional tables and
analytical queries against them are normally very slow.
Back to top of file
What is a star schema? Why does one design this way?
A single "fact table" containing a compound primary key, with one segment for
each "dimension," and additional columns of additive, numeric facts.
Why?
It allows for the highest level of flexibility of metadata
Low maintenance as the data warehouse matures
Best possible performance
Back to top of file
When should you use a STAR and when a SNOW-FLAKE schema?
The star schema is the simplest data warehouse schema. Snow flake schema is
similar to the star schema. It normalizes dimension table to save data
storage space. It can be used to represent hierarchies of information.
Back to top of file
What is the difference between Oracle Express and Oracle Discoverer?
Express is an MD database and development environment.
Discoverer is an ad-hoc end-user query tool.
Back to top of file
How can Oracle Materialized Views be used to speed up data warehouse queries?
With "Query Rewrite" (QUERY_REWRITE_ENABLED=TRUE in INIT.ORA) Oracle can
direct queries to use pre-aggregated tables instead of scanning large tables
to answer complex queries.
Materialized views in a W/H environments is typically referred to as
summaries, because they store summarized data.
Back to top of file
What Oracle features can be used to optimize my Warehouse system?
The following Oracle features can be used to compliment your Warehouse
system/database:
- From Oracle8i One can transport tablespaces between Oracle databases. Using
this feature one can easily "detach" a tablespace for archiving purposes.
One can also use this feature to quickly move data from an OLTP database
to a Warehouse database.
- Data partitioning allows one to split big tables into smaller more
manageable sub-tables (partitions). Data is automatically directed
to the correct partition based on data ranges or hash values.
- Oracle Materialized Views can be used to pre-aggregate data. The Query
Optimizer can direct queries to summary/ roll-up tables instead of the
detail data tables (query rewrite). This will dramatically speed-up
warehouse queries and saves valuable machine resources.
- Oracle Parallel Query can be used to speed up data retrieval by using
multiple processes (and CPUs) to process a single task.
Back to top of file
Where can one get more info about Warehousing?
Back to top of file