Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Data Warehouse - extract methods

Re: Data Warehouse - extract methods

From: Tanel Poder <tanel_at_@peldik.com>
Date: Sun, 23 Feb 2003 17:21:05 +0200
Message-ID: <3e58e63e$1_2@news.estpak.ee>


Hi,

Check Oracle streams as well.
Since the information needed for change detection is read from redo (archive) logs, there's no additional overhead issues on source system. Using Streams you can replicate both DML and DDL (for the case replicating dropping partitions for example), it even can replicate user messages what you can generate using pl/sql packages. You should be cautious about NOLOGGING operations as CTAS in source system, since those changes won't go to redologs. In 9i you could use FORCE LOGGING tablespaces for making sure log gets always written, but in 8i you'd have to enforce this as a user rule.

Since Streams uses LogMiner for getting information out of logs and 9i LogMiner can read logs from release 8.0 to 9i, I assume that you can get your changes out from 8i database using Streams as well. I'm only guessing that, I've only used streams in 9i -> 9i env. Couldn't find anything in Streams documentation either.

Anyway, Streams can be very useful in multi-server-replication environment, without generating much overhead to source system. I like it.

Tanel.

"Dave" <dave_at_spam-killer-dghconsulting.co.uk> wrote in message news:b3ab95$3ie$1_at_sparta.btinternet.com...
> Hi,
>
> This has probably been covered before, but it is very well hidden! I'm
about
> to build a data warehouse, and am discovering that the schema design is
the
> easy bit. One of my biggest challenges has been designing the extract
> processes. I'd appreciate it if anyone would be kind enough to share their
> experiences regarding the "E" bit of ETL. Here's my situation:
>
>
> Prerequisites:
>
> 1. All the data sources are production Oracle8i databases. We don't have
the
> option of upgrading these in the near future.
> 2. Anything other than the very cheapest commercial ETL tools are out, I'm
> afraid. Data Junction's ETL fitted our budget (the basic version is very
> cheap), but its features weren't adequate for some of our complex
> transformation rules.
> 3. The data warehouse database will be Oracle9i Release 2.
> 4. Some of the source tables are relatively large (60 million rows).
>
>
> My options (as I see them):
>
> 1. Extract to flat files, ship files, then load with SQL*Loader/external
> tables.
>
> Oracle seems to like this method (they mention it in their data
warehousing
> docs). Somehow I don't. Change detection would be a major challenge, as we
> don't have timestamps on all the source tables. Even with timestamps, I'd
> still have to detect deletions. Some of the tables on the source database
> are relatively large, and we'd like to minimise performance hit and
extract
> window. I would guess we'd have to implement some kind of proprietary
change
> detection system, but I'm wary of the maintenance costs involved with
> proprietary code. On the plus side, the extract doesn't rely on the data
> warehouse database being up, and the load doesn't rely on the source
> databases being up.
>
> 2. Change Data Capture (CDC).
>
> Not really an option, as the source databases are on 8i. I just thought
I'd
> mention that I'm aware of this technology in 9i. Even if it was an option
> for us, I'd be slightly concerned about the related performance hit on our
> production databases (our source systems), and have not been able to find
> much information about CDC and performance. Presumably CDC would also
> require significant amounts of storage on the source systems.
>
> 3. Read-only snapshots (materialized views).
>
> Implementing snapshots (materialized views) with fast refreshes would
reduce
> the hit on our source systems. However, the initial hit would be large
> (requires an initial full refresh). The snapshot logs would require some
> storage on the source systems, but I wouldn't be too concerned about that
if
> we implement regular refreshes. I would foresee us replicating the source
> tables to staging area tables on the data warehouse database. My main
> concern about this option is that it merely moves the change detection
> challenge to the data warehouse database. Change detection is clearly a
> major challenge here.
>
> 4. Proprietary solution over Oracle Net.
>
> This option is just too scary to contemplate. The maintenance cost would
be
> large, and the change detection logic could place additional load on the
> source systems. I want to avoid over-engineering the solution, and would
> like to use built-in Oracle features wherever possible.
>
>
> Any help at all would be gratefully received.
>
> Dave.
>
>
Received on Sun Feb 23 2003 - 09:21:05 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US