Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Usenet -> c.d.o.server -> Re: Data Warehouse - extract methods

Re: Data Warehouse - extract methods

From: Dave <dave_at_spam-killer-dghconsulting.co.uk>
Date: Tue, 25 Feb 2003 04:12:55 +0000 (UTC)
Message-ID: <b3eqg6$1mv$1@venus.btinternet.com>


Thanks guys for your help.

Sybrand: Not sure what you meant about us misusing materialized views. Of course I do understand that fast refreshes are all about change detection (i.e. only replicate incremental changes), but the final result is a complete copy of the source table in the staging area of the DW. The reason why we'd be willing to do that is that replicating changes would be a lot faster than doing the whole extract, transformation and load process direct from live, thus reducing the load on the critical system (i.e. production). As Jusung confirms, we're still left with a change detection challenge on the DW with this imperfect solution.

Tanel: Great, thanks. I'll check out Oracle Streams.

Jusung: Thanks for saving my sanity by confirming my thoughts! Thanks also for the "on prebuilt table" tip.

Best wishes,

Dave.

"Jusung Yang" <JusungYang_at_yahoo.com> wrote in message news:130ba93a.0302231056.2c9a77ce_at_posting.google.com...
> The analysis of your situation is pretty complete and accurate. If you
> are restricted in so many ways, you will have very few options. Change
> detection, or "delta" extraction, is indeed the major challenge. If
> ORACLE CDC is not an option for you now, you may have to design your
> own CDC - typically with trigger. Data storage is not a concern when
> you are dealing with incremental extraction. But you will have to see
> if you can live with the performance impact on the production system.
>
> If you do decide to replicate the source table to either staging area
> or the warehouse, you can export/import the source tables from
> production and then use the "on prebuilt table" clause to create your
> materialized views. From then on you can fast refresh your MVs -
> provided they are fast refreshable.
>
> Of course, as you also noted, the only benefits of doing the above is
> that the ETL will run in the staging or DW database and the production
> system will not take a performance hit during ETL. But you are simply
> shifting the workload from one server to another, and the solution is
> not scalable - and it won't be until a "delta" extraction stategy has
> been implemented.
>
>
> - Jusung Yang
>
>
> "Dave" <dave_at_spam-killer-dghconsulting.co.uk> wrote in message
news:<b3ab95$3ie$1_at_sparta.btinternet.com>...
> > Hi,
> >
> > This has probably been covered before, but it is very well hidden! I'm
about
> > to build a data warehouse, and am discovering that the schema design is
the
> > easy bit. One of my biggest challenges has been designing the extract
> > processes. I'd appreciate it if anyone would be kind enough to share
their
> > experiences regarding the "E" bit of ETL. Here's my situation:
> >
> >
> > Prerequisites:
> >
> > 1. All the data sources are production Oracle8i databases. We don't have
the
> > option of upgrading these in the near future.
> > 2. Anything other than the very cheapest commercial ETL tools are out,
I'm
> > afraid. Data Junction's ETL fitted our budget (the basic version is very
> > cheap), but its features weren't adequate for some of our complex
> > transformation rules.
> > 3. The data warehouse database will be Oracle9i Release 2.
> > 4. Some of the source tables are relatively large (60 million rows).
> >
> >
> > My options (as I see them):
> >
> > 1. Extract to flat files, ship files, then load with SQL*Loader/external
> > tables.
> >
> > Oracle seems to like this method (they mention it in their data
warehousing
> > docs). Somehow I don't. Change detection would be a major challenge, as
we
> > don't have timestamps on all the source tables. Even with timestamps,
I'd
> > still have to detect deletions. Some of the tables on the source
database
> > are relatively large, and we'd like to minimise performance hit and
extract
> > window. I would guess we'd have to implement some kind of proprietary
change
> > detection system, but I'm wary of the maintenance costs involved with
> > proprietary code. On the plus side, the extract doesn't rely on the data
> > warehouse database being up, and the load doesn't rely on the source
> > databases being up.
> >
> > 2. Change Data Capture (CDC).
> >
> > Not really an option, as the source databases are on 8i. I just thought
I'd
> > mention that I'm aware of this technology in 9i. Even if it was an
option
> > for us, I'd be slightly concerned about the related performance hit on
our
> > production databases (our source systems), and have not been able to
find
> > much information about CDC and performance. Presumably CDC would also
> > require significant amounts of storage on the source systems.
> >
> > 3. Read-only snapshots (materialized views).
> >
> > Implementing snapshots (materialized views) with fast refreshes would
reduce
> > the hit on our source systems. However, the initial hit would be large
> > (requires an initial full refresh). The snapshot logs would require some
> > storage on the source systems, but I wouldn't be too concerned about
that if
> > we implement regular refreshes. I would foresee us replicating the
source
> > tables to staging area tables on the data warehouse database. My main
> > concern about this option is that it merely moves the change detection
> > challenge to the data warehouse database. Change detection is clearly a
> > major challenge here.
> >
> > 4. Proprietary solution over Oracle Net.
> >
> > This option is just too scary to contemplate. The maintenance cost would
be
> > large, and the change detection logic could place additional load on the
> > source systems. I want to avoid over-engineering the solution, and would
> > like to use built-in Oracle features wherever possible.
> >
> >
> > Any help at all would be gratefully received.
> >
> > Dave.
Received on Mon Feb 24 2003 - 22:12:55 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US