Oracle FAQ Your Portal to the Oracle Knowledge Grid

Re: datawarehouse extraction, snapshots?

From: Jusung Yang <jusungyang_at_yahoo.com>
Date: 15 Aug 2002 23:14:53 -0700
Message-ID: <42ffa8fa.0208152214.796cbb17@posting.google.com>


In my opinion, you have two choices.

  1. Bite the bullet and revamp your ETL process to do delta loads. This will take more time up front, but in the long run you will be glad you did it - when you see the ETL process complete hundreds or thousands of times faster than before. And performance will not decrease as your data volume grows, since you will be dealing with roughly the same delta every day.
  2. Find a workaround to alleviate your current problem - ETL is killing the performance of your production OLTP data source, so you want to shift the load from the source system to your warehouse. A few problems here:
     - You are still using the same ETL; you are just running it in your warehouse. As the data volume grows, ETL performance will decline, since it still processes the entire source table.
     - The usage load on both systems can change. I suspect you do not have many users hitting the warehouse right now, but as users get to know the power of a data warehouse, they will want to use it more. Queries run against a warehouse are also typically more resource-intensive than in an OLTP environment. Sooner or later you will have to face the same problem again. Where are you going to shift the problem to next time?
     - You now have the extra work of keeping two copies of the source data in sync. If you go this route, you may indeed have to create snapshots and snapshot logs. And, if memory serves, you cannot use snapshot logs to fast-refresh complex snapshots. What would you do when you need a complex snapshot in the future and it has to be completely refreshed - millions upon millions of records - every day?
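The delta-load approach in choice 1 can be sketched roughly as follows. This assumes the source table has (or gains) a last-DML-date column, as mentioned later in the thread; all table, column, and database link names here are hypothetical, and since Oracle 8i has no MERGE statement, the update and insert are done separately:

```sql
-- 1. Extract only rows changed since the last successful load,
--    using a control table that tracks the high-water mark:
INSERT INTO stg_billing_delta
SELECT *
  FROM billing_detail@source_db
 WHERE last_dml_date > (SELECT last_load_date
                          FROM etl_control
                         WHERE table_name = 'BILLING_DETAIL');

-- 2. Apply the delta to the warehouse table: update rows that
--    already exist, then insert the new ones.
UPDATE dw_billing_detail d
   SET (d.amount, d.status) = (SELECT s.amount, s.status
                                 FROM stg_billing_delta s
                                WHERE s.billing_id = d.billing_id)
 WHERE EXISTS (SELECT 1 FROM stg_billing_delta s
                WHERE s.billing_id = d.billing_id);

INSERT INTO dw_billing_detail
SELECT *
  FROM stg_billing_delta s
 WHERE NOT EXISTS (SELECT 1 FROM dw_billing_detail d
                    WHERE d.billing_id = s.billing_id);

-- 3. Advance the high-water mark for the next run.
UPDATE etl_control
   SET last_load_date = SYSDATE
 WHERE table_name = 'BILLING_DETAIL';
```

In practice you would capture the high-water mark at the start of the extract rather than at the end, so rows modified while the load runs are not skipped.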

Think it over and proceed carefully. Your workaround can potentially get very
expensive and messy.
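For reference, the snapshot setup mentioned above looks roughly like this in 8i syntax (object names and the database link are hypothetical). Note the difference between a simple snapshot, which can use the log for a fast refresh, and a complex one, which cannot:

```sql
-- On the source (master) site: a snapshot log records row-level
-- changes so that simple snapshots can be fast-refreshed.
CREATE SNAPSHOT LOG ON billing_detail
  WITH PRIMARY KEY;

-- On the warehouse site: a simple snapshot (one table, no joins or
-- aggregates) can refresh incrementally from the log every night.
CREATE SNAPSHOT billing_detail_snap
  REFRESH FAST
  START WITH SYSDATE
  NEXT TRUNC(SYSDATE) + 1
  AS SELECT * FROM billing_detail@source_db;

-- A complex snapshot (joins, aggregates, etc.) cannot use the log
-- in 8i and must rebuild every row on every refresh:
CREATE SNAPSHOT billing_summary_snap
  REFRESH COMPLETE
  START WITH SYSDATE
  NEXT TRUNC(SYSDATE) + 1
  AS SELECT account_id, SUM(amount) AS total_amount
       FROM billing_detail@source_db
      GROUP BY account_id;
```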

sg <s4v4g3_at_europe.com> wrote in message news:<3D5C2B32.80607_at_europe.com>...
> the group who designed the data warehouse didn't design the processes
> to extract incrementally, yes, very sad. That's why I am trying to do
> it with snapshots.
>
> The source system is pretty powerful, a 3-node TruCluster Server on a
> Compaq GS320, but it is not giving enough, because as I mentioned the
> same data is extracted over and over again every day, plus the billing
> system itself has many batch processes running every night too!
>
> And yes, I am considering replicating the data from the source to the
> data warehouse, doing all transformations there, then distributing the
> data to the rest of the data marts. I have been doing some research,
> but it seems the basic ETL processes are mostly
>
> sqlplus spool
> ftp
> sql loader
>
> which is fine if we don't have to deal with a considerable amount of
> data, but since we are not doing anything incrementally, everything is
> slow. To do delta extraction we would probably have to modify the
> source data tables (e.g. add a column for the last DML date), which
> seems impossible: the billing software provider is willing to do it,
> but it would probably take 4-8 months! Then we have to test, etc.,
> which can take another couple of months!
>
> I am using 8.1.7 so.... transportable tablespace is not an option
Received on Fri Aug 16 2002 - 01:14:53 CDT

