Oracle FAQ Your Portal to the Oracle Knowledge Grid
Home -> Community -> Usenet -> c.d.o.server -> Re: datawarehouse extraction, snapshots?

Re: datawarehouse extraction, snapshots?

From: sg <s4v4g3_at_europe.com>
Date: Fri, 16 Aug 2002 17:38:53 +0200
Message-ID: <3D5D1C8D.9010407@europe.com>


Re-working the ETL processes to do delta loads is virtually impossible, because the source data all comes from third-party software; there are no timestamp columns for us to use for delta extraction.

There are quite a few warehouse users, and the warehouse has been up for around 2 years; so far the performance is not too bad. The problem is the extraction, which kills the OLTP source...

Regarding creating snapshots: I did a little research in the Oracle data warehousing documentation, and it seems snapshots are recommended for delta extraction. In my case there won't be any complex snapshots, because most probably I will replicate whole tables rather than query-based data.
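A simple-snapshot setup along those lines might look like this (a sketch only; the table, schema, and database link names are all made up, and the snapshot log on the master side is what makes fast refresh possible):

```sql
-- On the source (master) database: a snapshot log records
-- changed rows so the warehouse can pull only the delta.
CREATE SNAPSHOT LOG ON billing.usage_detail
  WITH PRIMARY KEY;

-- On the warehouse, over a database link to the source:
-- a simple (single-table) snapshot, fast-refreshed once a day.
CREATE SNAPSHOT dw.usage_detail_snap
  REFRESH FAST
  START WITH SYSDATE
  NEXT SYSDATE + 1
  AS SELECT * FROM billing.usage_detail@prod_link;
```

Because the snapshot is a plain single-table SELECT, it stays "simple" and can use the snapshot log for fast refresh; a join or aggregate in the defining query would make it complex and (on 8i) force complete refreshes.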

The perfect fix would be adding timestamp columns, but that seems far, far away. Even then, a timestamp column cannot capture deleted rows; for those we would probably have to rely on delete triggers and a log table.
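The delete-trigger idea could be sketched like this (all object and column names are hypothetical; the trigger just records the primary key of each deleted row so the ETL can replay the deletes in the warehouse):

```sql
-- Hypothetical log table capturing deletions on the source side
CREATE TABLE billing.usage_detail_del_log (
  usage_id    NUMBER  NOT NULL,
  deleted_at  DATE    DEFAULT SYSDATE NOT NULL
);

-- Row-level trigger: log the key of every deleted row
CREATE OR REPLACE TRIGGER billing.trg_usage_detail_del
  AFTER DELETE ON billing.usage_detail
  FOR EACH ROW
BEGIN
  INSERT INTO billing.usage_detail_del_log (usage_id)
  VALUES (:OLD.usage_id);
END;
/
```

The nightly extract would then apply the logged deletes to the warehouse copy and truncate (or mark) the processed log rows.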

I was hoping someone who has worked on data warehouses could share their data extraction experience :-)

cheers

Jusung Yang wrote:

> In my opinion, you have 2 choices.
>
> 1. Bite the bullet and revamp your ETL process to start doing delta
> loads. This will take more time, but in the long run you will be glad
> you did, when you see the ETL process complete hundreds or thousands
> of times faster than before. And the performance will not decrease as
> your data volume grows, since you will be dealing with about the same
> delta every day.
>
> 2. Find a workaround to alleviate your current problem: ETL is
> killing the performance of your production OLTP data source, so you
> want to shift the problem from the source system to your warehouse.
> A few problems here:
>
> A. You are still using the same ETL, just running it in your
> warehouse now. As the data volume grows, the ETL performance will
> decline, since it deals with the entire source table.
> B. The usage load on both systems can change. I suspect you probably
> do not have a lot of users hitting the warehouse right now. As users
> get to know the power of a data warehouse, they will want to use it
> more. Also, the queries run against a warehouse are typically more
> resource-intensive than those in an OLTP environment. Sooner or later
> you will have to face the same problem again. Where are you going to
> shift the problem to next time?
> C. You now have the extra work of keeping 2 copies of the source
> data in sync. If you go this route, you may indeed have to create
> snapshots and snapshot logs. And, if memory serves, complex snapshots
> cannot use snapshot logs for fast refresh. What will you do when you
> need a complex snapshot in the future and it has to be completely
> refreshed, millions upon millions of records, every day?
>
> Think it over and proceed carefully. Your workaround can potentially
> get very expensive and messy.
>
>
> sg <s4v4g3_at_europe.com> wrote in message news:<3D5C2B32.80607_at_europe.com>...
>

>>the groups who designed the data warehouse didn't design the
>>processes to extract incrementally, yes, very sad. That's why I am
>>trying to do it with snapshots.
>>
>>The source system is pretty powerful, a 3-node TruCluster Server on a
>>Compaq GS320, but it is not keeping up, because as I mentioned the
>>same data is extracted over and over again every day, plus the billing
>>system itself has many batch processes running every night too!
>>
>>And yes, I am considering replicating the data from the source to the
>>data warehouse, doing all transformations there, and then distributing
>>the data to the rest of the datamarts. I have been doing some
>>research, but it seems the basic ETL processes are mostly
>>
>>sqlplus spool
>>ftp
>>sql loader
>>
>>which is fine if we don't have to deal with a considerable amount of
>>data, but since we are not doing anything incrementally, everything is
>>slow. To do delta extraction we would probably have to modify the
>>source data tables (e.g. add a column for the last DML date), which
>>seems impossible: the billing software provider is willing to do it,
>>but they would probably take 4 to 8 months! Then we would have to
>>test, etc., which could take another couple of months!
>>
>>I am using 8.1.7 so.... transportable tablespace is not an option
>>
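For reference, the spool/ftp/SQL*Loader pipeline described in the quoted message might look something like this in practice (a sketch only; credentials, hostnames, file names, and table names are all hypothetical, and it assumes standard Oracle client tools on the source host):

```shell
#!/bin/sh
# Full-extract ETL step for one hypothetical table, EXTRACT_ME.

# 1. Spool the table to a flat file with SQL*Plus
sqlplus -s scott/tiger@prod <<'EOF'
SET PAGESIZE 0 FEEDBACK OFF HEADING OFF COLSEP '|'
SPOOL extract_me.dat
SELECT * FROM extract_me;
SPOOL OFF
EXIT
EOF

# 2. Push the flat file to the warehouse host
ftp -n dwhost <<'EOF'
user dwuser dwpass
put extract_me.dat
bye
EOF

# 3. Load it on the warehouse side with SQL*Loader
sqlldr dwuser/dwpass@dw control=extract_me.ctl data=extract_me.dat
```

Note that every run re-extracts and re-loads the whole table, which is exactly why this approach stops scaling as the source grows.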
Received on Fri Aug 16 2002 - 10:38:53 CDT
