
Re: datawarehouse extraction, snapshots?

From: sg <s4v4g3_at_europe.com>
Date: Fri, 16 Aug 2002 00:29:06 +0200
Message-ID: <3D5C2B32.80607@europe.com>


The groups who designed the data warehouse didn't design the processes to extract incrementally; yes, very sad. That's why I am trying to do it with snapshots.
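Roughly what I have in mind is something like this (just a sketch; the snapshot name and the billing_db database link are placeholders):

-- on the billing (source) system, so a fast refresh only ships changed rows
create snapshot log on contracts with primary key;

-- on the warehouse, refreshed incrementally once a day over the db link
create snapshot contracts_snap
refresh fast start with sysdate next sysdate + 1
as select * from contracts@billing_db;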

The source system is pretty powerful, a 3-node TruCluster Server on a Compaq GS320, but it still is not enough, because as I mentioned the same data is extracted over and over again every day, plus the billing system itself has many batch processes running every night too!

And yes, I am considering replicating the data from the source to the data warehouse, doing all transformations there, and then distributing the data to the rest of the data marts. I have been doing some research, but it seems like the basic ETL processes are mostly:

sqlplus spool
ftp
sql loader

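i.e. something like this today (just to illustrate; the staging table and file names are made up):

-- extract.sql, run as: sqlplus -s user/pwd @extract.sql
set heading off feedback off pagesize 0 linesize 500 trimspool on
spool contracts.dat
select column1 || '|' || column3 || '|' || column5 from contracts;
spool off
exit

-- then ftp contracts.dat to the warehouse host and load it with SQL*Loader
-- contracts.ctl, run as: sqlldr userid=dw/pwd control=contracts.ctl
load data
infile 'contracts.dat'
truncate
into table stg_contracts
fields terminated by '|'
(column1, column3, column5)
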
That is fine if we don't have to deal with a considerable amount of data, but since we are not doing anything incrementally, everything is slow. To do a delta extraction we would probably have to modify the source tables (e.g. add a column for the last DML date), which seems impossible: the billing software provider is willing to do it, but it would probably take them 4~8 months! Then we have to test etc., which can take another couple of months!
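What I mean by modifying the source tables is roughly this (sketch only; the column, trigger and bind variable names are made up):

alter table contracts add (last_dml_date date default sysdate);

create or replace trigger contracts_last_dml
before insert or update on contracts
for each row
begin
  :new.last_dml_date := sysdate;
end;
/

-- then the nightly extract only has to pull the delta
select column1, column3, column5
from contracts
where last_dml_date > :last_extract_time;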

I am using 8.1.7, so... transportable tablespaces are not an option.

Jusung Yang wrote:

>>Transportable tablespaces can't go because we have different data block
>>sizes (8k vs 32k)
>>

> 
> First off, 9i supports multiple block sizes. I heard it is specifically
> designed for transportable tablespaces.
> 
> In my opinion, you either have a bad ETL mechanism or you simply have
> to up the processing power of your source system. During ETL, you move
> your data (the new data, the delta since the last extraction) from your
> source system to the data warehouse. I certainly hope you are doing the
> delta; that's what ETL is all about. The fact that you are considering
> creating snapshots seems to indicate that the current ETL mechanism is
> not doing the delta thing. It would be very, very sad if this is true.
> 
> If I understand you correctly, you want to actually replicate your
> data source in your data warehouse, and extract, transform (maybe) and
> load them all in the same system. Interesting idea, but I have never
> heard of such a design in a data warehouse environment and dare to say
> that it is a bad idea. Instead of maintaining a single ETL mechanism,
> you will now add another data replication mechanism to your system.
> 
> If you are using 9i, CDC is your answer.
> 
> 
> 
> sg <s4v4g3_at_europe.com> wrote in message news:<3D5B8EC5.5030509_at_europe.com>...
> 

>>Hi
>>
>>our problem is that the extraction processes are exhausting our billing
>>system resources. This is a mobile company, so
>>
>>imagine we have a table for customer contracts which is around 15,000,000
>>rows; for extraction, a process probably does
>>
>>select column1, column3, column5
>>from CONTRACTS;
>>
>>then the next extraction process comes, requires part of the data from
>>that table, and runs
>>
>>select column2, column4, column6
>>from CONTRACTS;
>>
>>then there comes another process which probably does a similar thing but
>>for different column combinations; there are around 20 of these
>>extraction processes running in our billing system every night!
>>
>>So what I am trying to do is relieve the billing system of this load and
>>move it to our data warehouse server, which is more suited for these
>>kinds of heavy queries. I want to move the entire CONTRACTS table to the
>>data warehouse every day.
>>
Received on Thu Aug 15 2002 - 17:29:06 CDT
