Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: datawarehouse extraction, snapshots?
yea I understand that at the end extraction has to be delta based
I mean a timestamp column is needed to queries like
select * from source where time between trunc(sysdate - 1) and trunc(sysdate)
and yes your understanding about my plan is correct, too bad I am in 8i and cannot use CDC =(
thank you for ur suggestions
cheers
Jusung Yang wrote:
> Well, best that I shut up now. Data warehousing is kindda complicated.
> Without knowing your system fully, it's hard for me to say what's best
> for you. Once the DW is built, it is usually difficult to change it.
> We spent 4 months just to build the model. Our data sources came from
> SAP, SYBASE, ORACLE and EDI. Delta load design was one of the biggest
> design issues. Data partitioning was another. Refresh frequency and
> downtime management was another. We used Informatics PowerMart for the
> ETL. What I said before was based on the information provided in your
> posts.
>
> Just a couple more things.
> 1.
> Delta load design does not have to depend on a timestamp column.
> Snapshot log is a type of delta, no? I don't see ORACLE depend on
> timestamp columns to capture the changed data and place them in these
> logs. Trigger is what you need - if you can not use the 9i CDC. You
> basically do your own "limited" CDC. Capture the changed data and
> decide how to use them to refresh your warehouse. Conceptually, I
> would do this:
> a. Use triggers to capture newly inserted or deleted data, flag them
> as 'I' and 'D'.
> b. For updated data, you log the old record as 'D' and the new record
> as 'I'.
> c. You Refresh your DW with these data. Transform them if needed.
>
>
> 2.
> Here is what I understand you would like to do:
>
> Current Design
> -------------------------------------------------
> ETL queris
> (Source tables) -> (DW tables) <- (DW user queries)
>
>
> Your Plan
> -----------------------------------------------------------------
> ETL queries
> (Source tables) -> (Replicated source tables in DW) -> (DW
> tables) <- (DW user queries)
>
>
> Your DW tables are being completely refreshed every night, yes? By
> replicating source tables into DW, like I said before, you only shift
> your problem from OLAP to DW. It may be OK today since the DW hardware
> is more powerful, may not be in the future when the data volume grows.
> ETL queries are still doing the full table search, as opposed to
> delta, on the replicated tables.
>
> But if doing CDC and modify ETL is not an option for you, you may
> indeed have to do this. I have nothing further to say. Good luck!
>
>
>
> sg <s4v4g3_at_europe.com> wrote in message news:<3D5D1C8D.9010407_at_europe.com>...
>
>>re-working ETL processes to start doing delta load is virtually >>impossible because source data are all third party software, therefore >>there are no timestamp columns for us to do delta extraction >> >>there are quite a warehouse users, this warehouse has been up for around >>2 years, so far the performance is not too bad. the problem is >>extraction which kills OLTP source... >> >>regarding creating snapshots, I just did a small research in oracle >>documentation about datawarehouses and seems like snapshots are >>recommended for delta extractions.... in my case there wont be any >>complex snapshots because most probably is i will replicate tables and >>not data based on queries >> >>the perfect way would be adding timestamp columns but that seems far far >>far away although timestamp will not be able to deal with deleted data >>and for that we will probably have to rely on delete triggers and a log >>table >> >>I was hoping someone who has worked in datawarehouse share their data >>extraction experience :-) >> >>cheers >>Received on Sat Aug 17 2002 - 08:51:56 CDT