Re: Rants. Difficulty to learn ETL tools?

From: DA Morgan <damorgan_at_psoug.org>
Date: Wed, 16 May 2007 13:13:41 -0700
Message-ID: <1179346417.758792_at_bubbleator.drizzle.com>


Peter Nolan wrote:
> Hi Araron,
>
> "What can it do that you can't do in PL/SQL? Well, some nice things
> are
> we can parallelise the processing of large numbers of fact records
> and
> we can put the dimension tables in memory mapped IO and access them
> in
> a shared fashion using binary search......this is 10x faster than
> doing the same in PL/SQL at runtime.... "
>
> The most cpu intensive task in building a dimensional model is the
> tranlsation of real keys into integer keys...the attribution process.
>
> Doing this work in PL/SQL or inside the database as ELT type tools do
> consumes much more (10x or more) processing power than is required if
> you do it more efficiently.....and since this is the most expensive
> bit, it's worth speeding up in large accounts...
>
> Yes, of course, you can run multiple pl/sql statements at the same
> time to do this work if you would like.
>
> However, one thing we have been doing for about 7 years how is putting
> a unique key on the front of fact table records. And when processing a
> single set of transactions that have been split into different files
> to enable parallelism you have to somehow allow for the allocation of
> keys for these unqiue ids on the front of the fact records...this
> means a semaphor at the file level, locks etc, so the pl/sql to handle
> that becomes more difficult...especially when you consider you cannot
> just allocate one key at a time and do the update because it slows
> things down too much. You must be able to allocate blocks of
> keys.......the ETL tools have these features in place because they are
> so necessary.
>
> Many people say to me, well, just let the database allocate a sequence
> number....but in most databases these are slow....and they also give
> you trouble if you want to delete and reload data because it is often
> not so easy to reset the sequence number so that the delete/reloaded
> records get the same sequence number they used to have......
>
> We also do things like...when we need to update records it is often
> faster to delete them and then to use the loader to load the record
> rather than do an update. In these cases the sequence numbers must be
> respected so using the database to allocate them does not work if you
> want to do a delete/load rather than an update.
>
> And yes, PL/SQL can do a good part of all this if you want to write
> it...but it is far from trivial......and it will run 10x or more
> slower than doing it properly....and that is well worth the effort in
> most sizable accounts.....plenty of DW projects fail because the ETL
> subsystem is too slow, too expensive, and too difficult to change.
>
> Best Regards
>
> Peter
>
>
>
>
>
> On May 4, 10:13 pm, "Aaron Kempf" <ake..._at_dol.wa.gov> wrote:

>> what you can't run multiple PL/SQL statements at the same time in Oracle?
>>
>> ROFL
>>
>> "Peter Nolan" <p..._at_peternolan.com> wrote in message
>>
>> news:1177319493.688345.26010_at_o5g2000hsb.googlegroups.com...
>>
>>
>>
>>> Hi DBA...
>>> your append is exactly what I have been talking about since the mid
>>> 90s as well....making ETL easier......
>>> We have invented the future and the future of ETL is 'generated ETL
>>> from the data mapping workbook'.   (www.instantbi.com)
>>> You have to do your data mapping somehow, and excel is how most people
>>> do it, the laggards are still using word......
>>> Since you already have to do your data mapping, and if you are
>>> sensible you do it in excel, it makes the most sense to generate the
>>> etl subsystem directly from the workbook as well as publish the
>>> workbook via the web so that authorised people can see any and all
>>> details of the ETL subsystem.
>>> No ETL subsystem will ever be any easier to develop and deploy than
>>> what we have invented because no ETL subsystem will ever be easier to
>>> build than a direct generate from the mapping workbook.....this is the
>>> 'end game' for development of ETL subsystems.
>>> Why use such a tool rather than PL/SQL.....well, because it is
>>> generated directly from the workbook we have 'done away with' the ETL
>>> programmer.....and that is a good thing. I have done far too much ETL
>>> programming over the years and I want to get rid of that complete
>>> waste of time....
[Quoted] >>> What can it do that you can't do in PL/SQL? Well, some nice things are
>>> we can parallelise the processing of large numbers of fact records and
>>> we can put the dimension tables in memory mapped IO and access them in
>>> a shared fashion using binary search......this is 10x faster than
>>> doing the same in PL/SQL at runtime....
>>> Also, we have intelligence built into it that means you can do things
>>> like add new summaries without any code changes, you can add new keys
>>> to fact tables without any code changes, you can make lookups into
>>> dimension tables to get new keys FAR more complex than possible than
>>> via normal sql statements.
>>> In short, we have eliminated all the 'coding' effort that is required
>>> when writing you ETL subsystem no matter what the tool.....and we have
>>> done it in such a way that it is as scalable as the operating system
>>> underneath....
>>> Another BIG feature is that the ETL subsystem is portable across
>>> databases and operating systems....something that PL/SQL and SSIS are
>>> obvioulsy not.....this means that if some better/faster database comes
>>> out we can move to it......not something that MSFT would like to her
>>> and this is their newsgroup.....but it has always been a belief of
>>> mine that the ETL subsystem should be fully portable across OS and
>>> database.....and surprisingly, this is NOT the case with any of the
>>> ETL tools that I have seen....they all require quite some effort to
>>> move them.....thereby creating a cost to adopt a faster/cheaper/better
>>> database.
>>> If you are keenly interested, feel free to check my personal site
>>> www.peternolan.comwhere I have published vast amounts of code and
>>> documentation on ETL subsystems.
>>> Best Regards
>>> Peter
>>> www.peternolan.com- Hide quoted text -
>> - Show quoted text -

Your last actual experience with Oracle was how many versions ago?

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace x with u to respond)
Puget Sound Oracle Users Group
www.psoug.org
Received on Wed May 16 2007 - 22:13:41 CEST

Original text of this message