Re: Rants. Difficulty to learn ETL tools?

From: Susie DBA [MSFT] <susiedba_at_hotmail.com>
Date: 23 May 2007 23:17:45 -0700
Message-ID: <1179987465.242389.3990_at_b40g2000prd.googlegroups.com>


you're full of crap

[Quoted] you can't do any of this IN ORACLE

[Quoted] it's all a seperate tool you have to purchase

[Quoted] Oracles just plain stupid; I wish that MS had some real competition

[Quoted] as it is; MS won the olap war - www.olapreport.com and you're just plain fucking stupid if you're not using Analysis Services

On May 16, 9:00 am, Peter Nolan <p..._at_peternolan.com> wrote:
> Hi Araron,
>
> "What can it do that you can't do in PL/SQL? Well, some nice things
> are
> we can parallelise the processing of large numbers of fact records
> and
> we can put the dimension tables in memory mapped IO and access them
> in
> a shared fashion using binary search......this is 10x faster than
> doing the same in PL/SQL at runtime.... "
>
> The most cpu intensive task in building a dimensional model is the
> tranlsation of real keys into integer keys...the attribution process.
>
> Doing this work in PL/SQL or inside the database as ELT type tools do
> consumes much more (10x or more) processing power than is required if
> you do it more efficiently.....and since this is the most expensive
> bit, it's worth speeding up in large accounts...
>
> Yes, of course, you can run multiple pl/sql statements at the same
> time to do this work if you would like.
>
> However, one thing we have been doing for about 7 years how is putting
> a unique key on the front of fact table records. And when processing a
> single set of transactions that have been split into different files
> to enable parallelism you have to somehow allow for the allocation of
> keys for these unqiue ids on the front of the fact records...this
> means a semaphor at the file level, locks etc, so the pl/sql to handle
> that becomes more difficult...especially when you consider you cannot
> just allocate one key at a time and do the update because it slows
> things down too much. You must be able to allocate blocks of
> keys.......the ETL tools have these features in place because they are
> so necessary.
>
> Many people say to me, well, just let the database allocate a sequence
> number....but in most databases these are slow....and they also give
> you trouble if you want to delete and reload data because it is often
> not so easy to reset the sequence number so that the delete/reloaded
> records get the same sequence number they used to have......
>
> We also do things like...when we need to update records it is often
> faster to delete them and then to use the loader to load the record
> rather than do an update. In these cases the sequence numbers must be
> respected so using the database to allocate them does not work if you
> want to do a delete/load rather than an update.
>
> And yes, PL/SQL can do a good part of all this if you want to write
> it...but it is far from trivial......and it will run 10x or more
> slower than doing it properly....and that is well worth the effort in
> most sizable accounts.....plenty of DW projects fail because the ETL
> subsystem is too slow, too expensive, and too difficult to change.
>
> Best Regards
>
> Peter
>
> On May 4, 10:13 pm, "Aaron Kempf" <ake..._at_dol.wa.gov> wrote:
>
>
>
> > what you can't run multiple PL/SQL statements at the same time in Oracle?
>
> > ROFL
>
> > "Peter Nolan" <p..._at_peternolan.com> wrote in message
>
> >news:1177319493.688345.26010_at_o5g2000hsb.googlegroups.com...
>
> > > Hi DBA...
> > > your append is exactly what I have been talking about since the mid
> > > 90s as well....making ETL easier......
>
> > > We have invented the future and the future of ETL is 'generated ETL
> > > from the data mapping workbook'. (www.instantbi.com)
>
> > > You have to do your data mapping somehow, and excel is how most people
> > > do it, the laggards are still using word......
>
> > > Since you already have to do your data mapping, and if you are
> > > sensible you do it in excel, it makes the most sense to generate the
> > > etl subsystem directly from the workbook as well as publish the
> > > workbook via the web so that authorised people can see any and all
> > > details of the ETL subsystem.
>
> > > No ETL subsystem will ever be any easier to develop and deploy than
> > > what we have invented because no ETL subsystem will ever be easier to
> > > build than a direct generate from the mapping workbook.....this is the
> > > 'end game' for development of ETL subsystems.
>
> > > Why use such a tool rather than PL/SQL.....well, because it is
> > > generated directly from the workbook we have 'done away with' the ETL
> > > programmer.....and that is a good thing. I have done far too much ETL
> > > programming over the years and I want to get rid of that complete
> > > waste of time....
>
> > > What can it do that you can't do in PL/SQL? Well, some nice things are
> > > we can parallelise the processing of large numbers of fact records and
> > > we can put the dimension tables in memory mapped IO and access them in
> > > a shared fashion using binary search......this is 10x faster than
> > > doing the same in PL/SQL at runtime....
>
> > > Also, we have intelligence built into it that means you can do things
> > > like add new summaries without any code changes, you can add new keys
> > > to fact tables without any code changes, you can make lookups into
> > > dimension tables to get new keys FAR more complex than possible than
> > > via normal sql statements.
>
> > > In short, we have eliminated all the 'coding' effort that is required
> > > when writing you ETL subsystem no matter what the tool.....and we have
> > > done it in such a way that it is as scalable as the operating system
> > > underneath....
>
> > > Another BIG feature is that the ETL subsystem is portable across
> > > databases and operating systems....something that PL/SQL and SSIS are
> > > obvioulsy not.....this means that if some better/faster database comes
> > > out we can move to it......not something that MSFT would like to her
> > > and this is their newsgroup.....but it has always been a belief of
> > > mine that the ETL subsystem should be fully portable across OS and
> > > database.....and surprisingly, this is NOT the case with any of the
> > > ETL tools that I have seen....they all require quite some effort to
> > > move them.....thereby creating a cost to adopt a faster/cheaper/better
> > > database.
>
> > > If you are keenly interested, feel free to check my personal site
> > >www.peternolan.comwhereI have published vast amounts of code and
> > > documentation on ETL subsystems.
>
> > > Best Regards
>
> > > Peter
> > >www.peternolan.com-Hide quoted text -
>
> > - Show quoted text -- Hide quoted text -
>
> - Show quoted text -
Received on Thu May 24 2007 - 08:17:45 CEST

Original text of this message