Re: data cleansing: externally or internally?

From: Ramon F Herrera <ramon_at_conexus.net>
Date: Thu, 10 Nov 2011 05:48:57 -0800 (PST)
Message-ID: <53fd761c-8a9b-4fbf-980d-93e9448bc009_at_o5g2000yqa.googlegroups.com>



On Nov 4, 12:51 am, geos <g..._at_nowhere.invalid> wrote:
> there is a big text file with dirty data. a company wants it to be
> clean. there are some known patterns expressed as like or regexp. I
> first thought about two approaches:
> 1) do this on the system level
> 2) or in a database
> for the latter case it looks to me that I could use external tables or
> load data into temporary table and then do the cleaning.
>
> I am looking for pros and cons of each variant. my intuition tells me
> that loading into temporary table would give the most flexibility but
> also take additional space. I am not sure about the other methods. I
> would appreciate your opinion about what I should pay attention to when
> choosing the other methods. how are they restricted in terms of
> performance, flexibility and capabilities (eg. multitable loading)? I am
> also interested in good practices and your experience in similar cases
> you can share.
>
> thank you,
> geos
>
> --
> NOTE: Follow Up set to comp.databases.oracle.misc

After more than a decade of experience, my advice is: use Oracle as little as possible.

I wrote all my business logic in C/C++, making calls to the database only as needed, and now my applications run much, much faster. Development and debugging are easier too: I can use a debugger (I'm not sure whether Oracle has something similar).

In essence, the only commands that I run in the database are basic ones such as SELECT and UPDATE. No IFs or BUTs.
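The question's approach 1 ("do this on the system level") fits this advice: clean the file outside the database, then load only the clean rows with plain INSERTs or whatever loader is already in use. The sketch below shows the idea in Python for brevity rather than the C/C++ mentioned above. It is a minimal illustration under stated assumptions: the cleaning rules, the `id;name;amount` record layout, the file name `cleanse.py`, and the names `RULES`, `VALID`, `clean_line` and `cleanse` are all hypothetical. Real rules would be the company's own known LIKE/regexp patterns. The file is processed line by line, so a big file never has to fit in memory.

```python
import re
import sys
from typing import Iterable, Iterator, Tuple

# Hypothetical cleaning rules, applied in order as (pattern, replacement).
RULES = [
    (re.compile(r"\s+"), " "),               # collapse tabs/newlines/space runs to one space
    (re.compile(r"[\x00-\x1f\x7f]"), ""),    # drop any remaining control characters
    (re.compile(r"\s*;\s*"), ";"),           # trim whitespace around the ';' separator
]

# Hypothetical shape of a valid record: integer id; non-empty name; numeric amount.
VALID = re.compile(r"^\d+;[^;]+;-?\d+(\.\d+)?$")


def clean_line(line: str) -> str:
    """Apply every rule in order and strip leading/trailing spaces."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line.strip()


def cleanse(lines: Iterable[str]) -> Iterator[Tuple[bool, str]]:
    """Yield (is_valid, cleaned_line) for each line that is not blank after cleaning."""
    for raw in lines:
        line = clean_line(raw)
        if line:
            yield bool(VALID.match(line)), line


def main(path: str) -> None:
    # errors="replace" turns undecodable bytes into U+FFFD instead of aborting.
    with open(path, encoding="utf-8", errors="replace") as src:
        for ok, line in cleanse(src):
            # Valid rows go to stdout (to be loaded); rejects go to stderr for review.
            print(line, file=sys.stdout if ok else sys.stderr)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Usage would be something like `python cleanse.py dirty.txt > clean.txt 2> rejects.txt`. Only `clean.txt` then goes to the database, and the rejects can be inspected separately. The trade-off matches the question: this needs no temporary table or extra database space. However, rules that depend on data already in the database (lookups, duplicates across tables) are harder to express this way.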

-Ramon

Received on Thu Nov 10 2011 - 07:48:57 CST
