Re: data cleansing: externally or internally?

From: Robert Klemme <shortcutter_at_googlemail.com>
Date: Fri, 11 Nov 2011 21:51:08 +0100
Message-ID: <9i5g5sFnc7U2_at_mid.individual.net>



On 11/10/2011 02:48 PM, Ramon F Herrera wrote:
> On Nov 4, 12:51 am, geos <g..._at_nowhere.invalid> wrote:
>> There is a big text file with dirty data. A company wants it to be
>> clean. There are some known patterns expressed as LIKE or regexp. I
>> first thought about two approaches:
>> 1) do this on the system level
>> 2) or in a database
>> For the latter case it looks to me that I could use external tables or
>> load data into a temporary table and then do the cleaning.
>>
>> I am looking for pros and cons of each variant. My intuition tells me
>> that loading into a temporary table would give the most flexibility but
>> also take additional space. I am not sure about the other methods. I
>> would appreciate your opinion about what I should pay attention to when
>> choosing the other methods. How are they restricted in terms of
>> performance, flexibility and capabilities (e.g. multitable loading)? I am
>> also interested in good practices and your experience in similar cases
>> you can share.

You still have not disclosed what type of processing you want to do. Without that information, advice cannot be targeted at your scenario.
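Just to illustrate what option 1 ("system level") could look like if the cleansing turned out to be nothing more than line-by-line regexp replacement: the sketch below streams the file through a list of rules so the whole file never has to sit in memory or in a table. The rules in RULES and the function names clean_lines and clean_file are hypothetical placeholders, since the real patterns were never posted.

```python
import re
from typing import Iterable, Iterator

# Hypothetical cleansing rules as (compiled pattern, replacement) pairs.
# The actual patterns were not disclosed; these are placeholders only.
RULES = [
    (re.compile(r"\s+"), " "),                                # collapse whitespace runs
    (re.compile(r"(\d{4})/(\d{2})/(\d{2})"), r"\1-\2-\3"),    # YYYY/MM/DD -> YYYY-MM-DD
]


def clean_lines(lines: Iterable[str]) -> Iterator[str]:
    """Apply every rule to each line in order, yielding one cleaned line at a time."""
    for line in lines:
        text = line.rstrip("\n")
        for pattern, replacement in RULES:
            text = pattern.sub(replacement, text)
        yield text.strip()


def clean_file(src_path: str, dst_path: str) -> None:
    """Stream src_path through clean_lines and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for cleaned in clean_lines(src):
            dst.write(cleaned + "\n")


if __name__ == "__main__":
    sample = ["  ACME   Corp  2011/11/04\n", "foo\tbar\n"]
    print(list(clean_lines(sample)))  # -> ['ACME Corp 2011-11-04', 'foo bar']
```

Whether something like this beats an external table or a staging table depends entirely on the rules: as soon as cleansing needs lookups against existing data, deduplication across records or loading into several tables, doing it inside the database may well be the better fit. That is exactly why the type of processing matters.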

> After more than a decade of experience my advice to you is: Use Oracle
> as little as possible.
>
> I wrote all my business logic in C/C++ making calls to the database
> only as needed, and now my applications run much, much, much, much
> faster. Not to mention the improved development and debug (can use a
> debugger, not sure whether Oracle has something similar).
>
> In essence, the only commands that I run in the database are basic
> ones such as SELECT and UPDATE. No IFs or BUTs.

This cannot be generalized as advice! You do not even mention the type of application(s) you are talking about. What works well for the application types you work on may not work at all for other application types.

Kind regards

        robert

Received on Fri Nov 11 2011 - 14:51:08 CST
