data cleansing: externally or internally?

From: geos <geos_at_nowhere.invalid>
Date: Fri, 04 Nov 2011 07:51:49 +0100
Message-ID: <j9022g$4b5$1_at_news.task.gda.pl>



There is a big text file with dirty data, and a company wants it cleaned. Some known patterns for the bad data are expressed as LIKE or regexp conditions. I first thought about two approaches:
1) do the cleaning at the system level (outside the database)
2) or do it in a database
For the latter case it looks to me that I could either use external tables or load the data into a temporary table and then do the cleaning there.
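
To make option 1 concrete, here is a rough sketch of what I mean by cleaning at the system level. The record layout (id;name;comment) and both patterns are made up for illustration; the real rules would come from the company's known patterns.

```python
import re

# Hypothetical rules, assumed only for this sketch:
# a valid record is "id;name;optional comment" with a numeric id.
VALID_LINE = re.compile(r"^\d+;[^;]+;[^;]*$")
MULTI_SPACE = re.compile(r"\s+")


def clean_lines(lines):
    """Normalize whitespace and split lines into (clean, rejected)."""
    clean, rejected = [], []
    for raw in lines:
        # Collapse runs of whitespace, then trim both ends.
        line = MULTI_SPACE.sub(" ", raw).strip()
        if VALID_LINE.match(line):
            clean.append(line)
        else:
            # Keep the untouched original so it can be inspected later.
            rejected.append(raw)
    return clean, rejected


sample = ["1;Alice;ok", "  2;Bob   Smith;  ", "garbage line", "x;Carol;bad id"]
clean, rejected = clean_lines(sample)
```

For a really big file the same function could be fed a file object line by line and write to two output files instead of building lists in memory.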

I am looking for the pros and cons of each variant. My intuition tells me that loading into a temporary table would give the most flexibility but would also take additional space. I am not sure about the other methods, so I would appreciate your opinion on what I should pay attention to when choosing between them. How are they restricted in terms of performance, flexibility and capabilities (e.g. multitable loading)? I am also interested in good practices and any experience from similar cases that you can share.
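
For comparison, the temporary table variant might look roughly like the sketch below. It uses Python's built-in sqlite3 module only as a stand-in so the idea can be tried without a server; the table names, the two-field layout and the LIKE rule are all assumptions of mine, and in Oracle the load and the multitable step would be done with its own tools instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Staging table holds the raw lines; the targets are hypothetical.
cur.execute("CREATE TEMP TABLE staging (raw TEXT)")
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
cur.execute("CREATE TABLE rejects (raw TEXT)")

rows = ["1;Alice", "2;Bob", "bad row", ";NoId"]
cur.executemany("INSERT INTO staging (raw) VALUES (?)", [(r,) for r in rows])

# Assumed LIKE rule: at least one character on each side of a ';'.
good = "raw LIKE '_%;_%'"

# Rows matching the rule are split and loaded into the real table.
cur.execute(f"""
    INSERT INTO customers (id, name)
    SELECT CAST(substr(raw, 1, instr(raw, ';') - 1) AS INTEGER),
           substr(raw, instr(raw, ';') + 1)
    FROM staging
    WHERE {good}
""")

# Everything else goes to a reject table for later review.
cur.execute(f"INSERT INTO rejects SELECT raw FROM staging WHERE NOT ({good})")
conn.commit()

customers = cur.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
rejects = cur.execute("SELECT raw FROM rejects ORDER BY raw").fetchall()
```

This shows the trade-off I have in mind: the raw data is stored twice for a while, but every rule is just another SQL statement and loading into several tables comes almost for free.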

thank you,
geos

-- 
NOTE: Follow Up set to comp.databases.oracle.misc
Received on Fri Nov 04 2011 - 01:51:49 CDT
