data cleansing: externally or internally?
From: geos <geos_at_nowhere.invalid>
Date: Fri, 04 Nov 2011 07:51:49 +0100
Message-ID: <j9022g$4b5$1_at_news.task.gda.pl>
There is a big text file with dirty data, and a company wants it cleaned. Some known patterns are expressed as LIKE or regexp conditions. I first thought about two approaches:
1) do this at the system level
2) or in a database
For the latter case, it looks to me like I could use external tables, or load the data into a temporary table and then do the cleaning there.
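To make variant 1 concrete, here is a rough sketch of what I imagine system-level cleaning could look like: stream the file line by line, apply the regexp rules, and split the output into clean rows and rejects. Everything specific in it is an assumption made up for illustration (the `;` delimiter, the three-field layout, the two rules, and the names `RULES`, `VALID`, `clean_lines`, `clean_file`); the real patterns would come from the company.

```python
import re

# Hypothetical cleaning rules: (compiled pattern, replacement).
RULES = [
    (re.compile(r"[ \t]+"), " "),    # collapse runs of spaces and tabs
    (re.compile(r"\s*;\s*"), ";"),   # drop blanks around the assumed ';' delimiter
]

# Hypothetical validity check: exactly three non-empty ';'-separated fields.
VALID = re.compile(r"^[^;]+;[^;]+;[^;]+$")


def clean_lines(lines):
    """Yield (cleaned_line, is_valid) for each input line, one at a time."""
    for raw in lines:
        line = raw.strip()
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        yield line, bool(VALID.match(line))


def clean_file(src, dst, rejects):
    """Stream src, writing valid rows to dst and everything else to rejects."""
    with open(src, encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as good, \
         open(rejects, "w", encoding="utf-8") as bad:
        for line, ok in clean_lines(fin):
            (good if ok else bad).write(line + "\n")
```

Because the file is read as a stream, this needs no extra database space, but the rules live outside the database and the rejects file has to be reviewed separately.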
I am looking for the pros and cons of each variant. My intuition tells me that loading into a temporary table would give the most flexibility but would also take additional space. I am not sure about the other methods, and I would appreciate your opinion on what I should pay attention to when choosing between them. How are they restricted in terms of performance, flexibility and capabilities (e.g. multitable loading)? I am also interested in good practices and any experience from similar cases that you can share.
Thank you,
geos
-- NOTE: Follow Up set to comp.databases.oracle.misc
Received on Fri Nov 04 2011 - 01:51:49 CDT