Re: data cleansing: externally or internally?
From: Deadly Dirk <dirk_at_pfln.invalid>
Date: Fri, 4 Nov 2011 17:51:02 +0000 (UTC)
Message-ID: <pan.2011.11.04.17.50.19_at_pfln.invalid>
On Fri, 04 Nov 2011 07:51:49 +0100, geos wrote:
> there is a big text file with dirty data.
How big is "big"?
> a company wants it to be
> clean. there are some known patterns expressed as like or regexp. I
> first thought about two approaches:
> 1) do this on the system level
> 2) or in a database
A database is not well suited to this kind of job. Personally, I would use Perl. It is my favorite tool because it's extremely versatile and fast, but any scripting language with regex support will probably do.
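To make the scripting approach concrete, here is a minimal sketch in Python rather than Perl (either would work, since all it needs is regex support). The three rules in RULES are made-up placeholders, not the company's actual patterns, and the names clean_line and clean_stream are mine. It reads line by line, so the size of the file should not matter much for memory.

```python
import re
import sys

# Hypothetical cleanup rules (assumptions for illustration only); the real
# ones would come from the list of known dirty patterns.
RULES = [
    (re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]"), ""),  # drop control characters
    (re.compile(r"[ \t]+"), " "),                        # collapse runs of spaces/tabs
    (re.compile(r"^ | $"), ""),                          # trim one leading/trailing space
]

def clean_line(line):
    """Apply every rule, in order, to one line (newline already removed)."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

def clean_stream(src, dst):
    """Process src one line at a time so memory use stays flat."""
    for raw in src:
        dst.write(clean_line(raw.rstrip("\r\n")) + "\n")

if __name__ == "__main__":
    clean_stream(sys.stdin, sys.stdout)
```

Saved as, say, clean.py (a hypothetical name), it would run as a filter: python clean.py < dirty.txt > clean.txt. This assumes the input decodes cleanly in the default text encoding; a file with mixed or broken encodings would need that handled first.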
--
I don't think, therefore I am not.

Received on Fri Nov 04 2011 - 12:51:02 CDT