Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: kroger <kroger_at_vp.pl>
Date: Fri, 8 Sep 2006 20:21:38 +0200
Message-ID: <edscbl$ffo$1@news.onet.pl>

> As much as possible, push the filtering out to the data entry point. It
> is a lot easier to keep garbage out than it is to clean it up.

That I'm aware of.
A lot of verification is done before the data is persisted; however, this particular tool is meant for supervisors who clean up the data as part of their daily routine.

When I started on this, I couldn't believe how inventive people can be at pushing garbage into the database... Blank text field not allowed? Let's try spaces, dashes, asterisks, combinations of those with letters... Or just 'aaabbbcccddd' to push the data through. Never mind that they KNOW a legal name belongs there ;)
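A clean-up screen could flag such filler values for a supervisor to look at. The following is only a minimal Python sketch under stated assumptions: the function `looks_like_junk_name` and its thresholds (at least 2 letters or digits; runs of 3 or more identical characters making up at least half the value) are hypothetical, and it flags rather than rejects, since some short real names may trip it.

```python
from itertools import groupby

def looks_like_junk_name(name: str, min_chars: int = 2,
                         run_len: int = 3, run_share: float = 0.5) -> bool:
    """Hypothetical heuristic: True if `name` looks like filler typed only to
    get past a "field must not be blank" check. Intended to flag a record
    for a supervisor's review, not to reject it automatically."""
    # Keep only letters and digits (str.isalnum is Unicode-aware), lower-cased.
    chars = [c.lower() for c in name if c.isalnum()]
    # Only spaces, dashes, asterisks etc., or such symbols around one character.
    if len(chars) < min_chars:
        return True
    # Count characters sitting in runs of `run_len` or more identical
    # characters; in 'aaabbbcccddd' every character is in such a run.
    run_lengths = (len(list(group)) for _, group in groupby(chars))
    in_runs = sum(n for n in run_lengths if n >= run_len)
    return in_runs / len(chars) >= run_share

# Prints True for the first four values and False for 'Porsche AG'.
for value in ["   ", "-*-", "- a -", "aaabbbcccddd", "Porsche AG"]:
    print(repr(value), looks_like_junk_name(value))
```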

> For good or ill, my big project for next year will be something
> similar.
> Once done, it will sure make our system cleaner.

That's the whole point. This particular tool may be slow, with a dirty GUI, dirty code, and whatever else, but the quality of the data checks must be high.

And to finish the topic of ensuring CLEAN input: you cannot keep an eye on all users on all continents... And how can an automated routine tell, for example, that Porsche AG and Porsche Engines are different entities for invoicing, even though they are located at the same address?
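The question above, and the thread's subject of selecting similar rather than identical records, can be illustrated with a small sketch. It is an assumption for illustration, not the method used in this thread: records are grouped by normalized address, and name pairs within a group are scored with the standard library's `difflib.SequenceMatcher.ratio()`. The record layout, the 0.6 threshold and the name `probable_duplicates` are hypothetical.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

def normalize(text: str) -> str:
    # Lower-case and collapse whitespace so "PORSCHE  ag" matches "Porsche AG".
    return " ".join(text.lower().split())

def probable_duplicates(records, threshold=0.6):
    """Hypothetical sketch: return (id_a, id_b, score) for records sharing a
    normalized address whose names are similar, not necessarily equal."""
    by_address = defaultdict(list)
    for rec in records:
        by_address[normalize(rec["address"])].append(rec)
    pairs = []
    for group in by_address.values():
        # Compare every pair of records within the same address group.
        for a, b in combinations(group, 2):
            score = SequenceMatcher(None, normalize(a["name"]),
                                    normalize(b["name"])).ratio()
            if score >= threshold:
                pairs.append((a["id"], b["id"], round(score, 2)))
    return pairs

# Made-up sample data; record 3 sits at another address and is never compared.
records = [
    {"id": 1, "name": "Porsche AG", "address": "Example Street 1"},
    {"id": 2, "name": "Porsche Engines", "address": "example street  1"},
    {"id": 3, "name": "Porsche AG", "address": "Other Street 5"},
]
print(probable_duplicates(records))  # [(1, 2, 0.72)]
```

Such a routine can only surface Porsche AG and Porsche Engines as a probable pair; whether they are the same invoicing entity remains a decision for the supervisor.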

> Looks like you have a good idea how to approach this now.
> Good luck,

Much better than at the beginning, in any case :) Thank you all!

BR,
kroger

Received on Fri Sep 08 2006 - 13:21:38 CDT