Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates

From: joel garry
Date: 6 Sep 2006 14:23:19 -0700

kroger wrote:
> >> I'm a Java programmer
> >>
> > Not to be too cruel but this explains much.
> I don't feel offended. I'm not working with DBMS most of the time, that's
> why I'm posting here the questions, not the answers ;)
> > From the DBMS side my basic premise is that if the data is not
> > constrained as part of a primary key, unique constraint, foreign
> > key, check constraint, or by a trigger ... it is a memo field and
> > one should expect it to contain nothing but garbage.
> >
> > That seems to be your current situation.
> Exactly. But the scope of the project is to gather the data, filter, clean
> and make the garbage useful - by automated and manual matching.
> Whether one likes it or not, this is the goal and it implies some business
> requirements.
> As I wrote before - I'm just providing applications - there are up to 100
> people all over the world doing this (manual matching) Sisyphus job.
> > Get someone to apply some SQL or PL/SQL to the problem.
> After this discussion (which helped me a lot in fact) I can handle that
> myself - I'm not that COMPLETE newbie with it... ;)

There's a whole language dedicated to pattern matching called AWK (awk or nawk, depending on your system). You may have run across it if you've learnt any Unix. It is very useful for scrubbing data: I use it a lot for cleaning up the crud coming from spreadsheets and other integrity-challenged data sources, as well as for maintaining data that needs views with different business rules (say, one group wants to see the data under the old rules and another under the new ones). Of course, if I were starting now I'd probably use Perl. Oracle has only recently implemented Perl-style regexps, and I'm not there yet.
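
To make the scrubbing idea concrete, here is a rough sketch of the kind of regexp-based cleanup described above. It is written in Python rather than awk or Perl, and the function names (scrub, probable_duplicates) and the sample values are made up for illustration; nothing here comes from the original poster's data. It only catches values that differ in case, punctuation or spacing, so treat it as a first pass before any real fuzzy matching.

```python
import re
from collections import defaultdict


def scrub(value):
    """Normalize a free-text field so trivially different spellings compare equal."""
    value = value.strip().upper()
    value = re.sub(r"[^A-Z0-9 ]+", " ", value)  # turn punctuation into spaces
    value = re.sub(r"\s+", " ", value)          # collapse runs of whitespace
    return value.strip()


def probable_duplicates(rows):
    """Group raw values by their scrubbed form; keep groups with more than one member."""
    groups = defaultdict(list)
    for row in rows:
        groups[scrub(row)].append(row)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical sample data, not taken from the thread.
sample = ["Acme, Inc.", "  ACME  INC", "acme inc", "Widget Co"]
print(probable_duplicates(sample))
```

The groups that come out could then be handed to the people doing the manual matching, so they only look at candidates that already agree after cleanup.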


-- is bogus.
"...not nearly as deep or as broad a cut as many had anticipated..."
Received on Wed Sep 06 2006 - 16:23:19 CDT
