c.d.o.misc: Re: Selecting SIMILAR, not the same records (PROBABLE) duplicates
kroger wrote:
> >> I'm a Java programmer
> >>
> > Not to be too cruel but this explains much.
>
> I don't feel offended. I'm not working with DBMS most of the time, that's
> why I'm posting here the questions, not the answers ;)
>
> > From the DBMS side my basic premise is that if the data is not
> > constrained as part of a primary key, unique constraint, foreign
> > key, check constraint, or by a trigger ... it is a memo field and
> > one should expect it to contain nothing but garbage.
> >
> > That seems to be your current situation.
>
> Exactly. But the scope of the project is to gather the data, filter, clean
> and make the garbage useful - by automated and manual matching.
> Whether one likes it or not, this is the goal and it implies some business
> requirements.
> As I wrote before - I'm just providing applications - there are up to 100
> people all over the world doing this (manual matching) Sisyphus job.
>
> > Get someone to apply some SQL or PL/SQL to the problem.
>
> After this discussion (which helped me a lot in fact) I can handle that
> myself - I'm not that COMPLETE newbie with it... ;)
There's a whole language dedicated to pattern matching called AWK (awk or nawk, depending on your system). You may have run across it if you've learned any Unix. It's very useful for scrubbing data. I use it a lot for cleaning up the crud coming from spreadsheets or other integrity-challenged data sources, as well as for maintaining data that requires views with different business rules (for example, one group wants to see the data under the old rules and another under the new rules).

Of course, if I were starting now I'd probably use Perl. Oracle has only recently implemented Perl-style regexps, and I'm not there yet:
http://download-west.oracle.com/docs/cd/B19306_01/appdev.102/b14251/adfns_regexp.htm#ADFNS1003
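
To make the scrubbing idea concrete, here is a minimal sketch of regexp-based cleanup followed by grouping of probable duplicates. It is written in Python rather than awk, Perl, or Oracle's regexp functions, purely for illustration. The function names, the sample rows, and the list of noise words (inc, ltd, corp, co) are all assumptions and do not come from the thread; real data would need its own rules.

```python
import re
from collections import defaultdict

# Hypothetical noise words; a real project would derive these from its own data.
NOISE_WORDS = re.compile(r"\b(inc|ltd|corp|co)\b")

def scrub(value):
    """Reduce a free-text value to a crude matching key."""
    key = value.strip().lower()
    key = re.sub(r"[^a-z0-9]+", " ", key)   # punctuation and odd whitespace become a space
    key = NOISE_WORDS.sub("", key)          # drop the assumed noise words
    return " ".join(key.split())            # collapse any leftover runs of spaces

def probable_duplicates(values):
    """Group values whose scrubbed keys are identical; keep groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[scrub(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Made-up sample data, for illustration only.
rows = ["Acme, Inc.", "ACME inc", "  acme  ", "Widget Co.", "Gadget Ltd"]

if __name__ == "__main__":
    for group in probable_duplicates(rows):
        print(group)
```

Exact matching on a scrubbed key only catches records that differ in case, punctuation, spacing, or the listed noise words. Anything fuzzier (typos, transposed words) would still need manual matching or a similarity measure on top of this.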
jg
--
@home.com is bogus.
"...not nearly as deep or as broad a cut as many had anticipated..." http://www.signonsandiego.com/uniontrib/20060906/news_1b6intel.html

Received on Wed Sep 06 2006 - 16:23:19 CDT