Re: [SQL] Comparing records, selecting them by two

From: <michael_bialik_at_my-deja.com>
Date: Fri, 29 Oct 1999 15:14:51 GMT
Message-ID: <7vcdl7$rmi$1@nnrp1.deja.com>

An obvious question :

Do you have indexes defined on name, postcode and valid fields?

Michael.

In article <7vbnn6$3ja$1_at_jaydee.iway.fr>, "Jean-Christophe Boggio" <cat_at_creaweb.fr> wrote:
> Hello,
>
> I have a 'clients' database that has the following main fields :
>
> id number (unique)
> name char(25)
> postcode char(10)
> phone char(20)
> valid char
>
> I want to find the duplicate records but not to wildly delete them :
I must
> compare the records two by two and keep one or the two of them but
only
> human can decide. The user can decide which fields will be compared :
for
> example, persons who have the same phone number are probably the same
> persons but sometimes phone is left blank so the user has to compare
on
> other fields.
> The valid field means that some records are already validated and the
new
> ones are compared to the validated ones.
>
> For the moment, my query looks like :
>
> CREATE TABLE foo (N1,N2) as
> SELECT c.id as N1,d.id as N2
> FROM clients c, clients d
> WHERE (c.id > d.id) and (c.name=d.name) and (c.postcode=d.postcode)
and
> (c.valid='Y') and (d.valid='N');
>
> This creates a table FOO with two fields being the unique IDs for the
> records to compare so I can load the two records and the user can
visually
> compare them. The result is good but it takes ages to complete. My
client
> says that the same duplicate search in M$ACCESS takes 40seconds and
4hours
> on Oracle :-((( (Same machine : P3/500, 512MB RAM)
>
> There are indexes for the ID, NAME, POSTCODE and PHONE fields. Being
new to
> Oracle, I don't now how to tune it, maybe I should do some simple
things
> like modify the cache size ? I've read a few things about analyzing
the
> tables but I don't know how it works at all. Is the tool supplied with
> Oracle ?
>
> Well, I started digging a little more in my SQL code and came up with
this
> starting query :
>
> select id,name,postcode from clients where name||postcode in
> (select name||postcode from clients group by name||postcode having
> count(*)>1);
>
> This works but :
> - doesn't check for the valid field
> - does not return two IDs per row
> - don't know about speed (I work on a 50000 recs subset of my client's
> data).
>
> I thought about this also (pseudocode) :
>
> create view foo as
> select name,postcode from clients group by name,postcode having
> count(*)>1;
>
> This gives me a list of duplicate values (I don't get all the
records, only
> the duplicated values).
> What I need then is some way to create the table containing all the
> duplicated couples : The idea is to separate the two processes :
> - one process creates the duplicate values (select * from foo)
> - the user selects one of them
> - the program will then search for all the records that match the
duplicate
> values. Something like :
> SELECT c.id,d.id
> FROM clients c, clients d
> WHERE c.id>d.id
> AND (c.name in (select name from foo)) AND (c.postcode in (select
postcode
> from foo))
> AND (d.name in (select name from foo)) AND (d.postcode in (select
postcode
> from foo));
>
> This is ugly and probably VERY VERY slow (and probably won't work)
but you
> see the point.
>
> I hope you get an idea of what I want to do and you can help me. Maybe
> there's a much simpler solution or another way to design it ?
>
> *ANY* help will be appreciated.
>
> Thanks in advance.
>
> P.S.: We are running Oracle 8.
>
>

Sent via Deja.com http://www.deja.com/
Before you buy. Received on Fri Oct 29 1999 - 10:14:51 CDT