Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.misc -> [SQL] Comparing records, selecting them by two
Hello,
I have a 'clients' database that has the following main fields :
id number (unique)
name char(25)
postcode char(10)
phone char(20)
valid char
I want to find the duplicate records but not to wildly delete them : I must
compare the records two by two and keep one or the two of them but only
human can decide. The user can decide which fields will be compared : for
example, persons who have the same phone number are probably the same
persons but sometimes phone is left blank so the user has to compare on
other fields.
The valid field means that some records are already validated and the new
ones are compared to the validated ones.
For the moment, my query looks like :
CREATE TABLE foo (N1,N2) as
SELECT c.id as N1,d.id as N2
FROM clients c, clients d
WHERE (c.id > d.id) and (c.name=d.name) and (c.postcode=d.postcode) and
(c.valid='Y') and (d.valid='N');
This creates a table FOO with two fields being the unique IDs for the records to compare so I can load the two records and the user can visually compare them. The result is good but it takes ages to complete. My client says that the same duplicate search in M$ACCESS takes 40seconds and 4hours on Oracle :-((( (Same machine : P3/500, 512MB RAM)
There are indexes for the ID, NAME, POSTCODE and PHONE fields. Being new to Oracle, I don't now how to tune it, maybe I should do some simple things like modify the cache size ? I've read a few things about analyzing the tables but I don't know how it works at all. Is the tool supplied with Oracle ?
Well, I started digging a little more in my SQL code and came up with this starting query :
select id,name,postcode from clients where name||postcode in (select name||postcode from clients group by name||postcode having count(*)>1);
This works but :
- doesn't check for the valid field
I thought about this also (pseudocode) :
create view foo as
select name,postcode from clients group by name,postcode having
count(*)>1;
This gives me a list of duplicate values (I don't get all the records, only
the duplicated values).
What I need then is some way to create the table containing all the
duplicated couples : The idea is to separate the two processes :
- one process creates the duplicate values (select * from foo)
This is ugly and probably VERY VERY slow (and probably won't work) but you see the point.
I hope you get an idea of what I want to do and you can help me. Maybe there's a much simpler solution or another way to design it ?
*ANY* help will be appreciated.
Thanks in advance.
P.S.: We are running Oracle 8. Received on Fri Oct 29 1999 - 04:00:18 CDT