Home » SQL & PL/SQL » SQL & PL/SQL » PL/SQL (Oracle 10g)
PL/SQL [message #598542] Tue, 15 October 2013 12:52 Go to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
Hello,
I have an urgent requirement to identify name and address matching from a large table 2 to 3 crores.
Please help if any fastest method can be used for identifying the data duplication.
I have already used JARO_WINKLER and EDIT_DISTANCE algorithms but these are very very slow on large data.

[Updated on: Tue, 15 October 2013 12:52]

Report message to a moderator

Re: PL/SQL [message #598544 is a reply to message #598542] Tue, 15 October 2013 12:55 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
The best solution is to NEVER allow duplicates to start with!

NEVER do in PL/SQL that which can be done in plain SQL

post SQL & results that show an example or two of data duplication.

Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/

[Updated on: Tue, 15 October 2013 13:03]

Report message to a moderator

Re: PL/SQL [message #598545 is a reply to message #598544] Tue, 15 October 2013 13:09 Go to previous messageGo to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
MY TABLE
ID
NAME
MIDDLE_NAME
LAST_NAME
ADDRESS1
ADDRESS2
CITY
STATE
Thank you for quick response.I am trying to black list customer having more than 5 ids/accounts
I want to identify name or customer having multiple account/ids within the same table.
SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY (name||' '||middle_name||' '||last_name,name||middle_name||last_name) jws_name,UTL_MATCH.EDIT_DISTANCE_SIMILARITY (
address1||' '||ADDRESS2||' '||CITY,ADDRESS1||ADDRESS||CITY) jws_address from my_table where UTL_MATCH.EDIT_DISTANCE_SIMILARITY(name||' '||middle_name||' '||last_name,name||middle_name||last_name)>90 and UTL_MATCH.EDIT_DISTANCE_SIMILARITY (ADDRESS1||' '||ADDRESS2||' '||CITY,ADDRESS1||ADDRESS2||CITY)>90);

result data like this
ANJALI DESHPANDE
AJAALI DESHPAANDE

similarly address must match for the same person .

[Updated on: Tue, 15 October 2013 13:11]

Report message to a moderator

Re: PL/SQL [message #598550 is a reply to message #598545] Tue, 15 October 2013 13:25 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Have a look How does one eliminate duplicates rows from a table?

The above post is unreadable, seems syntactically incorrect.
Please use Instant SQL Formatter

[Updated on: Tue, 15 October 2013 13:29]

Report message to a moderator

Re: PL/SQL [message #598562 is a reply to message #598542] Tue, 15 October 2013 13:56 Go to previous messageGo to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
Hi Lalit,
Actually I am trying to concat name,middle_name,last_name and compare within the same table.
And also concating ADDRESS and comparing within same table.
My aim is not to remove duplicates but I am trying to identify customer with multiple accounts/ids within the same table.
I have tried Oracle ware house builder and also algorithms provided by Oracle .But they seems to be too slow. For comparing 20000 records it is taking 6 to 7 hours and even more than that even after creating indexes on required columns,
Any method which is very fast and provide quick results for comparing large data is highly appreciable ..
Re: PL/SQL [message #598563 is a reply to message #598562] Tue, 15 October 2013 13:59 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
Let me understand your requirement correctly. You need a SQL to identify the duplicates based on some rule, right?
Re: PL/SQL [message #598572 is a reply to message #598563] Tue, 15 October 2013 14:45 Go to previous messageGo to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
Yes exactly, I have tried Oracle provided algorithms which give result but for comparing large data even for 20000 records it is taking hours together. And I have to compare 2 to 3 crores of data.
Tried all possible methods but no quick solution. And this has to be done in house and not by any deduplication softwares available outside.
Re: PL/SQL [message #598575 is a reply to message #598572] Tue, 15 October 2013 14:50 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
Finding "inexact matches" is resource (time) intensive.
If there was a better or faster way to accomplish the same task, it would be published & popular.
You are looking for the Golden Fleece so I wish you well while you continue your quest.
Re: PL/SQL [message #598576 is a reply to message #598572] Tue, 15 October 2013 14:51 Go to previous messageGo to next message
Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web
Senior Member
You are making simple things complex. When you crores, I assume it is multiples of ten million of records. That is not huge enough to take hours to return you the output. Number of records is not a measure of size.

It is high time to provide your test case, and show us what exactly is happening. Use SQL*Plus and paste the output, and let's see, if your query really takes hours. Now I don't mean you have to give us the real data, but, the create table statements and insert statements with test data and a working test case. If you are struggling with writing a query, we will help with it too.
Re: PL/SQL [message #598601 is a reply to message #598575] Wed, 16 October 2013 00:00 Go to previous messageGo to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
Blackswan,
Yes I know that I am looking for some thing as you said. But I was wondering when Oracle itself could not provide a faster method for its Own database how deduplication softwares provide fast matching methods??
Re: PL/SQL [message #598602 is a reply to message #598601] Wed, 16 October 2013 00:06 Go to previous messageGo to next message
BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal
Senior Member
ranki ,
I want you to provide me instructions so that I can teach my goat to fly.

Oracle can not refute the laws of physics.
Re: PL/SQL [message #598608 is a reply to message #598602] Wed, 16 October 2013 01:13 Go to previous messageGo to next message
ranki
Messages: 19
Registered: October 2013
Junior Member
Blackswan,
When senior members like you are quoting that it is time consuming and I too experienced it is tedious process for one time comparing,our team is expecting i move the magic stick and data is compared.
Re: PL/SQL [message #598610 is a reply to message #598608] Wed, 16 October 2013 01:30 Go to previous message
pablolee
Messages: 2882
Registered: May 2007
Location: Scotland
Senior Member
Then you should reset the team's expectations. As Blackswan said, what you are trying to do is time consuming. Yes, there is software out there that can do it faster, it is specialised and doesn't do most of what Oracle can do. It's about picking the right tool for the job. If you want to de-dupe a lot of data using fuzzy matching rules, quickly, use a piece of software that is specifically designed for the task. If you choose not to, then you/your team will have to accept that it will take a bit longer.
Previous Topic: Unable to fisue out simple sql
Next Topic: Nested table or Object table
Goto Forum:
  


Current Time: Thu Apr 25 07:08:53 CDT 2024