OraFAQ Forum: SQL & PL/SQL

Home » SQL & PL/SQL » SQL & PL/SQL » PL/SQL (Oracle 10g)

Show: Today's Messages :: Polls :: Message Navigator
E-mail to friend

PL/SQL [message #598542]

Tue, 15 October 2013 12:52

ranki
Messages: 19
Registered: October 2013

Junior Member

Hello,
I have an urgent requirement to identify name and address matching from a large table 2 to 3 crores.
Please help if any fastest method can be used for identifying the data duplication.
I have already used JARO_WINKLER and EDIT_DISTANCE algorithms but these are very very slow on large data.

[Updated on: Tue, 15 October 2013 12:52]

Report message to a moderator

Re: PL/SQL [message #598544 is a reply to message #598542]

Tue, 15 October 2013 12:55

BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal

Senior Member

The best solution is to NEVER allow duplicates to start with!

NEVER do in PL/SQL that which can be done in plain SQL

post SQL & results that show an example or two of data duplication.

Please read and follow the forum guidelines, to enable us to help you:

http://www.orafaq.com/forum/t/88153/0/

[Updated on: Tue, 15 October 2013 13:03]

Report message to a moderator

Re: PL/SQL [message #598545 is a reply to message #598544]

Tue, 15 October 2013 13:09

ranki
Messages: 19
Registered: October 2013

Junior Member

MY TABLE
ID
NAME
MIDDLE_NAME
LAST_NAME
ADDRESS1
ADDRESS2
CITY
STATE
Thank you for quick response.I am trying to black list customer having more than 5 ids/accounts
I want to identify name or customer having multiple account/ids within the same table.
SELECT UTL_MATCH.JARO_WINKLER_SIMILARITY (name||' '||middle_name||' '||last_name,name||middle_name||last_name) jws_name,UTL_MATCH.EDIT_DISTANCE_SIMILARITY (
address1||' '||ADDRESS2||' '||CITY,ADDRESS1||ADDRESS||CITY) jws_address from my_table where UTL_MATCH.EDIT_DISTANCE_SIMILARITY(name||' '||middle_name||' '||last_name,name||middle_name||last_name)>90 and UTL_MATCH.EDIT_DISTANCE_SIMILARITY (ADDRESS1||' '||ADDRESS2||' '||CITY,ADDRESS1||ADDRESS2||CITY)>90);

result data like this
ANJALI DESHPANDE
AJAALI DESHPAANDE

similarly address must match for the same person .

[Updated on: Tue, 15 October 2013 13:11]

Report message to a moderator

Re: PL/SQL [message #598550 is a reply to message #598545]

Tue, 15 October 2013 13:25

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

Have a look How does one eliminate duplicates rows from a table?

The above post is unreadable, seems syntactically incorrect.
Please use Instant SQL Formatter

[Updated on: Tue, 15 October 2013 13:29]

Report message to a moderator

Re: PL/SQL [message #598562 is a reply to message #598542]

Tue, 15 October 2013 13:56

ranki
Messages: 19
Registered: October 2013

Junior Member

Hi Lalit,
Actually I am trying to concat name,middle_name,last_name and compare within the same table.
And also concating ADDRESS and comparing within same table.
My aim is not to remove duplicates but I am trying to identify customer with multiple accounts/ids within the same table.
I have tried Oracle ware house builder and also algorithms provided by Oracle .But they seems to be too slow. For comparing 20000 records it is taking 6 to 7 hours and even more than that even after creating indexes on required columns,
Any method which is very fast and provide quick results for comparing large data is highly appreciable ..

Report message to a moderator

Re: PL/SQL [message #598563 is a reply to message #598562]

Tue, 15 October 2013 13:59

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

Let me understand your requirement correctly. You need a SQL to identify the duplicates based on some rule, right?

Report message to a moderator

Re: PL/SQL [message #598572 is a reply to message #598563]

Tue, 15 October 2013 14:45

ranki
Messages: 19
Registered: October 2013

Junior Member

Yes exactly, I have tried Oracle provided algorithms which give result but for comparing large data even for 20000 records it is taking hours together. And I have to compare 2 to 3 crores of data.
Tried all possible methods but no quick solution. And this has to be done in house and not by any deduplication softwares available outside.

Report message to a moderator

Re: PL/SQL [message #598575 is a reply to message #598572]

Tue, 15 October 2013 14:50

BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal

Senior Member

Finding "inexact matches" is resource (time) intensive.
If there was a better or faster way to accomplish the same task, it would be published & popular.
You are looking for the Golden Fleece so I wish you well while you continue your quest.

Report message to a moderator

Re: PL/SQL [message #598576 is a reply to message #598572]

Tue, 15 October 2013 14:51

Lalit Kumar B
Messages: 3174
Registered: May 2013
Location: World Wide on the Web

Senior Member

You are making simple things complex. When you crores, I assume it is multiples of ten million of records. That is not huge enough to take hours to return you the output. Number of records is not a measure of size.

It is high time to provide your test case, and show us what exactly is happening. Use SQL*Plus and paste the output, and let's see, if your query really takes hours. Now I don't mean you have to give us the real data, but, the create table statements and insert statements with test data and a working test case. If you are struggling with writing a query, we will help with it too.

Report message to a moderator

Re: PL/SQL [message #598601 is a reply to message #598575]

Wed, 16 October 2013 00:00

ranki
Messages: 19
Registered: October 2013

Junior Member

Blackswan,
Yes I know that I am looking for some thing as you said. But I was wondering when Oracle itself could not provide a faster method for its Own database how deduplication softwares provide fast matching methods??

Report message to a moderator

Re: PL/SQL [message #598602 is a reply to message #598601]

Wed, 16 October 2013 00:06

BlackSwan
Messages: 26766
Registered: January 2009
Location: SoCal

Senior Member

ranki ,
I want you to provide me instructions so that I can teach my goat to fly.

Oracle can not refute the laws of physics.

Report message to a moderator

Re: PL/SQL [message #598608 is a reply to message #598602]

Wed, 16 October 2013 01:13

ranki
Messages: 19
Registered: October 2013

Junior Member

Blackswan,
When senior members like you are quoting that it is time consuming and I too experienced it is tedious process for one time comparing,our team is expecting i move the magic stick and data is compared.

Report message to a moderator

Re: PL/SQL [message #598610 is a reply to message #598608]

Wed, 16 October 2013 01:30

pablolee
Messages: 2882
Registered: May 2007
Location: Scotland

Senior Member

Then you should reset the team's expectations. As Blackswan said, what you are trying to do is time consuming. Yes, there is software out there that can do it faster, it is specialised and doesn't do most of what Oracle can do. It's about picking the right tool for the job. If you want to de-dupe a lot of data using fuzzy matching rules, quickly, use a piece of software that is specifically designed for the task. If you choose not to, then you/your team will have to accept that it will take a bit longer.

Report message to a moderator

Previous Topic:	Unable to fisue out simple sql
Next Topic:	Nested table or Object table

Goto Forum:

-=] Back to Top [=-

[ Syndicate this forum (XML) ] [

]

Current Time: Thu Apr 25 07:08:53 CDT 2024