Home » SQL & PL/SQL » SQL & PL/SQL » Duplicate Account Check Process
Duplicate Account Check Process [message #202627] Fri, 10 November 2006 10:21 Go to next message
srivaths
Messages: 60
Registered: June 2005
Location: France
Member

Is there a better way of doing this thing?

I have a duplicate account check process which has to check for the first time the whole set of records for duplicates based on some 10 rules and after that the new onces to the remaining ones which will run every night. I have a problem with running for the first time.

Set A : To be Compared Set

Set B : the Actual Set

My plsql logic

For the first run, both the sets are same.

Step 1 : Accumalate the Set A into a tmp table(TMP_DUP_ACCOUNT_CHECK) and array

Step 2 : Loop thro each rule

For Ex One of the rules (checking surname,first_name and dob):

      SELECT ap.account_num
      BULK COLLECT INTO array_dup_account
        FROM ACCOUNT_PROFILE ap,ACCOUNT_PERSON apn
       WHERE apn.surname = array_to_compare (i).l_surname
         AND apn.first_name = array_to_compare (i).l_first_name
         AND apn.dt_birth = array_to_compare (i).l_dob
         AND ap.account_status_id = active
         AND ap.account_num = apn.account_num
         AND apn.account_num NOT IN (SELECT account_num
                                       FROM TMP_DUP_ACCOUNT_CHECK);



Step 3 : The caught duplicate is inserted into another table after every rule, if a account is caught with one rule, it should not check for any further rules. The rule comparision are arranged in its priority order.


Step 4 : Run a Self check on Set A for all the rules to find if there is any duplicates within Set A.

It takes me around 1-2 secs for 10 rules to be completed for each account. I have some 1.5 lakh records for the first run. So its juz running, running ... as its
1.5 lakhs * 2 sec = some huge time.

If its running for cut-off of new records only like 5,000 to 10,000 compared to 1.5 lakhs. It gets completed in 3-4 hrs. Thats ok.

I juz planned to give it like running in splits

the SET A for 3 sets of rules like
Run1 : Rules (1,2,3)
Run2 : Rules (4,5,6,7)
Run3 : Rules (8,9,10)

So that atleast i can finish it off.

Is there a better way of doing it for the first run , its to compare 1.5 lakhs record to 1.5 lakh records for 10 rules
?

One more thing I call this plsql from Pro*C.

regards
Srivaths









[Updated on: Fri, 10 November 2006 10:22]

Report message to a moderator

Re: Duplicate Account Check Process [message #202673 is a reply to message #202627] Fri, 10 November 2006 18:38 Go to previous messageGo to next message
rleishman
Messages: 3724
Registered: October 2005
Location: Melbourne, Australia
Senior Member
I see by the sample rule you posted that you are performing SQL inside the loop.

You must not do this.

Oracle processes SQL and PL/SQL statements with different engines, and when you embed SQL inside PL/SQL, it performs a "context-switch" as it passes control from one engine to the other and back again. Context-switches are slow.

This is why we do BULK COLLECT instead of individual row-fetches. By fetching all rows in one step, we reduce context switches.

Since you have 1.5 lakhs (I don't know how much a lakh is, but it sounds like something with quite a few zeros on it) and 10 rules, you are probably performing AT LEAST 15 lakhs context switches.

There are two options here:
  • If you want to use procedural logic to evaluate the rules, you must fetch ALL of the necessary data in the main cursor. Do NOT use SQL inside the loop to retrieve more data. If this is complex, you may need Concurrent Cursors.
  • You can load the initial result-set into a GLOBAL TEMPORARY TABLE, and then perform bulk UPDATE or MERGE statements against it to evaluate each rule. If you need to update a lot of rows (ie. many rows will fail a rule), make sure you use Updateable Join Views


Ross Leishman
Re: Duplicate Account Check Process [message #203131 is a reply to message #202673] Tue, 14 November 2006 00:30 Go to previous message
srivaths
Messages: 60
Registered: June 2005
Location: France
Member

Thanks for the reply.

Sorry that I had to respond after 3 days.

Actually i use BULK collect to accumalate the SET A records into plsql table and a temp table.

And then I proceed going thro the rules which are framed in 10 different sqls which again accumalates the duplicate for each array element in SET A. I thought sql query would be fast comparing to sequencially going thro each record to find out a match.

Concurrent Cursor :

OPEN CURSOR1 (SET A)
LOOP START
     OPEN CURSOR2 (SET B)
     LOOP START
         CHECK RULE 1 (using only plsql no sql)
         ....
         CHECK RULE N
         ....
     END LOOP CUSROR2
LOOP END CUSROR1


Is this , what you meant?

And 1 lakh would be 100,000.

regards
Srivaths
Previous Topic: Double Inheritence
Next Topic: Updation in a table
Goto Forum:
  


Current Time: Tue Dec 06 04:20:21 CST 2016

Total time taken to generate the page: 0.14312 seconds