Re: Removing dublicate records in table and hostarray

From: Graham Miller <lgmiller_at_elmrd.u-net.com>
Date: Fri, 24 Apr 1998 12:28:36 GMT
Message-ID: <35407fb5.862871@news.u-net.com>

Brian Rasmusson <br_at_belle.dk> wrote (in <354077AD.4037658F_at_belle.dk>)...

| Hi,
|
| Here is my detailed scenario:
|
| 1. I have table X with 1000's of transactions
| 2. I have destination table Y with millions of processed transactions
|
| I read 1000 records from table X into my hostarray.
| I want to find out which records in the hostarray already exists in the
| destination table Y.
|

This is where some of us are having a problem. Why don't you want to read only the records which are not duplicates? It will not be a problem.
So when you are filling your host array with transactions us a query which only gets rows which are certain to be inserted i.e:
select

     t.rowid, t.key_col, t.col1, t.col2, 0 t.status   from
    transactions_X t
  where
    not exists
    (select null

     from   master_Y m
     m.key_col = T.key_col);

Note: this query will use the index on the Master table primary key.

| What i would like is a sql statement that can return a list of statuses,
| one for each record on my hostarray.
| I would do something like this:
|
| col1[1000]
| col2[1000]
| status[1000]
|
| col1+2 is filled by my select from table x
| status is filled by an examination in table y which records from my
| hostarray already exists.
| I imagine a statement returning a 0 if the record is not a dupe, and 1
| if it is. I can then insert all records with status 0, and know it wont
| failed because of the primary key constraint.
|

Now, insert all the rows in the host array then delete them from the transaction table using the t.rowid as the key. Or mark them as processed or whatever.
This process will also be quick as you are accessing them by row-id (using the host array at a time processing - one call to Oracle for each 1000 rows).

| I continue the steps above, until table X is empty. It may contain
| 100.000's of records, and the destination table contains millions of
| records, so there is no way i can have all records in memory at the same
| time. Thats why i read them in chunks of 1000 records.
|

At the and of the cursor you will have inserted (and removed, marked or otherwise) all the non-duplicates.
What is left in the transaction table is the ones which already existed in the master table. Do with these what you will.

| Table X is filled by process A, while process B reads data from it. This
| cannot be synchronized.

Has no bearing on the above processing. imo

graham Received on Fri Apr 24 1998 - 07:28:36 CDT