Re: how to rebuild index after direct path load?

From: Dirk Bellemans <Dirk.Bellemans_at_skynet.belgium>
Date: Thu, 9 Dec 1999 18:51:46 +0100
Message-ID: <82tera$8uv$1_at_news0.skynet.be>


Well, one solution could be to load your data into a new table with the same layout as the old, let's call them "TableA" and "TableB". You leave the (primary key) index on the old table and create a new index in your new table (after the load). Now, depending on the ratio of duplicates/correct data and the size of the old table, there are several options to eliminate the old duplicates.

Let me explain one option: a PL/SQL procedure.

To ease the explanation, let's create a ViewA that is something like

SELECT *
FROM TableA
UNION ALL
SELECT *
FROM TableB

The "ALL" is essential to prevent Oracle from executing a MINUS set operation. Since you have an index, the rows are fetched very quickly.

The easiest way is by means of a cursor with an extra "old" row-variable, as in:

CURSOR c_view
IS
SELECT pk_id
FROM ViewA
ORDER BY pk_id;

row_old c_view%ROWTYPE;
row_current c_view%ROWTYPE;

BEGIN
  row_old.pk_id := NONEXISTING_VALUE;
  OPEN c_view;
  FETCH c_view INTO row_old;
  WHILE c_view%FOUND LOOP
    FETCH c_view INTO row_current;
    IF c_view%FOUND THEN

      IF row_old.pk_id = row_current.pk_id THEN
        DELETE FROM TableA WHERE pk_id = row_old.pk_id;
      ELSE
        row_old.pk_id := row_currnet.pk_id;
      END IF;

    END IF;
  END LOOP;
  CLOSE c_view;
END; It is always a good idea to do the commit at the end, somewhere ;-)

An even faster solution would be to use TWO cursors (one for each table) and delete by ROWNUM. But that requires some more programming...

HTH Of course you'll end up with two (disjoint) tables, but you can still choose to merge them in a view, or merge them physically.

--
Dirk Bellemans
Modify email address to reply (use .be instead of .belgium)

hkpang_at_my-deja.com wrote in message <81hbqm$6ql$1_at_nnrp1.deja.com>...

>Hi,
>
>I need to upload a huge dataset into an existing table. If the dataset
>contains duplicate keys, I want to the delete the old ones already in
>the table.
>
>My current approach is using sqlldr direct path. Since the data
>contains duplicate keys, the primary key index is violated and put in
>direct load state afterward.
>
>Now my problem is how to rebuild the index *and* eliminate the
>duplicates?
>
>1) use a delete command to eliminate the duplicates. this kind of works
>but the performance is unacceptable. i potentially have millions to
>delete.
>
>2) drop the index before upload, reenable the index with exceptions
>going to an exception table. the probelm is that both the old rows and
>the new rows are in the table, there is no way I can figure out which
>one to eliminate.
>
>Any suggestions???
>
>Many thanks for making a happy thanksgiving possible. :-)
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.
Received on Thu Dec 09 1999 - 18:51:46 CET

Original text of this message