SQL Loader Problem and How to remove duplicates ?

From: thielm <thielm_at_ix.netcom.com>
Date: 1997/01/10
Message-ID: <32D6FD36.5CA9_at_ix.netcom.com>#1/1


Hey Guys  

Maybe some of you can help me

I've got a asci file with about 600,000 records (about 100MB). Each record is identified by a 40 character key. There are a lot of duplicates in the file.

I'm trying to use sql loader to load this file into a table with a primary key on the 40 characters. i.e. I don't want the duplicates.

When I use sql loader conventional path (GUI Interface) I get the following behaviour. The first 20,000 records load in 10 seconds. After the 20,000 (first duplicate occurs) the load suddenly becomes unbarebly slow and seems to get slower over time. I don't know why, but I'm suspecting that the bad file is part of it (since it is getting bigger and bigger).

Maybe somebody can answer these questions for me.

  1. How do I turn of the bad file (and discard file). I know there are a lot of dups and I'd like to turn off the overhead of having to create a bad file. I couldn't find anything in the docs to turn the bad file off all together.
  2. Does anyone have a theory why the load suddenly gets so slow ?

If I drop the index and do a load of the same file it absolutly flies, but now I'm stuck with a lot of duplicate keys. I don't want these duplicates.
I tried using a insert into from select distinct .... statement after the load but since a duplicate key doesn't necesseraly mean that all the other fields in the record are the same this is a problem too.

Example File

Key Info CollectionDate

abc  Hello    1/3/97
abc  Hello    1/8/97
def  Hi       2/2/97

If I issue :

insert into sample(Key, Info, CollectDate) select t.key, t.info, t.collectdate
from (select distinct key, info, collectdate from temptable) t;

I still get 3 records because the collectiondate is different in all three records. I don't want the second occurence of 'abc'. I do want the first occurence including the collectdate. This means that I can't remove the collect date from the statement since I need it.

Maybe somebody can answer this questions for me.

3. Is there an easy way to remove the duplicate keys but still retain all the fields associated with the first occurence of the key. For Instance, Other DBMS tools will create a violation table with all the duplicate keys when you create a primary index. This would be perfect but it doesn't seem oracle does this, it just fails when you create the index.

One more problem :

I tried to use the direct path method and got the following message :

SQL Loader version 7.3.2.2.0 etc etc etc Error checking path homogeneity
ORACLE-02371 : Loader must be at least version 7.3.2.2.0 for Direct Path

I don't know what this means, I clearly have version 7.3.2.2.0

Thanks to anyone who can answer some of my questions.

Mike. Received on Fri Jan 10 1997 - 00:00:00 CET

Original text of this message