Re: SQL*Loader: Field termination

From: Russell Smith <rsmith_at_serck-controls.co.uk>
Date: 17 Jan 2002 01:48:22 -0800
Message-ID: <f191bfd4.0201170148.3b964893_at_posting.google.com>


"Henry Hayden" <gumzilla_at_voicenet.com> wrote in message news:<wFH08.154$Ld.19815_at_news2.voicenet.com>...
> I see, finally. Your CSV files seem to have been one-shot events, and you
> don't have the capability to redo them. If that's the case, then I probably
> would run SQL*Loader and let it build a BAD file. I'd process the BAD
> records in a second, separate operation as it would probably be easier to
> see if there is a definite pattern to the fault in the bad records.
>
> At this point, it's probably time to use another tool, such as VB or Delphi
> or FoxPro, to perform low-level parsing and correction of the remaining
> records. Then, once they have been fixed so that they match the original
> good records, you could use the same SQL*Loader control file to reprocess
> the cleaned-up data.
>
> Sure, it's not the most elegant solution, but sometimes brute force is all
> we have left. SQL*Loader seems pretty fast to me. I've loaded more than
> 80,000 records in less than two and a half minutes. I've also used Access
> to open the external table and append it into a blank Oracle table, but if
> you do this, you might as well go out and have a nice lunch and be back
> before Access finishes.
>
> Now, let's hope this doesn't become a recurring project, eh? Let me know
> how it turns out, and what you did to solve it. Good luck.
>
> Henry
>
>
>
>
> "Russell Smith" <rsmith_at_serck-controls.co.uk> wrote in message
> news:f191bfd4.0201140204.4929b628_at_posting.google.com...
> > "Henry Hayden" <gumzilla_at_voicenet.com> wrote in message
> > news:<mzV%7.51$Ld.8946_at_news2.voicenet.com>...
> > > Could you be a little more specific about the problems caused by the
> > > "." on the end of the timestamp field?
> > > Are you using the parameter OPTIONALLY ENCLOSED BY in the control
> > > file? What is the data type of the timestamp field? It seems to me
> > > that if you used the switches FIELDS TERMINATED BY "," OPTIONALLY
> > > ENCLOSED BY '"' (terminated by a comma, strings enclosed by double
> > > quotes) the problem - whatever it is - might go away.
> > >
> > > In addition, you might want to (or maybe you have to) specify the
> > > field as a DATE field and supply a valid NLS_DATE_FORMAT mask. For
> > > example:
> > >
> > > timestamp DATE(19) "DD-MON-YYYY HH:MI:SS"
> > >
> > > If you do not supply a field width, SQL*Ldr will take it from the
> > > supplied mask.
> > >
> > > Let us know if this helps, or if there is still a problem.
> > >
> > >
> > > Henry
> > >
> > > "Russell Smith" <rsmith_at_serck-controls.co.uk> wrote in message
> > > news:f191bfd4.0201100342.c98246f_at_posting.google.com...
> > > > Hi,
> > > > I have been importing a bunch of CSVs into Oracle using
> > > > SQL*Loader, and am having some problems.
> > > >
> > > > Basically, the timestamp field was set to be a fixed-length field
> > > > when the CSV was generated, so sometimes I get a date like this:
> > > >
> > > > 24-MAY-1998 00:00:10.
> > > >
> > > > The "." on the end is causing errors.
> > > > I need to know how to do an OPTIONALLY TERMINATED BY (I have just
> > > > made this up, but this is what I need to do).
> > > >
> > > > Any ideas?
> > > >
> > > > Russ
> > > >
> > > > P.S. Unfortunately the CSVs can't be regenerated properly.
> >
> > Right, the CSVs were generated from another database. The program
> > that did it has a bug where the time is truncated (there should be
> > milliseconds too), which is why there is occasionally a "." at the
> > end. I am not using OPTIONALLY ENCLOSED BY. It rejects all the
> > records with a "." on the end. One current (but ugly) solution is to
> > process the data once and then process the bad file looking for a "."
> > on the end.
> > Any other ideas?

Thanks! I have come up with a solution and have also realised what the problem is:
the software used to retrieve the data from the legacy system uses a string length sized for the maximum length of a normal date/time, but date/times in our software carry milliseconds too, so occasionally you get some milliseconds, or just the decimal point... a pain.
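The fix is to clean the field inside the control file itself, so the stray dot never sends a record to the BAD file. A sketch of the idea - the table and column names here are made up, not the real ones:

```
LOAD DATA
INFILE 'export.csv'
APPEND INTO TABLE legacy_readings
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
(
  reading_id,
  -- read the field as plain characters, strip any trailing "." left by
  -- the truncated milliseconds, then convert to a DATE ourselves
  ts CHAR(25) "TO_DATE(RTRIM(:ts, '.'), 'DD-MON-YYYY HH24:MI:SS')",
  reading_value
)
```

Because the field is read as CHAR and the conversion happens in the SQL expression, SQL*Loader never tries to parse the malformed value as a date directly.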

The reason I can't redo the files is that there are now about 20,000 of them, just under 7 GB in total. I imported 85 MB of the stuff and it came to 12 million rows... :( :) It's funny and it's also bad... it's going to take AGES.

Then I have to process them...
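If the control-file route hadn't worked out, a pre-pass over the files to strip the stray dot would also have done it. A rough sketch in Python - the timestamp column position and the paths are assumptions, not our real layout:

```python
import csv
import glob

def clean_row(row, ts_index=1):
    # Strip the trailing "." left behind by the truncated milliseconds.
    # ts_index is an assumption: whichever column holds the timestamp.
    row = list(row)
    row[ts_index] = row[ts_index].rstrip('.')
    return row

def clean_file(src, dst, ts_index=1):
    # Rewrite one CSV with the timestamp column cleaned up.
    with open(src, newline='') as fin, open(dst, 'w', newline='') as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            writer.writerow(clean_row(row, ts_index))

if __name__ == '__main__':
    # Paths are placeholders; loop over every export file.
    for src in glob.glob('export/*.csv'):
        clean_file(src, src + '.clean')
```

A genuine millisecond value like `00:00:10.5` is left alone, since `rstrip('.')` only removes dots at the very end of the string.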

select count(*) from test;

This takes around 45 seconds to a minute to execute... on 12 million rows... :(

One other question: I have been monitoring the CPU usage whilst importing with sqlldr, and it never seems to go above about 15% - strange. Is this my fault, Oracle's, or my computer's? (It's only a P3 500 MHz with 256 MB RAM...)
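For reference, I'm invoking it conventionally, roughly like this (the userid and filenames are placeholders). I gather that adding DIRECT=TRUE switches to a direct-path load, which bypasses much of the SQL engine and is usually considerably faster - I haven't tried it yet:

```
sqlldr userid=scott/tiger control=load.ctl data=export.csv log=export.log bad=export.bad

# direct-path variant, worth trying for a bulk load this size:
sqlldr userid=scott/tiger control=load.ctl data=export.csv direct=true
```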

Russ Received on Thu Jan 17 2002 - 10:48:22 CET
