Re: SQLLDR question

From: Access <idmwarpzone_NOSPAM__at_yahoo.com>
Date: Tue, 23 Nov 2004 23:41:20 +0100
Message-ID: <41a3bc92$0$29504$ba620e4c_at_news.skynet.be>


[Quoted] "Dinanath" <sulakhe_at_mcs.anl.gov> wrote in message news:103eadf.0411231235.22fc0e2f_at_posting.google.com...
> Hi,
> I have file that contains PROTEIN sequences in it. I will show the
> format below. It has about 2million records in it. I wanted to know if
> someone can suggest a sqlldr control file that would help in uploading
> the data that I need to upload. I don't want all the data from this
> file but only some data. I tried to figure out on my own but i
> couldn't do it..
>
> First look at the example of this file: Looks complex but is very
> simple.
>



> >gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain
>

MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSAYMSWVRQAPGKGLEWVAYIYSGGSS TYY
>

AQSVQGRFAISRDDSNSMLYLQMNSLKTEDTAVYYCARGGLGWSLDYWGKGTMITVTSATPSPPTVFPLMESCCLS DIS
> GPVATGCLATGFCLPPRPSRGLINLEKL
> >gi|2695851|emb|CAA73709.1| immunoglobulin heavy chain
>

ABLHOOJIHOIHOIDFIHDFMGILTALCIIMTALSSVRSDVVLTESGPAVVKPGESHKLSCKAAGFTFSSYWMGWV RQT
>

PGKGLEWVSIISAGGSTYYAPSVEGRFTISRDNSNSMLYLQMNSLKTEDTAMYYCARKPETGSYGNISFEHWGKGT MIT
> VTSATPSPPTVFPLMQACCSVDVTGPSATGCLATEF
> >gi|2695853|emb|CAA73712.1| immunoglobulin heavy chain
>

POYOGHOSIHLDSKHSDLFHMGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSNNMGWV RQA
> PGKGLEWVSTISYSVNAYYAQASLDHIADFKSFDLADKFLKJADSLFKAFD
> >gi|....|emb|.........
> ...............................
> ..........so on.. upto 2million records.
> ==========================================================
>
> There two things in this file for each record.
> One is Header: that starts with ">"
> Two is SEQUENCE: lines after header till the next header (letters in
> CAPS).
>
> The next record again start with a ">". and so on..
>
> I am interested to pick only two fields out of each records:
> 1. GI number: that is the number between ">gi|" and "|emb|" eg:2695851
> (for firt record).
> 2. Sequence: example, line numbers 2,3, and 4 of the first record.
> MGILTA......NLEKL
>
> Can anyone write an sqlldr control file that can extract this info and
> put in the following table:
>
> create table sequences
> (
> gi_number NUMBER NOT NULL,
> sequence CLOB NOT NULL
> );
>
> Please notice the second column is a clob. because the sequence might
> go beyond 4000 characters sometimes so i can't fit it into varchar.
>
> I would appreciate any help and sorry if its not the right place to
> post such messages. Please let me know where can post this message if
> not here.
>
> Thank you very much,
> Cheers,
> Dina

Which version of Oracle do you use ?
Maybe you could try to load the file with the external table feature (Oracle 9i and above).

See this page for examples :
http://www.dbasupport.com/oracle/ora9i/External_Tables9i.shtml

HTH
Matthias Hoys Received on Tue Nov 23 2004 - 23:41:20 CET

Original text of this message