Re: SQLLDR question

From: DA Morgan <damorgan_at_x.washington.edu>
Date: Sat, 27 Nov 2004 19:35:42 -0800
Message-ID: <1101612846.555535_at_yasure>


Dinanath wrote:

> Hi,
> I have a file that contains PROTEIN sequences. I will show the
> format below. It has about 2 million records in it. I wanted to know if
> someone can suggest a sqlldr control file that would help in uploading
> the data that I need. I don't want all the data from this
> file, only some of it. I tried to figure it out on my own but I
> couldn't do it.
>
> First, look at this example of the file. It looks complex but is very
> simple.
> ===========================================================================
>
> >gi|2695847|emb|CAA73704.1| immunoglobulin heavy chain
>
> MGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSAYMSWVRQAPGKGLEWVAYIYSGGSSTYY
> AQSVQGRFAISRDDSNSMLYLQMNSLKTEDTAVYYCARGGLGWSLDYWGKGTMITVTSATPSPPTVFPLMESCCLSDIS
> GPVATGCLATGFCLPPRPSRGLINLEKL
>
> >gi|2695851|emb|CAA73709.1| immunoglobulin heavy chain
>
> ABLHOOJIHOIHOIDFIHDFMGILTALCIIMTALSSVRSDVVLTESGPAVVKPGESHKLSCKAAGFTFSSYWMGWVRQT
> PGKGLEWVSIISAGGSTYYAPSVEGRFTISRDNSNSMLYLQMNSLKTEDTAMYYCARKPETGSYGNISFEHWGKGTMIT
> VTSATPSPPTVFPLMQACCSVDVTGPSATGCLATEF
>
> >gi|2695853|emb|CAA73712.1| immunoglobulin heavy chain
>
> POYOGHOSIHLDSKHSDLFHMGILTALCIIMTALSSVRSDVVLTESGPAVIKPGESHKLSCKASGFTFSSNNMGWVRQA
> PGKGLEWVSTISYSVNAYYAQASLDHIADFKSFDLADKFLKJADSLFKAFD
>
> >gi|....|emb|.........
>
> ...............................
> .......... and so on, up to 2 million records.
> ==========================================================
>
> There are two things in this file for each record.
> One is the HEADER: the line that starts with ">".
> Two is the SEQUENCE: the lines after the header, up to the next header
> (letters in CAPS).
>
> The next record again starts with a ">", and so on.
>
> I am interested in picking only two fields out of each record:
> 1. GI number: the number between ">gi|" and "|emb|", e.g. 2695851
> (for the second record).
> 2. Sequence: for example, lines 2, 3, and 4 of the first record,
> MGILTA......NLEKL
>
> Can anyone write a sqlldr control file that can extract this info and
> put it in the following table:
>
> create table sequences
> (
> gi_number NUMBER NOT NULL,
> sequence CLOB NOT NULL
> );
>
> Please notice that the second column is a CLOB, because the sequence
> might sometimes go beyond 4000 characters, so I can't fit it into a
> VARCHAR.
>
> I would appreciate any help, and sorry if it's not the right place to
> post such messages. Please let me know where I can post this message if
> not here.
>
> Thank you very much,
> Cheers,
> Dina

SQL*Loader allows for conditional logic. Go to http://www.psoug.org, click on
Morgan's Library, then click on SQL*Loader.
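
If the conditional logic proves awkward for multi-line records, one alternative (a swapped-in technique, not something from the library page above) is to flatten the file before loading it. The sketch below is Python rather than a control file, and it assumes every header of interest has the shape ">gi|<digits>|..." as in the sample. The names parse_records and to_delimited and the "|" separator are hypothetical choices for illustration only.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# Header shape assumed from the posted sample: ">gi|<digits>|..."
GI_HEADER = re.compile(r"^>gi\|(\d+)\|")


def parse_records(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (gi_number, sequence) pairs from FASTA-style lines.

    Records whose header does not match GI_HEADER, or that have no
    sequence lines, are skipped.
    """
    gi = None
    parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if gi is not None and parts:
                yield gi, "".join(parts)
            match = GI_HEADER.match(line)
            gi = int(match.group(1)) if match else None
            parts = []
        elif gi is not None:
            parts.append(line)  # sequence line of the current record
    if gi is not None and parts:
        yield gi, "".join(parts)


def to_delimited(lines: Iterable[str], sep: str = "|") -> List[str]:
    """Return one 'gi<sep>sequence' string per record."""
    return [f"{gi}{sep}{seq}" for gi, seq in parse_records(lines)]
```

Each output line then holds one complete record, which an ordinary delimited-field control file should be able to load. The sequence field would probably need an explicit length such as CHAR(n) with a large n, since delimited character fields are, as far as I know, limited to 255 bytes by default.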

-- 
Daniel A. Morgan
University of Washington
damorgan_at_x.washington.edu
(replace 'x' with 'u' to respond)
Received on Sun Nov 28 2004 - 04:35:42 CET
