Re: Problems loading data into the database using SQL Loader.

From: Vladimir M. Zakharychev <vladimir.zakharychev_at_gmail.com>
Date: Thu, 25 Jun 2009 00:14:40 -0700 (PDT)
Message-ID: <76711052-ef73-4174-bd3e-9ab6245fa7e5_at_r3g2000vbp.googlegroups.com>



On Jun 24, 5:47 pm, trub3101 <trub3..._at_sky.com> wrote:
> On 24 June, 07:55, "Vladimir M. Zakharychev"
>
>
>
> <vladimir.zakharyc..._at_gmail.com> wrote:
> > On Jun 23, 11:38 pm, trub3101 <trub3..._at_sky.com> wrote:
>
> > > On 23 June, 20:30, trub3101 <trub3..._at_sky.com> wrote:
>
> > > > On 23 June, 19:53, ddf <orat..._at_msn.com> wrote:
>
> > > > > On Jun 23, 1:38 pm, trub3101 <trub3..._at_sky.com> wrote:
>
> > > > > > On 23 June, 17:53, ddf <orat..._at_msn.com> wrote:
>
> > > > > > > On Jun 23, 11:42 am, trub3101 <trub3..._at_sky.com> wrote:
>
> > > > > > > > Hi all,
>
> > > > > > > > Database
> > > > > > > > Oracle 9.2.0.8
> > > > > > > > NLS_LANG=AMERICAN_AMERICA.UTF8
> > > > > > > > NLS_CHARACTERSET UTF8
> > > > > > > > NLS_NCHAR_CHARACTERSET AL16UTF16
>
> > > > > > > > Server
> > > > > > > > Red Hat Linux 2.6.9-67.ELsmp
> > > > > > > > LANG=en_US.UTF-8
>
> > > > > > > > I cannot figure this one out. When I try to populate a table using SQl
> > > > > > > > Loader I am getting corrupt characters in the table e.g. Alien³
> > > > > > > > appears in the table as Alien¿
>
> > > > > > > > I have even set the CHARACTERSET to UTF8 in the control file for good
> > > > > > > > measure!
>
> > > > > > > > Can anyone throw some light on this one?
>
> > > > > > > > Thanks in advance for your replies.
>
> > > > > > > > TB3101
>
> > > > > > > I'll take a guess that you're doing this from a client machine.  What
> > > > > > > is the NLS_LANG on the machine where you're running the loader?
>
> > > > > > > David Fitzjarrell- Hide quoted text -
>
> > > > > > > - Show quoted text -
>
> > > > > > Thanks for reply David,
>
> > > > > > I am actually logged onto the server and running the sql loader
> > > > > > process. The NLS_LANG is set to AMERICAN_AMERICA.UTF8
>
> > > > > > Cheers,
>
> > > > > > TB3101- Hide quoted text -
>
> > > > > > - Show quoted text -
>
> > > > > We'll need to see your control file and data file.
>
> > > > > David Fitzjarrell- Hide quoted text -
>
> > > > > - Show quoted text -
>
> > > > Hi David,
>
> > > > Here is the control file:
>
> > > > LOAD DATA
> > > > CHARACTERSET UTF8
> > > > INFILE '/opt/ora/oracle/oradata/data/xxxx.dat' "str '\n'"
> > > > BADFILE '/opt/ora/oracle/oradata/xxxx.bad'
> > > > DISCARDFILE '/opt/ora/oracle/oradata/xxxx.dis'
> > > > TRUNCATE
> > > > INTO TABLE film_film
> > > > TRAILING NULLCOLS
>
> > > > (
> > > >   START_TIME                    DATE "DD/MM/YYYY,HH24:MI:SS,"
> > > > TERMINATED BY '^',
> > > >   CHANNEL_ID                    CHAR(300)      TERMINATED BY '^',
> > > >   REGION_CODE                   CHAR(10)       TERMINATED BY '^',
> > > >   TITLE                         CHAR(200)      TERMINATED BY '^',
> > > >   PERFORMERS                    CHAR(4000)     TERMINATED BY '^',
> > > >   DIRECTOR                      CHAR(150)       TERMINATED BY '^',
> > > >   FILM_RATING                   INTEGER EXTERNAL       TERMINATED BY
> > > > '^',
> > > >   YEAR                          CHAR(15)       TERMINATED BY '^',
> > > >   DESCRIPTION                   CHAR(4000)    TERMINATED BY '^',
> > > >   PREMIERE                      CHAR(1)       TERMINATED BY '^',
> > > >   TV_MOVIE                      CHAR(1)       TERMINATED BY '^',
> > > >   FILM_CERTIFICATE              CHAR(10)      TERMINATED BY '^',
> > > >   WARNING                       CHAR(200)     TERMINATED BY '^',
> > > >   REVIEW_AUTHOR                 CHAR(50)
> > > > )
>
> > > > and here is the data file:
>
> > > > 30/06/2009 19:00:00^252^^Alien³^Warrant Officer Ripley*Sigourney
> > > > Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
> > > > Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
> > > > Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
> > > > II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
> > > > Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
> > > > first two films stand up as sci-fi classics in their own right, Se7en
> > > > director David Fincher, in his feature film debut, had a virtually
> > > > impossible act to follow with this second sequel. He makes a
> > > > surprisingly good fist of it, developing the maternal themes of first
> > > > sequel Aliens and providing an exhilarating final showdown. Sigourney
> > > > Weaver returns as Ripley, who this time crash-lands on a prison colony
> > > > where another lethal alien is let loose. A familiar cast of Brits
> > > > (Charles Dance, Paul McGann, Brian Glover) provides the alien food
> > > > and, while it isn't in the same class as the first two films, this
> > > > provides a satisfactory entry in the series. Still, it would have been
> > > > interesting to see what second-choice director Vincent Ward (of The
> > > > Navigator: a Medieval Odyssey fame) would have made of it -
> > > > apparently, he was brought in when Renny Harlin left after script
> > > > disagreements, but was himself replaced when it emerged that his
> > > > version of the movie would be set in a monastery and the alien itself
> > > > wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.
> > > > ^JF
>
> > > > Thanks again for your help,
>
> > > > TB3101- Hide quoted text -
>
> > > > - Show quoted text -
>
> > > Hi David,
>
> > > When I run 'cat -e' on the data file the output for the title is
> > > different!
>
> > > 30/06/2009 19:00:00^252^^AlienM-3^Warrant Officer Ripley*Sigourney
> > > Weaver|Dillon*Charles S Dutton|Clemens*Charles Dance|Golic*Paul McGann|
> > > Superintendent Andrews*Brian Glover|Aaron*Ralph Brown (2)|Morse*Danny
> > > Webb|Rains*Christopher John Fields|Junior*Holt McCallany|Bishop
> > > II*Lance Henriksen|Murphy*Christopher Fairbank|David *Pete
> > > Postlethwaite|William*Clive Mantle^David Fincher^3^1992^Given that the
> > > first two films stand up as sci-fi classics in their own right, Se7en
> > > director David Fincher, in his feature film debut, had a virtually
> > > impossible act to follow with this second sequel. He makes a
> > > surprisingly good fist of it, developing the maternal themes of first
> > > sequel Aliens and providing an exhilarating final showdown. Sigourney
> > > Weaver returns as Ripley, who this time crash-lands on a prison colony
> > > where another lethal alien is let loose. A familiar cast of Brits
> > > (Charles Dance, Paul McGann, Brian Glover) provides the alien food
> > > and, while it isn't in the same class as the first two films, this
> > > provides a satisfactory entry in the series. Still, it would have been
> > > interesting to see what second-choice director Vincent Ward (of The
> > > Navigator: a Medieval Odyssey fame) would have made of it -
> > > apparently, he was brought in when Renny Harlin left after script
> > > disagreements, but was himself replaced when it emerged that his
> > > version of the movie would be set in a monastery and the alien itself
> > > wouldn't be appearing. ^N^N^18^Contains violence, swearing and nudity.
> > > ^JF$
>
> > > Cheers,
>
> > > TB3101
>
> > Can you also post the SQL*loader log file? Do you see something like
> > this at the very beginning of the log:
>
> > Character Set UTF8 specified for all input.
> > First primary datafile xxxx.dat has a
> >  utf8 byte order mark in it.
>
> > I mean, you gotta make sure the file is really in UTF-8 so that
> > SQL*Loader could process it properly.
>
> > Regards,
> >    Vladimir M. Zakharychev
> >    N-Networks, makers of Dynamic PSP(tm)
> >    http://www.dynamicpsp.com-Hide quoted text -
>
> > - Show quoted text -
>
> Hi Vladimir,
>
> The log file has 'Character Set UTF8 specified for all input.' in it.
>
> Was 'First primary datafile xxxx.dat has a utf8 byte order mark in it'
> an observation on your behalf or was this also supposed to be in the
> log file?
>
> Thanks for your reply,
>
> TB3101

I observed it while testing your case and I suppose it should be there for the loader to properly process/convert your data. In my test (on 10.2.0.4/Win32,) a file in UTF-8 with BOM at the very beginning of the file was loaded correctly. It is not mandatory to have the BOM in an UTF-8 file as the byte order in UTF-8 is the same on all platforms and BOM's only purpose in UTF-8 is to mark otherwise unmarked plain text as being in Unicode.

Since your file is actually in ISO-8859-1, maybe it's easier to reflect this in the control file (CHARACTERSET WE8ISO8859P1) and let Oracle do the conversion to the database charset (it's pretty good at that.)

Hth,

   Vladimir M. Zakharychev
   N-Networks, makers of Dynamic PSP(tm)    http://www.dynamicpsp.com Received on Thu Jun 25 2009 - 02:14:40 CDT

Original text of this message