Re: Issues while storing a long dash in oracle

From: Laurenz Albe <invite_at_spam.to.invalid>
Date: 11 Mar 2008 10:02:29 GMT
Message-ID: <1205229744.867385@proxy.dienste.wien.at>


Sandy80 <svarshneymail_at_gmail.com> wrote:
> nvl(rpad(UPPER(substr(nvl( substr(column1,1,instr(column1,'(')-1),column1),
> 1,24)),24),' ')

> Not sure as to why is it not working but when I run this command on
> the data "Finance ¿ Fulfillment Factory" the result returned is:
>
> "FINANCE ¿ FULFILLMENT F" whereas the correct result that I am looking
> for is "FINANCE ¿ FULFILLMENT FA".

It works fine on a 10g database here.

Most likely the encoding of the data does not match the client character set.

You must find out the encoding of your text data. Ask the people who produced the data, or examine the file and guess.

You could for example use "od -c". Here are two samples:

0000000   F   i   n   a   n   c   e     342 200 224       F   u   l   f
0000020   i   l   l   m   e   n   t       F   a   c   t   o   r   y  \n
0000040

This is in UTF-8 encoding. The em-dash (UNICODE 0x2014) is encoded as three bytes in UTF-8. They are represented in octal in the "od" output.

0000000   F   i   n   a   n   c   e     227       F   u   l   f   i   l
0000020   l   m   e   n   t       F   a   c   t   o   r   y
0000035

This is WIN-1252 encoding. The em-dash is a single byte here.

Next step is to set the client character set correctly.

For the first case, NLS_LANG should be AMERICAN_AMERICA.AL32UTF8, and for the second case it should be AMERICAN_AMERICA.WE8MSWIN1252 (the AMERICAN_AMERICA is not important - choose what you want).

If data and client character set match, SUBSTR will work and sqlldr should too.

Yours,
Laurenz Albe Received on Tue Mar 11 2008 - 05:02:29 CDT

Original text of this message