Home » SQL & PL/SQL » Client Tools » Unicode conversion (Oracle 9i, windows server 2003)
Unicode conversion [message #548779] Mon, 26 March 2012 04:21 Go to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
Hi there,

how can I convert incorrectly imported data into it's proper unicode format.
example:


FULL_NAME
GöRAN JOHANSSON
GÖRAN JOHANSSON


The first record is incorrectly imported and the second is how it should be looking like, if it has been properly imported.


NLS parameters are:
NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16

In the example above, full_name is of a NVARCHAR2(100) type but the same problem applies to columns with VARCHAR2 type.

Is there a function or a peace of code I could use to convert value of the first record to be look alike of the second record?

thanks
Sonja
Re: Unicode conversion [message #548785 is a reply to message #548779] Mon, 26 March 2012 04:49 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
How are they "imported"?

If all is correctly set there is no need of any function, Oracle converts automatically and appropriately.

Regards
Michel
Re: Unicode conversion [message #548788 is a reply to message #548785] Mon, 26 March 2012 05:07 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
they had been imported in 2 ways, both resulted with the same problem:
1. via Toad's Import Table Data (Import from Text file option.. and the file was comma delimited csv file)
2. using an Oracle external table that was using csv file (organisation external type ORACLE_LOADER, comma delimited fields)
Re: Unicode conversion [message #548796 is a reply to message #548788] Mon, 26 March 2012 05:45 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
1. Through it away it is b....
2. Copy and paste all what you did and got including NLS parameters and content of the file.

Before, Please read OraFAQ Forum Guide and How to use [code] tags and make your code easier to read.
Make sure that lines of code do not exceed 80 characters when you format.
Indent the code, use code tags and align the columns in result.
Use the "Preview Message" button to verify.
Also always post your Oracle version, with 4 decimals.

Regards
Michel
Re: Unicode conversion [message #548802 is a reply to message #548796] Mon, 26 March 2012 06:25 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
thanks Michel,

here is the external table def:

CREATE TABLE leads
(
  EMAIL      VARCHAR2(100 BYTE),
  HTML       VARCHAR2(20 BYTE),
  F_NAME1    VARCHAR2(80 BYTE),
  S_NAME1    VARCHAR2(50 BYTE),
  URL        VARCHAR2(200 BYTE),
  OPTIN      VARCHAR2(50 BYTE),
  USERAGENT  VARCHAR2(300 BYTE),
  IP         VARCHAR2(150 BYTE),
  KEYWORD    VARCHAR2(300 BYTE)
)
ORGANIZATION EXTERNAL
  (  TYPE ORACLE_LOADER
     DEFAULT DIRECTORY MAILER
     ACCESS PARAMETERS 
       ( RECORDS DELIMITED BY newline
  LOGFILE 'log_file.log'
  FIELDS TERMINATED BY ','
  missing field values are null
  REJECT ROWS WITH ALL NULL FIELDS
  ( email CHAR(100),
  html        CHAR(20),
  F_NAME1     CHAR(80),
  s_NAME1     CHAR(50),
  url         CHAR(200),
  optin       CHAR(50),
  UserAgent   CHAR(300),
  ip          CHAR(150),
  keyword     CHAR(300)  
  )  
      )
     LOCATION (DOTMAILER:'Leads_to_load.csv')
  )
REJECT LIMIT UNLIMITED
NOPARALLEL
NOMONITORING;



Then, I was populating ORIG_LEADS table by insert into.. select from this external LEADS table.

NLS parameters are:
PARAMETER		VALUE
NLS_CALENDAR		GREGORIAN
NLS_CHARACTERSET	WE8MSWIN1252
NLS_COMP		BINARY
NLS_CURRENCY		$
NLS_DATE_FORMAT		DD-MON-RR
NLS_DATE_LANGUAGE	AMERICAN
NLS_DUAL_CURRENCY	$
NLS_ISO_CURRENCY	AMERICA
NLS_LANGUAGE		AMERICAN
NLS_LENGTH_SEMANTICS	BYTE
NLS_NCHAR_CHARACTERSET	AL16UTF16
NLS_NCHAR_CONV_EXCP	FALSE
NLS_NUMERIC_CHARACTERS	.,
NLS_RDBMS_VERSION	9.2.0.7.0
NLS_SORT		BINARY
NLS_TERRITORY		AMERICA
NLS_TIMESTAMP_FORMAT	DD-MON-RR HH.MI.SSXFF AM
NLS_TIMESTAMP_TZ_FORMAT	DD-MON-RR HH.MI.SSXFF AM TZR
NLS_TIME_FORMAT		HH.MI.SSXFF AM
NLS_TIME_TZ_FORMAT	HH.MI.SSXFF AM TZR



thanks for helping!
Re: Unicode conversion [message #548803 is a reply to message #548802] Mon, 26 March 2012 06:30 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
And the file is (only some of the invalid records)?

Regards
Michel
Re: Unicode conversion [message #548806 is a reply to message #548803] Mon, 26 March 2012 06:40 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
ok, here is it.. you can see the funny characters in USERAGENT column.
Re: Unicode conversion [message #548807 is a reply to message #548806] Mon, 26 March 2012 06:54 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
If the "funny" characters are in the file then they are in the datbase.
If your file is not in WE8MSWIN1252 character set, you have to tell its character set to Oracle:
SQL> CREATE TABLE leads
  2  (
  3    EMAIL      VARCHAR2(100 BYTE),
  4    HTML       VARCHAR2(20 BYTE),
  5    F_NAME1    VARCHAR2(80 BYTE),
  6    S_NAME1    VARCHAR2(50 BYTE),
  7    URL        VARCHAR2(200 BYTE),
  8    OPTIN      VARCHAR2(50 BYTE),
  9    USERAGENT  VARCHAR2(300 BYTE),
 10    IP         VARCHAR2(150 BYTE),
 11    KEYWORD    VARCHAR2(300 BYTE)
 12  )
 13  ORGANIZATION EXTERNAL
 14    (  TYPE ORACLE_LOADER
 15       DEFAULT DIRECTORY FILESDIR
 16       ACCESS PARAMETERS
 17         ( RECORDS DELIMITED BY newline CHARACTERSET UTF8
 18    LOGFILE 'log_file.log'
 19    FIELDS TERMINATED BY ','
 20    missing field values are null
 21    REJECT ROWS WITH ALL NULL FIELDS
 22    ( email CHAR(100),
 23    html        CHAR(20),
 24    F_NAME1     CHAR(80),
 25    s_NAME1     CHAR(50),
 26    url         CHAR(200),
 27    optin       CHAR(50),
 28    UserAgent   CHAR(300),
 29    ip          CHAR(150),
 30    keyword     CHAR(300)
 31    )
 32        )
 33       LOCATION (FILESDIR:'Leads_to_load.csv')
 34    )
 35  REJECT LIMIT UNLIMITED
 36  NOPARALLEL
 37  NOMONITORING
 38  /

Table created.

SQL> select * from leads;

no rows selected

Regards
Michel
Re: Unicode conversion [message #548809 is a reply to message #548807] Mon, 26 March 2012 06:59 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
And with your file:
SQL> select * from leads;
EMAIL
--------------------------------------------------------------------------------
HTML
--------------------
F_NAME1
--------------------------------------------------------------------------------
S_NAME1
--------------------------------------------------
URL
--------------------------------------------------------------------------------
OPTIN
--------------------------------------------------
USERAGENT
--------------------------------------------------------------------------------
IP
--------------------------------------------------------------------------------
KEYWORD
--------------------------------------------------------------------------------
bo.israelsson@platdetaljer-hallsberg.se
True


http://www.luxurymalta.co.uk/winter-189
21/11/2011 09:38
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)
150.70.172.107
;;;;;
bo@boroe.com
True


http://www.luxurymalta.co.uk/winter-189
21/11/2011 12:36
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Borö Pannan; .NE
T CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)
194.68.222.196
httpwwwluxurymaltacoukswmaltadeal

2 rows selected.

SQL> @nls_param character

Parameter                     Session                       Database
----------------------------- ----------------------------- -----------------------------
NLS_CHARACTERSET              WE8MSWIN1252                  WE8MSWIN1252
NLS_NCHAR_CHARACTERSET        AL16UTF16                     AL16UTF16

You can see the characters are converted to the database one.

Regards
Michel
Re: Unicode conversion [message #548811 is a reply to message #548809] Mon, 26 March 2012 07:08 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
thank you very much
I'll use that the next time I'm importing.

However, the data has been loaded in the database and to load them again would take lot of time. Is there a way to convert the 'badly loaded' data to their correct presentation?

If nothing more sophisticated, then a search/replace would do.. I'm trying to find a translation table for these characters like:

bad_value    good_value
  ö              Ö

Re: Unicode conversion [message #548813 is a reply to message #548811] Mon, 26 March 2012 07:45 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
Just to say (to who ever might find it usefull) that I've fixed the bad data in the database by creating a 'translation table'.
Re: Unicode conversion [message #548814 is a reply to message #548811] Mon, 26 March 2012 07:56 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
You first have to get the UTF8 code points for all the non-ASCII characters, for instance:
SQL> select dump(convert('Ö','UTF8')) from dual;
DUMP(CONVERT('Ö','UT
--------------------
Typ=1 Len=2: 195,150

When you have all these values you can convert back:
SQL> select replace('-->Ã<--', chr(195)||chr(150), 'Ö') from dual;
REPLACE
-------
-->Ö<--

1 row selected.

Unfortunatly, it is (very) possible that some codes of UTF8 character fall in codes that do not exist in your character set and in this case Oracle convert them to a replacement character which is ¿ (chr(191) in WE8MSWIN1252 and so some UTF8 characters may have been converted to the same set of WE8MSWIN1252 characters.

Regards
Michel
Re: Unicode conversion [message #548815 is a reply to message #548813] Mon, 26 March 2012 07:56 Go to previous messageGo to next message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
sstellini wrote on Mon, 26 March 2012 14:45
Just to say (to who ever might find it usefull) that I've fixed the bad data in the database by creating a 'translation table'.


Can you post your "translation table", it may be useful for futur readers.

Regards
Michel
Re: Unicode conversion [message #548816 is a reply to message #548815] Mon, 26 March 2012 08:16 Go to previous messageGo to next message
sstellini
Messages: 32
Registered: April 2005
Location: Malta
Member
and here is the translation table for who ever comes this way Smile
Re: Unicode conversion [message #548817 is a reply to message #548816] Mon, 26 March 2012 08:19 Go to previous message
Michel Cadot
Messages: 57607
Registered: March 2007
Location: Nanterre, France, http://...
Senior Member
Account Moderator
Thanks
Michel
Previous Topic: Not able to make ora connection through macro in excel
Next Topic: How to create complete setup
Goto Forum:
  


Current Time: Thu Apr 17 17:11:32 CDT 2014

Total time taken to generate the page: 0.13398 seconds