Home » RDBMS Server » Server Administration » Character Set
Character Set [message #206355] Wed, 29 November 2006 13:33 Go to next message
lowcash75
Messages: 67
Registered: February 2006
Member
I'm in the process of performing migration from a Mysql database to Oracle 9.2.0.7

The Mysql database language/character set is setup us en-utf-8.

The oracle instance on the development environment is setup as
NLS_CHARACTERSET WE8MSWIN1252
NLS_NCHAR_CHARACTERSET AL16UTF16

The oracle instance on the production environment is setup as
NLS_CHARACTERSET WE8ISO8859P1
NLS_NCHAR_CHARACTERSET AL16UTF16

The column that I'm having issues with is a BLOB column in Mysql and I'm converting it into a CLOB when I bring it over to Oracle. Issue is primarily with "end of line", european characters and carriage returns. It does seem to keep the format and instead coverts these special characters to ' '.

Question is do I need to change my character set/encoding mechanism to another character set? Do I need to change my column to NCLOB instead of CLOB?

Appreciate any help on this issue.
Re: Character Set [message #206360 is a reply to message #206355] Wed, 29 November 2006 13:48 Go to previous messageGo to next message
andrew again
Messages: 2577
Registered: March 2000
Senior Member
Generally it's only advisable to use the NCHAR datatypes if you want to store multiple different languages. I would prefer to change the characterset on both dev and prod (if possible) to AL32UTF8 which is UTF8 and then just ignore the NCHAR characterset. By having both MS1252 and ISO-1 (8859.1), you may complicate your client NLS_LANG setup. Dev should be that same setup as prod - right... Oracle on Windows commonly defaults the db characterset to MS1252 and ISO-1 on Unix.

It's important to get the client NLS settings correctly set before copying the data across.

globalization is a massive topic - best to see the ora docs on http://tahiti.oracle.com/

[Updated on: Wed, 29 November 2006 13:51]

Report message to a moderator

Re: Character Set [message #206364 is a reply to message #206360] Wed, 29 November 2006 14:10 Go to previous messageGo to next message
lowcash75
Messages: 67
Registered: February 2006
Member
Andrew,
Thanks for your repsonse. I can change the development environment to mimic the production.
NLS_CHARACTERSET WE8ISO8859P1
NLS_NCHAR_CHARACTERSET AL16UTF16

But I don't think I'll be able to change the character set to AL32UTF8.

But how about the special characters that are being converted to "boxes". Will AL16UTF16 be able to support it? Whats the difference between AL16UTF16 and AL32UTF8?

Re: Character Set [message #206373 is a reply to message #206364] Wed, 29 November 2006 15:46 Go to previous messageGo to next message
andrew again
Messages: 2577
Registered: March 2000
Senior Member
http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96529/glossary.htm
AL16UTF16
The default Oracle character set for the SQL NCHAR data type, which is used for the national character set. It encodes Unicode data in the UTF-16 encoding.

AL32UTF8
An Oracle character set for the SQL CHAR data type, which is used for the database character set. It encodes Unicode data in the UTF-8 encoding.

http://download-west.oracle.com/docs/cd/B10501_01/server.920/a96529/glossary.htm#433170

UTF8
The UTF8 Oracle character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms. The UTF8 character set supports Unicode 3.0. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. Supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes.

UTF-8
The 8-bit encoding of Unicode. It is a variable-width encoding. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes.

UTF-16
The 16-bit encoding of Unicode. It is an extension of UCS-2 and supports the supplementary characters defined in Unicode 3.1 by using a pair of UCS-2 code points. One Unicode character can be 2 bytes or 4 bytes in UTF-16 encoding. Characters (including ASCII characters) from European scripts and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes.

"Any" character can be stored in UTF-8 or UTF-16. Seems UTF-16 is a little more efficient at storing Chinese characters though...

Be very very careful of believing the characters like "boxes" you see displayed. Windows applications generally can't display all characters. Browsers are better (use iSqlplus) to query the data or better still - do a dump of the character to see what bytes it is stored as:

SELECT * FROM nls_database_parameters WHERE parameter = 'NLS_CHARACTERSET';
NLS_CHARACTERSET	AL32UTF8

create table utf8_tst(col1 varchar2(1 char));

-- Euro is U+20AC
insert into utf8_tst values (unistr('\20AC'));
-- Small Greek Gamma U+03B3
insert into utf8_tst values (unistr('\03B3'));
insert into utf8_tst values (unistr('A'));

select col1, vsize(col1), dump(col1, 1010) Decimal_bytes, dump(col1, 1016) Hex_Bytes from utf8_tst;
 	3	Typ=1 Len=3 CharacterSet=AL32UTF8: 226,130,172	Typ=1 Len=3 CharacterSet=AL32UTF8: e2,82,ac
	2	Typ=1 Len=2 CharacterSet=AL32UTF8: 206,179	Typ=1 Len=2 CharacterSet=AL32UTF8: ce,b3
A	1	Typ=1 Len=1 CharacterSet=AL32UTF8: 65	Typ=1 Len=1 CharacterSet=AL32UTF8: 41



[Updated on: Wed, 29 November 2006 15:47]

Report message to a moderator

Re: Character Set [message #206602 is a reply to message #206373] Thu, 30 November 2006 14:12 Go to previous message
lowcash75
Messages: 67
Registered: February 2006
Member
Now I've figured out the problem with the character set. The problem was because I was using Oracle Migration Workbench and that does not support NLS conversion. So now I'm trying to load the data one of these ways.

I'm trying to load bunch of data in a tab deleted file into a table which contains also a CLOB field.



I want to load the data one of two ways:



1) I have a file with all the dml statements, but when I run this I get a following error: ORA-01704: string literal too long



Here's the table structure and insert statement



create table test (report_id number, title varchar2(255), report_text clob);



INSERT INTO test

(report_id,

title,

report_text)

VALUES (

138,

'Time',

'Thzs cyntznuxtzyn qxpqrzmqnt qnxblqd us ty qxplyrq thq pysszbzlzty yf uszng tzmq rqsylvqd SXNS ty study nxnyqmulszyn cyxlqscqncq. Thq zdqx wxs ty study thq cyxlqscqncq yf szlzcynq yzl-zn-wxtqr nxnyqmulszyns by xddzng sxlt xnd chxngzng thq tqmpqrxturq zn yrdqr ty znducq thq fuszyn yf dryplqts. Yur fzrst SXNS qxpqrzmqnts yn thzs zdqx yzqldqd xggrqgxtzyn yf nxnydryplqts wzthyut szgnzfzcxnt cyxlqscqncq, xs gzvqn zn yur prqvzyus rqpyrt. Thq nxnyscxlq dryplqts pryvqd ty bq myrq rqszstxnt ty cyxlqscqncq thxn mzcryscxlq dryplqts xt thq sxmq surfxctxnt xnd sxlt cyncqntrxtzyns. X cyntznuxtzyn prypysxl qnxblqd us ty qxxmznq thq qffqcts yf cyxlqscqncq yn nxnyqmulszyns zn whzch thq rxtq yf cyxlqscqncq hxd bqqn zncrqxsqd, przmxrzly thryugh thq rqductzyn yf thq surfxctxnt cyncqntrxtzyn. Zt turns yut thxt dyzng tzmq-rqsylvqd SXNS yf nxnyqmulszyn cyxlqscqncq zs quztq x trzcky qxpqrzmqnt xnd much myrq dzffzcult thxn wq hxd yrzgznxlly xntzczpxtqd. Cyntryllzng thq rxtq yf thq cyxlqscqncq ty mxkq zt xppryprzxtq wzth thq cyllqctzyn tzmq scxlqs xnd q-rxngq zs nyntrzvzxl bqcxusq thq rxtq cxn vxry mxny yrdqrs yf mxgnztudq wzth just vqry smxll chxngqs zn cyncqntrxtzyn. Durzng yur fzrst dxy yf cyntznuxtzyn bqxmtzmq, wq xttqmptqd ty mxkq systqmxtzc mqxsurqmqnts yf thq cyxlqscqncq. Wq hxd much dzffzculty tunzng thq surfxctxnt xnd sxlt cyncqntrxtzyns xnd dryplqt vylumq frxctzyn ty put thq cyxlqscqncq rxtq rzght zn thq swqqt spyt xppryprzxtq fyr TR-SXNS. Xftqr thq fzrst dxy, wq hxd qzthqr rqsults thxt shywqd ynly vqry slyw ty ny qvylutzyn yf Z(q) yr, whqn wq xdjustqd ty cyncqntrxtzyns ty zncrqxsq thq cyxlqscqncq, Z(q) thxt hxd xlrqxdy qvylvqd tywxrd x cyxlqscqd mzcryscxlq qmulszyn wzth scxttqrzng fqxturqs wqll bqlyw thq xccqsszblq q-rxngq.\r\n\r\nSzncq thq TR-SXNS cyxlqscqncq qxpqrzmqnts xgxzn xppqxrqd ty bq pryblqmxtzc, yzqldzng ny dxtx thxt wyuld bq xppryprzxtq fyr publzcxtzyn, wq dqczdqd ty fycus yn TR-SXNS yf nxnyqmulszyn xggrqgxtzyn, x phqnymqnyn wq cyuld cyntryl myrq qxszly. Thzs turnqd yut ty bq vqry pryductzvq, yzqldzng x publzcxtzyn thxt hxs just bqqn xccqptqd zn Physzcxl Rqvzqw Lqttqrs (1). Wq studzqd thq vylumq frxctzyn dqpqndqncq yf thq qvylutzyn yf Z(q) xs dryplqts xggrqgxtqd wzthyut cyxlqsczng. Thzs yzqldqd xn zdqxl systqm fyr studyzng thq nqxrqst nqzghbyr structurq zn whxt wq nyw cxll "Slzppqry Dzffuszyn Lzmztqd Clustqr Xggrqgxtzyn (DLCX)". Durzng thq xggrqgxtzyn yf thq dryplqts, Z(q) dqvqlypqd rxthqr stryng pqxks xt hzgh q, zndzcxtzvq yf x hzgh cyyrdznxtzyn numbqr thxt zs nyt chxrxctqrzstzc yf trxdztzynxl yr clxsszc DLCX fyund wzth xggrqgxtqd gyld cyllyzds. Bqcxusq thq dryplqts xrq lzquzd, whqn thqy stzck tygqthqr, thqy dy nyt fyrm shqxr-rzgzd “5/2&brvbar;+5/2 slzp xryund qxch ythqr untzl thqy fyrm xddztzynxl bynds wzth nqzghbyrs xnd lyck znty x lycxl pxckzng. Rxthqr thxn just fyrm x dqnsq flyc, thqsq rzgzd clustqrs cxn fyrm frxctxl xggrqgxtqs xnd qvqn gql xt hzghqr vylumq frxctzyns. Zn symq sqnsq, Slzppqry DLCX zs lzkq Clxsszc DLCX zn thxt thq structurqs xrq frxctxl, but zn Slzppqry DLCX, thq bxszc buzldzng blyck thxt cxn fyrm rzgzd xggrqgxtqs zs x tqtrxhqdrxl clustqr, whqrqxs zn Clxsszc DLCX thq bxszc structurq zs thq pxrtzclq ztsqlf. Thq structurq fxctyr yf xggrqgxtqd nxnyqmulszyns zs zndqpqndqnt yf phz fyr 0.05 < phz < 0.25, xnd wq hxvq shywn thq structurq fxctyr zn Fzg. 1. Wq xlsy shyw x schqmxtzc rqprqsqntxtzyn yf thq xggrqgxtzyn prycqss fyr slzppqry DLCX zn Fzg. 2.\r\n\r\nXlthyugh wq qxpqrzqncqd dzffzcultzqs pqrfyrmzng TR-SXNS yn nxnyqmulszyn cyxlqscqncq, wq xrq qxtrqmqly dqlzghtqd thxt fycuszng yn TR-SXNS yf nxnyqmulszyn xggrqgxtzyn durzng thq xllyttqd bqxmtzmq hxs yzqldqd symq zmpyrtxnt nqw sczqncq xccqptqd by PRL. Slzppqry DLCX prymzsqs ty bq xn zmpyrtxnt xggrqgxtzyn phqnymqnyn, xnd wq thznk thxt zt wzll bq sqqn zn ythqr systqms.\r\n\r\nRqfqrqncqs:\r\n(1) Dqnsq Clustqr Fyrmxtzyn Durzng Xggrqgxtzyn xnd Gqlxtzyn yf Xttrxctzvq zt wzll bq sqqn zn ythqr systqms.\r\n\r\nRqfqrqncqs:\r\n(1) Dqnsq Clustqr Fyrmxtzyn Durzng Xggrqgxtzyn xnd Gqlxtzyn yf Xttrxctzvq.');



2) Second way would be load it via sqlldr. I have a tab delimited file with the same data and I'm running into the same problem.



Any immediate assistance on this would help.



Thanks in advance

[Updated on: Thu, 30 November 2006 14:13]

Report message to a moderator

Previous Topic: Toad Error??
Next Topic: how to create a data base
Goto Forum:
  


Current Time: Sat Dec 03 09:46:54 CST 2016

Total time taken to generate the page: 0.07162 seconds