Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 

Home -> Community -> Mailing Lists -> Oracle-L -> RE: Unicode: UTF-16 to UTF-8 conversion

RE: Unicode: UTF-16 to UTF-8 conversion

From: Faan DeSwardt <Faan_D1_at_VERIFONE.com>
Date: Mon, 03 Nov 2003 16:44:25 -0800
Message-ID: <F001.005D57A5.20031103164425@fatcity.com>


Alan,

Unfortunately UTF-8 and UTF-16 is in fact different in their binary representations and hence you will have to perform a full fledged conversion to map the UTF-16 characters to their UTF-8 binary representation. You can use the dump() SQL function to see this for yourself if you have both an UTF-8 and UTF-16 database. Just store the same string data in both and dump the binary representation to see the difference. Additionally, if you try to just alter the character set you will receive an error back from Oracle that UTF-16 is not an _exact_ subset of UTF-8.

The reason for the difference is that UTF-16 always uses 2 bytes for encoding Unicode but UTF-8 can use up to 3 bytes in the extreme case and hence use completely different encoding schemes to both support Unicode. I know this sounds confusing at times but Unicode is the standard and UTF-8 and UTF-16 are some of the encoding schemes implemented to support different versions of Unicode.

Referring to your second question about UTF-8 vs. UTF-16, I would suggest UTF8 for both the national character set and the database character set in a mixed 9i and pre-9i environment. If using a pure 9i (in the ideal world!) environment then I would use AL16UTF16 for the national character set and AL32UTF8 for the database character set. The reason being that Oracle will only be supporting and extending these 2 Unicode encodings going forward and you may save yourself another database character set conversion in the near future.

Another consideration would be the intensity of string manipulation in your application e.g. string searches, sub stringing, string lengths, etc. The more intense and frequent these operations are the more you would want to go with UTF-16 as this uses a fixed 2 bytes for encoding Unicode and is far more efficient at it than the variable byte length UTF-8 character set. Unfortunately you will have to load test your application to get accurate numbers but I have seen 30% longer response times on intense string manipulation operations (especially wildcard searches e.g. %DE%) after converting an Oracle 8.1.7 database from US7ASCII to UTF8. I know this is not comparing apples with apples, but gives and idea of what the extra byte and linked list/vector lookup can do to performance.

HTH, Faan

-----Original Message-----
Sent: Monday, November 03, 2003 2:20 PM
To: Multiple recipients of list ORACLE-L

Hi,

    I have a quick question that doesn't seem to be answered in the documentation. We have a database that has a database character set of WE8ISO8859P1 and a national character set of UTF-16. Currently we are not using any NCHAR, NVARCHAR2 or NCLOB columns, but would like to soon. However, we'd like to use UTF-8 instead of UTF-16. Since there should be no conversion necessary, is there any reason why I can't just alter the national character set to UTF-8? Or am I missing something?

    While we are on the subject, can anyone recommend for/against UTF-8 or UTF-16. I have read the pros and cons in the manual, but actual experiences could change my mind.

Thanks,

Alan

Alan Aschenbrenner
Oracle DBA
IHS Group
alan.aschenbrenner_at_ihs.com

--

Please see the official ORACLE-L FAQ: http://www.orafaq.net
--

Author:
  INET: alan.aschenbrenner_at_ihs.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing).

--

Please see the official ORACLE-L FAQ: http://www.orafaq.net
--

Author: Faan DeSwardt
  INET: Faan_D1_at_VERIFONE.com

Fat City Network Services    -- 858-538-5051 http://www.fatcity.com
San Diego, California        -- Mailing list and web hosting services
---------------------------------------------------------------------
To REMOVE yourself from this mailing list, send an E-Mail message to: ListGuru_at_fatcity.com (note EXACT spelling of 'ListGuru') and in the message BODY, include a line containing: UNSUB ORACLE-L (or the name of mailing list you want to be removed from). You may also send the HELP command for other information (like subscribing). Received on Mon Nov 03 2003 - 18:44:25 CST

Original text of this message

HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US