Re: Oracle 8.i --> Oracle 9i + Unicode

From: Howard J. Rogers <howardjr2000_at_yahoo.com.au>
Date: Fri, 12 Sep 2003 20:43:21 +1000
Message-Id: <3f61a3ba$0$15136$afc38c87@news.optusnet.com.au>

Ed wrote:

> Hi.
>
> We're about to upgrade from Oracle 8.1.7.4 to Oracle 9.2.0.3.
>
> At the moment we're using US7ASCII.
>
> We want to go to Unicode, as we're a university that would like to
> provide certificates in any language.
>
> Am I right in saying that Oracle 9i is unicode only?

No. 9i requires that your *national* character set be a Unicode one (AL16UTF16, which is the default, or UTF8), but you can still pick anything you like for your database character set.

The national character set is used only for columns you create with the NCHAR, NVARCHAR2 and NCLOB datatypes. Otherwise, unless you voluntarily choose a Unicode character set for your database character set, you won't actually meet Unicode at all.

>
> Also, does converting to Unicode automatically increase the database
> size or does it just increase when the new foreign data is introduced?

See above. Do you have nvarchar2, nchar or nclob columns in your database already? If not, the conversion to 9i will not increase the size of the database one iota. If you do, then it might. Likewise, if you elect to use a Unicode character set for your database character set, then it might.

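It depends. UTF8 (or AL32UTF8, the newer 9i character set that tracks current UTF-8) is a variable-width encoding, where letters like "t", "e" and "a" are single-byte characters. But letters like "ë", "Ö" and "ô" take two bytes, as do Arabic letters. Chinese, Japanese and Korean characters generally take three bytes, and the rarer supplementary characters take four. On the other hand, AL16UTF16 (Oracle's name for UTF-16) uses two bytes for practically everything you are likely to store. So "a", "e" and "t" take two bytes, but so do "ë", "Ö" and "ô", and so do Korean, Japanese and what have you; only the supplementary characters need four. Bear in mind that in 9i AL16UTF16 can only be the national character set, so it applies to your NCHAR, NVARCHAR2 and NCLOB columns. A Unicode database character set has to be UTF8 or AL32UTF8.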

If you were living and working and doing business in London, you would be mad to go the UTF16 route, because every regular English character would take two bytes instead of one. But if you lived in Tokyo or Seoul, you would be mad to pick the UTF8 route, because your native characters typically take 3 bytes each, whereas in UTF16 they would only take 2.
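If you want to see the London-versus-Seoul trade-off in numbers, here is a rough sketch in Python (not Oracle) that counts bytes per string under each encoding. The sample strings and the helper names utf8_bytes and utf16_bytes are made up for illustration, and "utf-16-le" is used only so that Python does not add a 2-byte byte-order mark to the count.

```python
# Rough illustration only: Python's codecs stand in for the Oracle
# character sets (utf-8 for a UTF-8 style set, utf-16-le for AL16UTF16).

def utf8_bytes(text):
    """Number of bytes the text needs when encoded as UTF-8."""
    return len(text.encode("utf-8"))

def utf16_bytes(text):
    """Number of bytes the text needs as UTF-16 (no byte-order mark)."""
    return len(text.encode("utf-16-le"))

# Hypothetical sample data: single letters, an English word, a Korean word.
samples = ["t", "ë", "Ö", "한", "certificate", "졸업증명서"]

for s in samples:
    print(f"{s!r}: UTF-8 {utf8_bytes(s)} bytes, UTF-16 {utf16_bytes(s)} bytes")
```

Run that over a representative sample of your own data and the ratio of the totals gives you a first guess at a growth multiple for the affected columns.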

So does your database grow? It depends.

> The reason I ask is that I need to know by what multiple to resize the
> tablespaces.

If things are going to grow, it's a lot more serious than that. When you say 'create table blah (col varchar2(5))', you've actually just specified a column of 5 *bytes* in length, not 5 characters (that is the default; 9i also lets you declare varchar2(5 char) if you want character semantics). So if you then migrate to a character set in which your data takes two bytes per character, you'll be able to squeeze in only 2 characters where before (in a single-byte character set such as US7ASCII or WE8ISO8859P1) you had 5. That's a data truncation problem, and is something you can test for before actually doing the migration by using the Character Set Scanner (csscan is the executable, I believe).
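To make the byte-versus-character point concrete, here is a toy check in Python. It is not how csscan works internally; it only mimics the idea of flagging values that would no longer fit a byte-limited column after conversion. The function names and the sample values are hypothetical.

```python
# Toy model of a byte-limited column such as varchar2(5) with byte semantics.

def overflows(value, byte_limit, encoding):
    """True if value needs more than byte_limit bytes in the given encoding."""
    return len(value.encode(encoding)) > byte_limit

def scan(values, byte_limit, encoding="utf-8"):
    """Return the values that would not fit the column after conversion."""
    return [v for v in values if overflows(v, byte_limit, encoding)]

# Hypothetical values that all fit in 5 bytes in a single-byte character set.
names = ["Smith", "Zoë B", "Jörg"]

# Only the value whose accented letter pushes it past 5 bytes is reported.
print(scan(names, 5, "utf-8"))

# At two bytes per character, anything longer than 2 characters is reported.
print(scan(names, 5, "utf-16-le"))
```

Plain English data survives a move to a UTF-8 style character set untouched, and it is the accented or non-Latin values sitting at the column limit that get caught.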

Regards
HJR
>
> Can anyone give me any hints on these?
>
> Many thanks.
>
> Ed.
Received on Fri Sep 12 2003 - 05:43:21 CDT