Re: Unicode and Linguistic Sort ?

From: Yass Khogaly <ykhogaly_at_us.oracle.com>
Date: Sun, 7 Nov 1999 10:12:47 -0800
Message-ID: <804bqj$nbr$1@inet16.us.oracle.com>

Solution Summary:

Run full database export in old character set.
Create new database with new character set.
Run import with NLS_LANG environment variable set to new character set like in the following example: NLS_LANG American_America.we8iso8859p1

Solution Description:

RUN FULL DATABASE EXPORT IN OLD CHARACTER SET.

If original character set is US7ASCII NO environment variable needs to be set
before full database export is run.
If original character set is other than US7ASCII, an environment variable NLS_LANG needs to be set to an old character set before full database export is run.

For example:

If you are moving from a database with Western European character set WE8ISO8859P1 to a new database with Unicode character set UTF8, upon export NLS_LANG is set to WE8ISO8859P1.

For C shell: setenv NLS_LANG American_America.we8iso8859p1

2. CREATE NEW DATABASE WITH NEW CHARACTER SET.

Choose a Character Set for the Server.

Unless you are on Oracle8i, once the database is created, the character set cannot be changed without recreating the database, i.e., if you had a database
with default character set us7ascii and you want to change, for example, to 8-bit Western European we8iso8859p1 you need to rebuild your database.

The first number in any character set string will tell you whether it is 7-bit, 8-bit, 16-bit or 32-bit.

Example of 32-bit:

Language Traditional Chinese

Territory Taiwan

Character Set zht32cns1164386

(Note: This is an obsolete character set for Oracle8. New name is ZHT32EUC).

The way to find out what character set you have on the database right now is to issue one of the following commands:

SVRMGR> select * from sys.props$;

SQL> select * from nls_database_parameters;

PARAMETER VALUE NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_DATE_FORMAT DD-MON-YY NLS_DATE_LANGUAGE US7ASCII NLS_SORT BINARY NLS_CALENDAR GREGORIAN NLS_RDBMS_VERSION 7.3.3.3.0 11 rows selected.

The character set is specified either during the Installer session or with the CHARACTER SET clause of the CREATE DATABASE statement.

NOTE: If a character set is not included in the list of "Supported Character Sets" from an Installation Guide for your particular Platform and database version you CANNOT create a database for this particular Platform and database version.

Example of create database script:
CREATE DATABASE "japan"
character set "JA16EUC"
DATAFILE
'/home/japan/datafile/japansys01.dbf' SIZE 50M LOGFILE
'/home/japan/redolog/japanredolog1.dbf' SIZE 50M, '/home/japan/redolog/japanredolog2.dbf' SIZE 50M

Character encoding schemes are identified via Oracle-defined acronyms made up of:

For example, in this case:

WE = West European

8 = 8-bit encoding scheme

8859P1 = refers to ISO 8859/1 (but we can't use a '/' in a name, so use P instead)

FOR EXAMPLE:
Western European we8iso8859p1 character set is an 8-bit character set that supports the following languages:

Brazilian Portuguese, Canadian French, Danish, Dutch, Finnish, French, German, Italian, Latin American Spanish, Norwegian, Portuguese, Spanish, Swedish and, of course, English as English is a default for ANY CHARACTER SET. B. Setting ORA_NLS or ORA_NLS32 environment variable.

If you need to create database with any character set other than us7ascii you must set the environment variable ORA_NLS to
$ORACLE_HOME/ocommon/nls/admin/data

in the startup file of the oracle software owner.

NOTE:
For RDBMS 7.3.2 version it is called ORA_NLS, for version 7.3.3 and 7.3.4 it is called ORA_NLS32, for Oracle 8 it is called ORA_NLS33 because of NLS libraries version.

For Oracle version 8.x x if you need to connect from the client and Developer2000 version 1.6.1 is installed, ORA_NLS33 needs to be set

$ORACLE_HOME/ocommon/nls/admin/datad2k

NOTE:$ ORACLE_HOME /ocommon/nls/admin/data2k subdirectory is created by Developer2000 version 1.6.1 only. This is caused by the fact that this version of Developer uses two versions of NLSRTL library at once: version 3.2 for Oracle common RDBMS libraries (known as RSF on Windows) and version 3.3 for Forms processing. As these two NLSRTL versions needs different *.nlb files, two data directories are created: 'data' for 3.2 and 'datad2k' for 3.3. NLSRTL 3.2 files are pointed to by the ORA_NLS32 variable, which defaults from ORACLE_HOME. NLSRTL 3.3 files are pointed by the ORA_NLS33 variable, which must be set explicitly to
$ORACLE_HOME/ocommon/nls/admin/datad2k.

Before setting ORA_NLS please check that language files were installed properly on your system. To do that go to directory
$ORACLE_HOME/ocommon/nls/admin/data and issue the following command:

strings * | grep -i we8iso8859p1

where we8iso8859p1 is the new character set of new database.

You need to set the ORA_NLS (ORA_NLS32 depending on your version) environment
variable before you create your database, as ORA_NLS is the path to the language object files. For Oracle8 ORA_NLS33 does not have to be set. Make sure you have ONLY ONE ORA_NLS environment variable set at a time.

NOTE: Make sure end of UNIX directory path can be read, i.e. directory path is not too long. If it is too long, make a link so the end can be read.

If ORA_NLS is not set or can't be read and the database is started with other
languages and character sets, the database will default and character set will not be recognized.

This can result in:

Character set translation not occurring for clients using a character set different from the database's. This may result in corrupt data in certain applications as 8-bit or 16-bit or 32-bit may be lost and misinterpreted.
ORA-12705 errors.
"SQL*Loader-266: Unable to locate character set" error.
Messages being displayed in a language different from that expected.

You can check which NLS setting are valid for this platform by connecting to the database as a DBA and issuing the following command:

SELECT * FROM V$NLS_VALID_VALUES; If ORA_NLS was not set when the database was started, this will show only the defaults. ORA_NLS should be set for version 7.2 or higher.

NOTE: If you get any of ORA 127xx errors on create database, check whether all languages were installed from Oracle Server CD-ROM. Reinstalling languages from CD-ROM solves many error messages upon creation of a database with charset other than us7ascii.

Examples of errors: ORA-12700, ORA-12701, ORA-12705, ORA-12707, ORA-12708, 12709. Most common are ORA-12701 and ORA-12705.

3. RUN IMPORT WITH NLS_LANG ENVIRONMENT VARIABLE SET. 1.Make sure new database data dictionary recognizes new character set.

Export file should never be edited or touched in any way, otherwise export dump file becomes corrupt, it leads to errors on import that characters are not being recognized, thus full database export becomes invalid and has to be rerun to get a new export dump file.

If your original character set for the database was us7ascii, you were able to store ONLY English characters in the database. After full database export is run and a new database with, for example, Western European we8iso8859p1 character set was created, verify that the new character set is recognized by running one of the following query:

SVRMGR> select * from sys.props$;

SVRMGR> select * from nls_database_parameters;

PARAMETER VALUE NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_DATE_FORMAT DD-MON-YY NLS_DATE_LANGUAGE AMERICAN NLS_CHARACTERSET WE8ISO8859P1 NLS_SORT BINARY NLS_CALENDAR GREGORIAN NLS_RDBMS_VERSION 7.3.4.0.1 11 rows selected.

2. Set environment variable NLS_LANG to Americam_Americ.we8iso8859p1

Examples for different shells::

$ NLS_LANG=American_America.we8iso8859p1

$ export NLS_LANG

% setenv NLS_LANG American_America.we8iso8859p1

     Follow the logic of setting to the following string -
     language_territory.character_set

     NLS_LANG needs to be in capital letters to be recognized.

2.Run full database import with log option to see errors if there are any.

Now your new database is ready for import to be run.

ADDITIONAL NOTES. How to Choose a Language and Territory for the Server.

Shut down any running database instances.
In the initSID.ora files, change the NLS_LANGUAGE and NLS_TERRITORY parameters.
Restart the instances.

How to Choose a Language, Territory and Character Set for a User Session.

Specify the language, territory and terminal character set for each client by
setting the NLS_LANG environment variable on the client.

For the Bourne shell:
$ NLS_LANG=language_territory.character_set
$ export NLS_LANG

Example:
$ NLS_LANG=Dutch_America.we8iso8859p1
$ export NLS_LANG

For the C shell:
% setenv NLS_LANG language_territory.character_set Example:
% setenv NLS_LANG German_Germany.we8iso8859p1

NOTE: Some language and territory names use more than one word separated by a space. In these cases, double quotes should be used, for example:

$ setenv NLS_LANG "Canadian French_France.we8iso8859p1"
$ setenv NLS_LANG "Dutch_The Netherlands.we8iso8859p1"

"The Views expressed here are my own and not necessarily those of Oracle Corporation"
Jan Mosselman <jan.mosselman_at_eds.co.uk> wrote in message news:801ugg$rqn$1_at_nnrp1.deja.com...

> We are currently using Oracle 7.3.4, and would like to store text in a
> database using the Unicode UTF-2 character set. But we would also like
> to use linguistic sorts, which are not supported for the Unicode UTF-2
> character set in Oracle 7.3.4 (it always uses a binary sort).
>
> Does anyone know if there is a patch available for Oracle 7.3.4 that
> will enable linguistic sorts for UTF-2, or should we upgrade to
> Oracle 8 to get linguistic sort support for Unicode ?
>
> Regards,
>
> Jan Mosselman
>
>
> Sent via Deja.com http://www.deja.com/

> Before you buy.

Received on Sun Nov 07 1999 - 12:12:47 CST