Oracle FAQ | Your Portal to the Oracle Knowledge Grid |
Home -> Community -> Usenet -> c.d.o.server -> Re: Unicode and Linguistic Sort ?
Solution Summary:
Solution Description:
For example:
If you are moving from a database with Western European character set WE8ISO8859P1 to a new database with Unicode character set UTF8, upon export NLS_LANG is set to WE8ISO8859P1.
For C shell: setenv NLS_LANG American_America.we8iso8859p1
2. CREATE NEW DATABASE WITH NEW CHARACTER SET.
Unless you are on Oracle8i, once the database is created, the character set
cannot be changed without recreating the database, i.e., if you had a
database
with default character set us7ascii and you want to change, for example,
to 8-bit Western European we8iso8859p1 you need to rebuild your
database.
The first number in any character set string will tell you whether it is 7-bit, 8-bit, 16-bit or 32-bit.
Example of 32-bit:
Language Traditional Chinese
Territory Taiwan
Character Set zht32cns1164386
(Note: This is an obsolete character set for Oracle8. New name is ZHT32EUC).
The way to find out what character set you have on the database right now is to issue one of the following commands:
SVRMGR> select * from sys.props$;
or
SQL> select * from nls_database_parameters;
PARAMETER VALUE NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_DATE_FORMAT DD-MON-YY NLS_DATE_LANGUAGE US7ASCII NLS_SORT BINARY NLS_CALENDAR GREGORIAN NLS_RDBMS_VERSION 7.3.3.3.0 11 rows selected.
The character set is specified either during the Installer session or with the CHARACTER SET clause of the CREATE DATABASE statement.
Example of create database script:
CREATE DATABASE "japan"
character set "JA16EUC"
DATAFILE
'/home/japan/datafile/japansys01.dbf' SIZE 50M
LOGFILE
'/home/japan/redolog/japanredolog1.dbf' SIZE 50M,
'/home/japan/redolog/japanredolog2.dbf' SIZE 50M
Character encoding schemes are identified via Oracle-defined acronyms made up of:
<language grouping><number of bits><originator>
For example, in this case:
WE = West European
8 = 8-bit encoding scheme
8859P1 = refers to ISO 8859/1 (but we can't use a '/' in a name, so use P instead)
FOR EXAMPLE:
Western European we8iso8859p1 character set is an 8-bit character set
that supports the following languages:
Brazilian Portuguese, Canadian French, Danish, Dutch, Finnish, French, German, Italian, Latin American Spanish, Norwegian, Portuguese, Spanish, Swedish and, of course, English as English is a default for ANY CHARACTER SET. B. Setting ORA_NLS or ORA_NLS32 environment variable.
If you need to create database with any character set other than us7ascii
you must set the environment variable ORA_NLS to
$ORACLE_HOME/ocommon/nls/admin/data
in the startup file of the oracle software owner.
NOTE:
For RDBMS 7.3.2 version it is called ORA_NLS, for version 7.3.3 and
7.3.4 it is called ORA_NLS32, for Oracle 8 it is called ORA_NLS33
because of NLS libraries version.
For Oracle version 8.x x if you need to connect from the client and Developer2000 version 1.6.1 is installed, ORA_NLS33 needs to be set
$ORACLE_HOME/ocommon/nls/admin/datad2k
NOTE:$ ORACLE_HOME /ocommon/nls/admin/data2k subdirectory is created by
Developer2000 version 1.6.1 only. This is caused by the fact that this
version of Developer uses two versions of NLSRTL library at once: version
3.2 for Oracle common RDBMS libraries (known as RSF on Windows) and
version 3.3 for Forms processing. As these two NLSRTL versions needs
different *.nlb files, two data directories are created: 'data' for 3.2
and 'datad2k' for 3.3. NLSRTL 3.2 files are pointed to by the ORA_NLS32
variable, which defaults from ORACLE_HOME. NLSRTL 3.3 files are pointed by
the ORA_NLS33 variable, which must be set explicitly to
$ORACLE_HOME/ocommon/nls/admin/datad2k.
Before setting ORA_NLS please check that language files were installed
properly on your system. To do that go to directory
$ORACLE_HOME/ocommon/nls/admin/data and issue the following command:
strings * | grep -i we8iso8859p1
where we8iso8859p1 is the new character set of new database.
You need to set the ORA_NLS (ORA_NLS32 depending on your version)
environment
variable before you create your database, as ORA_NLS is the path to the
language object files. For Oracle8 ORA_NLS33 does not have to be set.
Make sure you have ONLY ONE ORA_NLS environment variable set at a time.
NOTE: Make sure end of UNIX directory path can be read, i.e. directory path is not too long. If it is too long, make a link so the end can be read.
If ORA_NLS is not set or can't be read and the database is started with
other
languages and character sets, the database will default and character set
will not be recognized.
This can result in:
You can check which NLS setting are valid for this platform by connecting to the database as a DBA and issuing the following command:
SELECT * FROM V$NLS_VALID_VALUES; If ORA_NLS was not set when the database was started, this will show only the defaults. ORA_NLS should be set for version 7.2 or higher.
NOTE: If you get any of ORA 127xx errors on create database, check whether all languages were installed from Oracle Server CD-ROM. Reinstalling languages from CD-ROM solves many error messages upon creation of a database with charset other than us7ascii.
Examples of errors: ORA-12700, ORA-12701, ORA-12705, ORA-12707, ORA-12708, 12709. Most common are ORA-12701 and ORA-12705.
3. RUN IMPORT WITH NLS_LANG ENVIRONMENT VARIABLE SET. 1.Make sure new database data dictionary recognizes new character set.
Export file should never be edited or touched in any way, otherwise export dump file becomes corrupt, it leads to errors on import that characters are not being recognized, thus full database export becomes invalid and has to be rerun to get a new export dump file.
If your original character set for the database was us7ascii, you were able to store ONLY English characters in the database. After full database export is run and a new database with, for example, Western European we8iso8859p1 character set was created, verify that the new character set is recognized by running one of the following query:
SVRMGR> select * from sys.props$;
or
SVRMGR> select * from nls_database_parameters;
PARAMETER VALUE NLS_LANGUAGE AMERICAN NLS_TERRITORY AMERICA NLS_CURRENCY $ NLS_ISO_CURRENCY AMERICA NLS_NUMERIC_CHARACTERS ., NLS_DATE_FORMAT DD-MON-YY NLS_DATE_LANGUAGE AMERICAN NLS_CHARACTERSET WE8ISO8859P1 NLS_SORT BINARY NLS_CALENDAR GREGORIAN NLS_RDBMS_VERSION 7.3.4.0.1 11 rows selected.
2. Set environment variable NLS_LANG to Americam_Americ.we8iso8859p1
Examples for different shells::
$ NLS_LANG=American_America.we8iso8859p1
$ export NLS_LANG
% setenv NLS_LANG American_America.we8iso8859p1
Follow the logic of setting to the following string - language_territory.character_set NLS_LANG needs to be in capital letters to be recognized.
2.Run full database import with log option to see errors if there are any.
Now your new database is ready for import to be run.
ADDITIONAL NOTES. How to Choose a Language and Territory for the Server.
How to Choose a Language, Territory and Character Set for a User Session.
For the Bourne shell:
$ NLS_LANG=language_territory.character_set
$ export NLS_LANG
Example:
$ NLS_LANG=Dutch_America.we8iso8859p1
$ export NLS_LANG
For the C shell:
% setenv NLS_LANG language_territory.character_set
Example:
% setenv NLS_LANG German_Germany.we8iso8859p1
NOTE: Some language and territory names use more than one word separated by a space. In these cases, double quotes should be used, for example:
$ setenv NLS_LANG "Canadian French_France.we8iso8859p1"
$ setenv NLS_LANG "Dutch_The Netherlands.we8iso8859p1"
"The Views expressed here are my own and not necessarily those of Oracle
Corporation"
Jan Mosselman <jan.mosselman_at_eds.co.uk> wrote in message
news:801ugg$rqn$1_at_nnrp1.deja.com...
> We are currently using Oracle 7.3.4, and would like to store text in a > database using the Unicode UTF-2 character set. But we would also like > to use linguistic sorts, which are not supported for the Unicode UTF-2 > character set in Oracle 7.3.4 (it always uses a binary sort). > > Does anyone know if there is a patch available for Oracle 7.3.4 that > will enable linguistic sorts for UTF-2, or should we upgrade to > Oracle 8 to get linguistic sort support for Unicode ? > > Regards, > > Jan Mosselman > > > Sent via Deja.com http://www.deja.com/ > Before you buy.Received on Sun Nov 07 1999 - 12:12:47 CST