Re: Arabic text on MOTIF with Oracle

From: Philippe Verdy <verdy>
Date: 1998/10/30
Message-ID: <71ccrr$44g$1_at_minus.oleane.net>#1/1


Robin K. K. wrote in message <719kt2$9ld$1_at_dalen.get2net.dk>...
>I hope someone can help me here ...
>
>We are trying to set-up a MOTIF client on an AIX server so that our Oracle
>Forms v4.5 can run with Arabic texts. We are using NLS for boilerplates
>etc., and these are to remain in the American standard we are using, but the
>actual data fields should be able to accept Arabic characters.
>
>Can anyone give us some pointers here ?

Oracle allows you to specify the language and the character set separately. Arabic character sets are single-byte codes, but they conflict with the ISO-8859-1 character set commonly used for Western European languages: both place their extra characters in the upper half (bytes 128-255) of the code table. You probably want your application to support not only English and Arabic but also English and German or English and French, which use accented characters in that same upper half. So both character sets must be accepted. The best solution is then either to create your database in Unicode, which uses a two-byte encoding per character, or to create it in ISO-8859-1 and use ISO 646 delimiters within stored strings to switch between character sets.
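To make the conflict concrete, here is a small Python sketch (my own illustration, not Oracle code) that decodes one upper-half byte as ISO-8859-1 and as ISO-8859-6, the standard single-byte Arabic set, and checks which encodings can hold a French word and an Arabic word in the same field. The helper names `decode_both` and `fits` are hypothetical; `utf-16-be` stands in for the two-byte Unicode form.

```python
def decode_both(raw: bytes) -> tuple:
    """Decode the same bytes as Western European and as Arabic text."""
    return raw.decode("iso8859_1"), raw.decode("iso8859_6")


def fits(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


latin, arabic = decode_both(bytes([0xE1]))
# latin is "á" (U+00E1); arabic is U+0641 ARABIC LETTER FEH

mixed = "Caf\u00e9 \u0642\u0647\u0648\u0629"  # French word + Arabic word in one field
```

Here `fits(mixed, "iso8859_1")` and `fits(mixed, "iso8859_6")` are both False, while `fits(mixed, "utf-16-be")` is True, which is exactly why a single-byte column cannot serve both languages without some switching scheme.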

Unicode has a cost, and most of your database does not really need it: most strings are internal and never shown to the user (such as unique codes in reference tables), or are purely numeric (dates, order numbers, and so on).
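A rough way to put a number on that cost: the Python sketch below compares the storage needed for a few internal values in a single-byte set and in a two-byte Unicode form. The sample values and the `storage_bytes` helper are made up for illustration; `utf-16-be` stands in for two-byte Unicode (for these ASCII characters it uses exactly two bytes each).

```python
def storage_bytes(values, codec: str) -> int:
    """Total bytes needed to store `values` in the given encoding."""
    return sum(len(v.encode(codec)) for v in values)


# Internal, never-displayed values such as reference codes and ISO dates.
internal = ["ORD-000123", "ORD-000124", "1998-10-30"]

single_byte = storage_bytes(internal, "iso8859_1")  # 30 bytes: one per character
two_byte = storage_bytes(internal, "utf-16-be")     # 60 bytes: two per character
```

Doubling the size of every code and key column buys nothing when those columns only ever contain ASCII.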

Most of the time, only labels are language-dependent. These labels are not used as keys and are handled as a whole. So using an ISO 646 delimiter to switch character sets within those labels and texts (each delimiter counts as one character in string lengths) is cost-effective and does not affect the functionality of your SQL code or of your application. Just allow your database and application to accept longer strings for those labels and texts.
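A minimal Python sketch of writing such a label, assuming (my choice, not stated above) that the switch bytes are the 7-bit control codes SO (0x0E, switch to Arabic/ISO-8859-6) and SI (0x0F, back to Latin/ISO-8859-1), which ISO 2022 uses for this kind of shifting. `encode_mixed` and the `(charset, text)` segment format are hypothetical.

```python
SO = 0x0E  # hypothetical "switch to Arabic (ISO-8859-6)" byte
SI = 0x0F  # hypothetical "switch back to Latin (ISO-8859-1)" byte
CODECS = {"latin": "iso8859_1", "arabic": "iso8859_6"}


def encode_mixed(segments) -> bytes:
    """Encode (charset, text) pairs into one stored byte string.

    By convention the string starts in Latin, so a switch byte is written
    only when the character set actually changes.
    """
    out = bytearray()
    current = "latin"
    for charset, text in segments:
        if not text:
            continue  # an empty segment would only add a useless switch byte
        if charset != current:
            out.append(SO if charset == "arabic" else SI)
            current = charset
        out += text.encode(CODECS[charset])
    return bytes(out)


label = encode_mixed([("latin", "Caf\u00e9 / "), ("arabic", "\u0642\u0647\u0648\u0629")])
# 11 visible characters stored in 12 bytes: the extra byte is the switch byte
```

This is why the column must be declared somewhat longer than the visible text: every switch costs one position.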

Note that using ISO 646 delimiters makes the effective encoding multi-byte, so you must adapt your string management routines to track the character set in effect throughout the string (mainly when concatenating strings or extracting substrings). You must establish a convention for such strings. At a minimum, assume that every string begins in the Latin character set until a switch byte is encountered.
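The "starts in Latin" convention can be sketched as a small state machine in Python. The SO/SI byte values (0x0E/0x0F) are an assumption carried over from the ISO 2022 shift codes, and `decode_mixed` is a hypothetical helper; the point is that the meaning of each byte depends on the last switch seen before it.

```python
SO = 0x0E  # hypothetical "switch to Arabic (ISO-8859-6)" byte
SI = 0x0F  # hypothetical "switch back to Latin (ISO-8859-1)" byte


def decode_mixed(data: bytes) -> str:
    """Decode a switch-byte string, assuming it begins in the Latin set."""
    codec = "iso8859_1"  # convention: every string starts in Latin
    chars = []
    for b in data:
        if b == SO:
            codec = "iso8859_6"
        elif b == SI:
            codec = "iso8859_1"
        else:
            # Raises UnicodeDecodeError for bytes unassigned in ISO-8859-6.
            chars.append(bytes([b]).decode(codec))
    return "".join(chars)
```

For example, the byte 0xE9 decodes as "é" on its own, but as an Arabic letter (U+0649) when it follows SO.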

You must preserve this assumption when extracting substrings: scan the switch bytes from the beginning of the string up to the first position to be extracted to determine which set is active there, and carry that character set into the extracted string.
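A possible Python sketch of that substring rule, again assuming SO (0x0E) and SI (0x0F) as the switch bytes. `substring_mixed` is hypothetical; positions are counted in characters, so switch bytes are skipped when counting but still update the current set.

```python
SO = 0x0E  # hypothetical "switch to Arabic (ISO-8859-6)" byte
SI = 0x0F  # hypothetical "switch back to Latin (ISO-8859-1)" byte


def substring_mixed(data: bytes, start: int, length: int) -> bytes:
    """Extract `length` characters beginning at character index `start`.

    Indexes count characters only; switch bytes are not characters. If the
    extraction begins inside Arabic text, the result is prefixed with SO so
    that it still obeys the "starts in Latin" convention.
    """
    state = SI  # convention: every string begins in Latin
    pos = 0     # character index of the next non-switch byte
    out = bytearray()
    for b in data:
        if pos >= start + length:
            break
        if b in (SO, SI):
            state = b
            if out:  # a switch inside the extracted range is kept
                out.append(b)
            continue
        if pos >= start:
            if not out and state == SO:
                out.append(SO)  # restore the set that was active at `start`
            out.append(b)
        pos += 1
    return bytes(out)
```

Without the leading SO, an extracted Arabic fragment would be misread as Latin-1 accented letters by every routine that follows the convention.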

When concatenating strings, you must scan the switch bytes throughout the first string to know whether it ends in the Latin or the Arabic character set; if it ends in Arabic, insert a switch back to Latin before appending the second string, since that string assumes it starts in Latin. Avoid producing consecutive switch bytes with no characters between them, and filter out switch bytes that have no effect.
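One way to sketch this in Python, assuming the same hypothetical SO/SI switch bytes: `final_state` finds the set a string ends in, `concat_mixed` bridges back to Latin when needed, and `normalize` removes switch bytes that change nothing. All three names are illustrative, not an existing API.

```python
SO = 0x0E  # hypothetical "switch to Arabic (ISO-8859-6)" byte
SI = 0x0F  # hypothetical "switch back to Latin (ISO-8859-1)" byte


def final_state(data: bytes) -> int:
    """Return SO if the string ends in Arabic, SI if it ends in Latin."""
    state = SI  # convention: every string begins in Latin
    for b in data:
        if b in (SO, SI):
            state = b
    return state


def normalize(data: bytes) -> bytes:
    """Drop switch bytes that have no effect on the decoded text."""
    out = bytearray()
    state = SI      # set currently in effect in `out`
    pending = None  # last switch seen since the previous character
    for b in data:
        if b in (SO, SI):
            pending = b  # consecutive switches: only the last one matters
            continue
        if pending is not None and pending != state:
            out.append(pending)
            state = pending
        pending = None
        out.append(b)
    return bytes(out)  # trailing switches with no characters after them are dropped


def concat_mixed(a: bytes, b: bytes) -> bytes:
    """Concatenate two strings that each start in the Latin set."""
    # If `a` ends in Arabic, switch back to Latin before `b`, which assumes Latin.
    bridge = bytes([SI]) if final_state(a) == SO else b""
    return normalize(a + bridge + b)
```

Joining two strings that both end and begin in Arabic therefore produces no SI/SO pair in the middle, which keeps labels as short as possible.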

The other issue is the date format the user expects. This presentation can be handled by a presentation library within the application rather than by the database itself, which can keep a fixed, known setting.

Finally, the most complicated issue is the right-to-left direction of Arabic and Hebrew, which affects how you program your GUI: for example, the position of the insertion caret when the user enters Arabic or Hebrew characters, and the key combination for switching between Arabic and Latin characters (for example when entering numbers; note that Latin languages use the so-called "Arabic" digits 0-9, while many Arabic-speaking countries commonly use Arabic-Indic digits (٠١٢٣٤٥٦٧٨٩) instead).
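A minimal sketch of such a presentation layer in Python: the database keeps dates in one fixed format (ISO YYYY-MM-DD is my assumption here), and the application reorders the fields and optionally shows Arabic-Indic digits (U+0660 to U+0669). `present_date` and its parameters are hypothetical.

```python
import datetime

# Maps Western digits 0-9 to Arabic-Indic digits U+0660..U+0669.
ARABIC_INDIC = str.maketrans("0123456789", "".join(chr(0x0660 + i) for i in range(10)))


def present_date(stored: str, order: str = "DMY", sep: str = "/",
                 arabic_digits: bool = False) -> str:
    """Turn the database's fixed ISO date (YYYY-MM-DD) into the user's format."""
    d = datetime.date.fromisoformat(stored)
    parts = {"D": f"{d.day:02d}", "M": f"{d.month:02d}", "Y": f"{d.year:04d}"}
    text = sep.join(parts[c] for c in order)
    return text.translate(ARABIC_INDIC) if arabic_digits else text
```

For example, `present_date("1998-10-30", "MDY")` gives "10/30/1998", and with `arabic_digits=True` the default order gives the same digits as "30/10/1998" but written with Arabic-Indic digits. The database value never changes.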

Moreover, you must provide regional settings for the Arabic language, to allow for local conventions that differ from country to country even though all of them speak Arabic. There are also several conventions within the same country: at least the local spoken Arabic dialect and classical Arabic, the historical language of Islam. There are also several calendar conventions (for example, the Islamic Hijri calendar); the international Gregorian calendar is nevertheless understood everywhere, but with different month names and different orders for day, month, and year. Years can also be presented differently (different eras, as in Japanese or traditional Chinese). Such differences should be handled through the user preference settings stored in each user's local profile.
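To illustrate profile-driven presentation, here is a Python sketch with a hypothetical `RegionalProfile` record holding field order, month names, and digit style. The profile keys ("en-US", "ar-EG", "ar-SY") and their settings are example values only, not authoritative locale data; the two Arabic tables show the transliterated month names common in Egypt and the Syriac-derived names used in the Levant for the same Gregorian months.

```python
import datetime
from dataclasses import dataclass

# Maps Western digits 0-9 to Arabic-Indic digits U+0660..U+0669.
ARABIC_INDIC = str.maketrans("0123456789", "".join(chr(0x0660 + i) for i in range(10)))


@dataclass(frozen=True)
class RegionalProfile:
    order: str           # field order, e.g. "DMY" or "MDY"
    months: tuple        # twelve month names, January first
    arabic_digits: bool  # display Arabic-Indic digits instead of 0-9


# Illustrative profiles only; real settings would come from the user's profile.
PROFILES = {
    "en-US": RegionalProfile("MDY", (
        "January", "February", "March", "April", "May", "June", "July",
        "August", "September", "October", "November", "December"), False),
    "ar-EG": RegionalProfile("DMY", (
        "يناير", "فبراير", "مارس", "أبريل", "مايو", "يونيو", "يوليو",
        "أغسطس", "سبتمبر", "أكتوبر", "نوفمبر", "ديسمبر"), True),
    "ar-SY": RegionalProfile("DMY", (
        "كانون الثاني", "شباط", "آذار", "نيسان", "أيار", "حزيران", "تموز",
        "آب", "أيلول", "تشرين الأول", "تشرين الثاني", "كانون الأول"), True),
}


def format_long_date(d: datetime.date, profile: RegionalProfile) -> str:
    """Format a date with the month names and field order of `profile`."""
    fields = {"D": str(d.day), "M": profile.months[d.month - 1], "Y": str(d.year)}
    text = " ".join(fields[c] for c in profile.order)
    return text.translate(ARABIC_INDIC) if profile.arabic_digits else text
```

The same stored date then reads "October 30 1998" for the "en-US" profile, while the two Arabic profiles name October differently, which is the kind of country-to-country difference a single "Arabic" setting cannot capture.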

The best approach is to test your software on an Arabic version of your client system.

Received on Fri Oct 30 1998 - 00:00:00 CET
