RE: String comparison in Japanese

From: bryan munday <bardy_at_MailAndNews.com>
Date: Thu, 24 May 2001 12:45:36 -0400
Message-ID: <3B1406F2@MailAndNews.com>

From my experience of working with Japanese character sets, I think you're
trying to do the impossible (please, someone prove me wrong). Katakana and
hiragana are completely different characters, and as they are based more on
the actual pronunciation than anything else, I don't think there will be a
way of translating them.

That said, you could carry out a translate function to convert one to the
other and then match. This, however, would mean searching your string for each
occurrence of one character and replacing it with another (all 80 or so kana).
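
A minimal sketch of that translate-then-match idea, written in Python rather
than the Perl and SQL of the original question, and not tested against Oracle.
The function names (build_katakana_to_hiragana_table, to_hiragana,
kana_insensitive_contains) are made up for illustration. It assumes the text
is available as Unicode and pairs each katakana letter with the hiragana
letter of the same Unicode character name, skipping any that have no
counterpart:

```python
import unicodedata


def build_katakana_to_hiragana_table():
    """Map each katakana letter to the hiragana letter with the same name."""
    table = {}
    # U+30A1 (KATAKANA LETTER SMALL A) through U+30FA (KATAKANA LETTER VO)
    for code in range(0x30A1, 0x30FB):
        name = unicodedata.name(chr(code), "")
        if not name.startswith("KATAKANA LETTER "):
            continue
        try:
            hira = unicodedata.lookup(name.replace("KATAKANA", "HIRAGANA", 1))
        except KeyError:
            # No hiragana letter with this name (e.g. VA, VI, VE, VO): leave as is.
            continue
        table[code] = ord(hira)
    return table


KATAKANA_TO_HIRAGANA = build_katakana_to_hiragana_table()


def to_hiragana(text):
    """Replace every mapped katakana letter; all other characters pass through."""
    return text.translate(KATAKANA_TO_HIRAGANA)


def kana_insensitive_contains(keyword, title):
    """True if keyword occurs in title once both are folded to hiragana."""
    return to_hiragana(keyword) in to_hiragana(title)
```

Both the keyword and the stored title go through the same conversion before
the comparison, so it does not matter which of the two scripts either side
was entered in.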

My only other thought would be to see if there was a regular shift in the numeric values for the characters and try and work out what the translation was from that.
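
For what it is worth, here is a small Python sketch of checking for such a
shift. It looks only at Unicode code points, where the main katakana letters
(U+30A1 to U+30F6) appear to sit a fixed distance above the corresponding
hiragana letters; whether anything similar holds for the raw bytes of
Shift-JIS or another encoding is not checked here. The names KANA_SHIFT and
shift_katakana_to_hiragana are made up for illustration:

```python
# Distance between a katakana letter and its hiragana counterpart,
# measured on Unicode code points.
KANA_SHIFT = ord("ア") - ord("あ")

# Check that the same distance holds for a handful of other pairs.
sample_pairs = zip("アイウエオカキクケコ", "あいうえおかきくけこ")
shift_is_regular = all(ord(k) - ord(h) == KANA_SHIFT for k, h in sample_pairs)


def shift_katakana_to_hiragana(text):
    """Subtract the shift from katakana letters; leave everything else alone."""
    out = []
    for ch in text:
        code = ord(ch)
        if 0x30A1 <= code <= 0x30F6:
            out.append(chr(code - KANA_SHIFT))
        else:
            out.append(ch)
    return "".join(out)
```

Characters outside that range, such as the long-vowel mark, Latin letters and
kanji, are passed through unchanged.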

And finally, are you sure you will never get a kanji representation? If you
do, then you will not be able to perform the search, due to the different way
kanji are used.

Sorry to be of so little use, but I must admit I never managed to do it.

Regards

Bryan

>===== Original Message From prairiedweller_at_operamail.com =====
>Hello,
>
>Is there any way to use Oracle 8's functionality to match Japanese text
>entered using Katakana with its equivalent entered in Hiragana? Using a UTF-8
>character set for the db, I've tried setting NLS_LANG to Japanese_japan.UTF8
>and NLS_SORT to ASCII, and using NLSSORT to compare the two values, but no
>dice. Is this possible at all using some other setup (e.g. using a Shift-JIS
>character set or some other SQL functions), or am I trying to do the
>impossible?
>
>[Note that I don't know Japanese myself, I'm just going by what the Japanese
>client is asking me to implement.]
>
>To explain further: we're using an Oracle 8i database on a Solaris box; there
>is this one table field "title" that contains some text, and using Perl and
>SQL we'd like to "search" on the "title", i.e. the user enters a keyword and
>we find all records in this table where the keyword appears in the "title"
>field's value. This title is in Japanese, and may be entered using a mixture
>of Katakana, Hiragana, regular Latin characters, or the double-byte Latin
>characters that are part of the Japanese character set.
>
>The problem: if the title is stored in Katakana, but the user searches for
>the same text entered in Hiragana, we'd like to see a match. But right now
>the two texts are treated as being different, i.e. the SQL where clause "WHERE
>NLSSORT(:keyword) = NLSSORT(title)" is never true. The same goes for matching
>text stored in regular ASCII-Latin characters and the same text stored in
>double-byte Latin characters.
>
>Any help would be greatly appreciated. Thanks.

Received on Thu May 24 2001 - 11:45:36 CDT