RE: String comparison in Japanese

From: <prairiedweller_at_operamail.com>
Date: Thu, 24 May 2001 17:00:52 GMT
Message-ID: <f5bP6.387$ow2.276917@news.intnet.net>

Right, we realize there'll be some text in kanji as well, but we only expect to match them exactly, i.e. the user must enter the keyword in kanji as well.

I was told there is a one-to-one relationship between katakana and hiragana (or at least some of the characters), and apparently they've implemented this kind of comparison in localized Palm Pilots. As you say, implementing a translation function may be the only resort, but I'm hoping for a better solution, or at least something already implemented...
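
For reference, the translation-function approach could be sketched roughly as follows. This is only an illustration in Python (not the Perl we actually use), and it assumes the text is handled as Unicode: there the main katakana block (U+30A1 to U+30F6) sits exactly 0x60 above the matching hiragana (U+3041 to U+3096), so the "regular shift" idea does hold in that encoding. The NFKC normalization step is a further assumption on my part to cover the double-byte Latin and half-width katakana cases. The names fold_kana and title_matches are made up for this example.

```python
import unicodedata

# Unicode katakana block ァ..ヶ; each code point is 0x60 above its hiragana twin.
KATAKANA_START = 0x30A1
KATAKANA_END = 0x30F6
KANA_OFFSET = 0x60


def fold_kana(text: str) -> str:
    """Return a comparison key: NFKC-normalized, katakana folded to hiragana."""
    # NFKC maps full-width Latin to ASCII and half-width katakana to full-width.
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch  # kanji, hiragana, Latin, and the long-vowel mark pass through
        for ch in normalized
    )


def title_matches(keyword: str, title: str) -> bool:
    """Substring search that ignores the katakana/hiragana distinction."""
    return fold_kana(keyword) in fold_kana(title)
```

In a database setting the same folding would have to be applied to the stored titles as well as to the keyword, for example by keeping a pre-folded copy of each title to search against. Kanji are left untouched, so they still only match exactly.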

In article <3B1406F2_at_MailAndNews.com>, bryan munday <bardy_at_MailAndNews.com> wrote:
>From my experience of working with Japanese character sets I think you're
>trying to do the impossible (please someone prove me wrong). The katakana and
>hiragana are completely different characters, and as they are based more on
>the actual pronunciation than anything else I don't think there will be a
>way of translating them.
>
>That said you could carry out a translate function to convert one to the
>other and then match. This however would mean searching your string for each
>occurrence of one character and replacing it with another (all 80 or so kana).
>
>My only other thought would be to see if there was a regular shift in the
>numeric values for the characters and try and work out what the translation
>was from that.
>
>And finally are you sure you will never get kanji representation? If you do
>then you will not be able to perform the search due to the different way
>kanji are used.
>
>Sorry to be of so little use, but I must admit I never managed to do it.
>
>Regards
>
>Bryan
>
>>===== Original Message From prairiedweller_at_operamail.com =====
>>Hello,
>>
>>Is there any way to use Oracle 8's functionality to match Japanese text
>>entered using Katakana with its equivalent entered in Hiragana? Using a UTF-8
>>character set for the db, I've tried setting NLS_LANG to Japanese_japan.UTF8
>>and NLS_SORT to ASCII, and using NLSSORT to compare the two values, but no
>>dice. Is this possible at all using some other setup (e.g. using a Shift-JIS
>>character set or some other SQL functions), or am I trying to do the
>>impossible?
>>
>>[Note that I don't know Japanese myself, I'm just going by what the Japanese
>>client is asking me to implement.]
>>
>>To explain further: we're using an Oracle 8i database on a Solaris box; there
>>is this one table field "title" that contains some text, and using Perl and
>>SQL we'd like to "search" on the "title", i.e. the user enters a keyword and
>>we find all records in this table where the keyword appears in the "title"
>>field's value. This title is in Japanese, and may be entered using a mixture
>>of Katakana, Hiragana, regular Latin characters, or the double-byte Latin
>>characters that are part of the Japanese character set.
>>
>>The problem: if the title is stored in Katakana, but the user searches for the
>>same text entered in Hiragana, we'd like to see a match. But right now the two
>>texts are treated as being different, i.e. the SQL where clause "WHERE
>>NLSSORT(:keyword) = NLSSORT(title)" is never true. The same goes for matching
>>text stored in regular ASCII-Latin characters and the same text stored in
>>double-byte Latin characters.
>>
>>Any help would be greatly appreciated. Thanks.
>
Received on Thu May 24 2001 - 12:00:52 CDT