Re: help searching text with accents and umlauts

From: Lee Horowitz <lee_at_jamtoday.com>
Date: Thu, 06 Dec 2001 19:43:20 -0500
Message-ID: <3C1010A8.E7AFCE75@jamtoday.com>

Thanks again.

I can see how, in general, the mapping of "decorated" letters to "base" letters might depend on the language. But as a practical matter, it's likely that the ordinary bear (assumed to be an American English speaker) would expect a letter with a diacritical mark on it to map to the plain character as its "base" letter, following the principle of "least surprise".
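For illustration only, here is a minimal Python sketch of that "least surprise" mapping. It assumes Unicode canonical decomposition (NFD) is an acceptable stand-in for the database's rules: each accented letter is split into a base letter plus combining marks, and the marks are dropped. This is not a description of how interMedia Text implements its base-letter handling, and `base_letters` is a hypothetical helper.

```python
import unicodedata

def base_letters(text: str) -> str:
    """Map each letter to its base letter by dropping combining marks.

    NFD turns "é" into "e" followed by U+0301 COMBINING ACUTE ACCENT;
    the combining mark is then discarded.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Recompose whatever is left so the result is in a normalized form.
    return unicodedata.normalize("NFC", stripped)

# Every accented "e" folds to a plain "e", so one stored form matches all spellings.
for word in ["é", "è", "ë", "ê", "déjà-vu"]:
    print(word, "->", base_letters(word))
```

Letters with no canonical decomposition, such as "ø" or "ß", pass through unchanged, which is one place where a purely generic mapping falls short.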

Be that as it may, I think you've "rung the gong". If we can get interMedia Text/iText/ConText to perform as advertised, that's the tool we need.

Damien Salvador wrote:

> On Thu, 06 Dec 2001 15:36:17 -0500, lee
> <lee_at_jamtoday.com> wrote:
> >
> >I hear you saying that we can avoid storing the target strings twice
> >(once plain, once fancy) by letting interMedia Text/iText/ConText (or whatever
> >Larry is calling it these days) create a map so that it "knows" that a plain
> >vanilla "e" is related to e accent aigu, e accent grave, e umlaut (is that
> >allowed?), e circumflex and so on.
>
> That's exactly the case. An e with a circumflex or an umlaut is allowed. I
> think even the t with a caron (ť, seen in the Czech Republic) can be.
>
> You just store your text 'as is', and iText does the indexing for you.
>
> >I suppose the plain vanilla "e" would be the "base letter". I don't understand
> >what you mean by "following the NLS settings".
>
> I must admit I am not entirely clear on that.
>
> We were using only the "base_letter" setting. But we ran into problems with
> compound words (déjà-vu ...), and support told us there were variations.
>
> Indeed, it seems that the mapping is not always the same: it depends on the
> character set of the database (maybe the example was Portuguese, I'm not sure).
>
> This behavior is specific to 9i.
>
> --
> Damien
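The point that the mapping is not universal can be illustrated with a small Python sketch that layers a per-language override table on top of a generic accent-stripping fold. The German entries (ä to ae, ö to oe, ü to ue, ß to ss) follow the common German transliteration convention; the `OVERRIDES` table and the `fold` and `generic_fold` functions are hypothetical, and none of this is claimed to match Oracle's actual per-charset or per-language rules.

```python
import unicodedata

# Hypothetical per-language overrides, applied before the generic fold.
# German convention: an umlaut becomes a two-letter sequence, not a bare vowel.
OVERRIDES = {
    "de": {"ä": "ae", "ö": "oe", "ü": "ue",
           "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss"},
}

def generic_fold(text: str) -> str:
    """Drop combining marks after canonical decomposition (é -> e)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def fold(text: str, lang: str = "") -> str:
    """Fold to base letters, honoring the override table for `lang`, if any."""
    table = OVERRIDES.get(lang, {})
    # NFC first, so "ü" typed as u + U+0308 still matches the precomposed table key.
    text = unicodedata.normalize("NFC", text)
    text = "".join(table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", generic_fold(text))

print(fold("Müller"))         # generic rule: Muller
print(fold("Müller", "de"))   # German rule: Mueller
print(fold("déjà-vu", "de"))  # no German override for é or à: deja-vu
```

The same word folds differently depending on which rules are in effect, so a query only matches reliably when it is folded with the same rules that were used at index time.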


Received on Thu Dec 06 2001 - 18:43:20 CST