Re: help searching text with accents and umlauts

From: lee <lee_at_jamtoday.com>
Date: Thu, 06 Dec 2001 15:36:17 -0500
Message-ID: <3C0FD6C1.2DE61AE3@jamtoday.com>

Thanks for the reply. Let me see if I understand you properly...

I hear you saying that we can avoid storing the target strings twice (once plain, once fancy) by letting interMedia Text/iText/ConText ... whatever Larry is calling it these days ... create a map so that it "knows" that a plain vanilla "e" is related to e with an acute accent (é), e grave (è), e umlaut (ë, is that allowed?), e circumflex (ê), and so on.

I suppose the plain vanilla "e" would be the "base letter". I don't understand what you
mean by "following your NLS settings", though.

I've been reading various Oracle white papers on localization, so I'm vaguely familiar with the role of NLS parameters in setting date formats, currency formats, numeric display formats, collation sequences, and so on, but I didn't see anything about NLS parameters that control or influence whether an accented "e" would match a query for a plain "e". Perhaps NLS parameters play more roles inside interMedia Text/ConText/iText?

Damien Salvador wrote:

> On Thu, 06 Dec 2001 14:05:16 -0500, lee
> <lee_at_jamtoday.com> wrote:
> >My English-speaking user community wants to store the titles of books and
> >articles ("bibliographic reference data", so called), which may be in
> >Western European languages other than English, or transliterated into a
> >Latin character set from non-Latin alphabets, possibly decorated with all
> >manner of accents, umlauts, and other diacritical marks.
>
> Hi,
>
> You should take a look at interMedia Text, now iText in 9i (it was ConText
> before).
>
> Its purpose is full-text indexing.
>
> What you have to do is create a "lexer" preference describing your
> language's particularities (stop words go in a separate stoplist).
>
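> Especially meaningful to you is the BASE_LETTER attribute, which converts
> characters with diacritical marks to their base form when indexing.
> (9i adds a second attribute, BASE_LETTER_TYPE, to choose whether the
> mapping uses generic base forms (GENERIC) or follows your NLS settings
> (SPECIFIC).)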
>
> Beware: there seems to be a bug when using BASE_LETTER and WHITESPACE
> simultaneously.
>
> (The purpose was to ignore hyphens, so that "Jean-Frédéric" was treated
> the same as "Jean Frederic".)
>
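> Here is an example:
> begin
>   -- lexer preference based on BASIC_LEXER
>   ctx_ddl.create_preference('TEST_LEXER', 'BASIC_LEXER');
>   -- treat "-" as whitespace, so hyphenated names split into separate words
>   ctx_ddl.set_attribute('TEST_LEXER', 'whitespace', '-');
>   -- index accented characters by their base letter
>   ctx_ddl.set_attribute('TEST_LEXER', 'base_letter', 'YES');
>   -- 9i only: use the base-letter mapping specific to your language
>   ctx_ddl.set_attribute('TEST_LEXER', 'base_letter_type', 'specific');
> end;
> /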
>
> And then:
> -- CTX_STORAGE is a storage preference assumed to exist already
> create index t_w on test_w(text)
>   indextype is ctxsys.context
>   parameters ('storage CTX_STORAGE lexer TEST_LEXER');
>
> (Beware, this is 9i syntax! On 8i you have to connect as ctxsys/ctxsys.)
>
> --
> Damien
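The combined effect of the whitespace and base_letter settings in the lexer above can be imitated outside the database. The Python sketch below is only an analogy, not Oracle's implementation: it treats "-" as an extra whitespace character, folds each token to base letters and lowercase (assuming case-insensitive matching), builds a small inverted index, and answers AND queries roughly the way a CONTAINS query against the t_w index might. The names tokens, build_index, search and the sample titles are hypothetical, and the BASE_LETTER/WHITESPACE bug Damien mentions is not modeled.

```python
import unicodedata
from collections import defaultdict

# Hypothetical analogue of the lexer's whitespace attribute: "-" splits words too.
EXTRA_WHITESPACE = "-"


def base_letters(text: str) -> str:
    """Decompose (NFD) and drop combining marks: é -> e, ü -> u."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokens(text: str) -> list:
    """Fold to base letters and lowercase, turn extra whitespace into spaces, split."""
    folded = base_letters(text).casefold()
    for ch in EXTRA_WHITESPACE:
        folded = folded.replace(ch, " ")
    return folded.split()


def build_index(docs: dict) -> dict:
    """Inverted index: folded token -> set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokens(text):
            index[tok].add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Ids of documents containing every query word (an AND query)."""
    words = tokens(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample titles.
titles = {
    1: "Lettres de Jean-Frédéric",
    2: "Jean Frederic et son temps",
    3: "Frederick the Great",
}
index = build_index(titles)

print(sorted(search(index, "jean frederic")))  # → [1, 2]
print(sorted(search(index, "Jean-Frédéric")))  # → [1, 2]
```

Because the query goes through the same tokens() function as the documents, a plain "frederic" and an accented, hyphenated "Jean-Frédéric" both reach the same index entries, so each title is stored only once.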
Received on Thu Dec 06 2001 - 14:36:17 CST