Re: help searching text with accents and umlauts

From: Damien Salvador <damien.salvador_at_via.ecp.fr>
Date: 6 Dec 2001 19:16:07 GMT
Message-ID: <slrna0vgvm.s3t.damien.salvador@zen.via.ecp.fr>

On Thu, 06 Dec 2001 14:05:16 -0500, lee <lee_at_jamtoday.com> wrote:
>My english speaking user community wants to store names of books and
>articles, "bibliographic reference data" ,so called, which may be in
>western european languages other than English, or transliterated into a
>latin character set from non latin alphabets possibly decorated with all
>manner of accents, umlauts, and other diacritical (sp?) marks.

Hi,

You should take a look at interMedia Text, which is called Oracle Text in 9i (it was ConText before that).

Its purpose is full-text indexing.

What you have to do is create a "lexer" preference, plus a stoplist for your "stop-words", and set any other particularities you need.

Especially relevant to you is the "BASE_LETTER" attribute (9i adds a second one, "BASE_LETTER_TYPE", to choose whether the mapping is done by generic base forms or follows your NLS settings).

Beware: there seems to be a bug when using "BASE_LETTER" and "whitespace" together.

(The purpose was to ignore hyphens, that is, to have "Jean-Frédéric" treated the same as "Jean Frederic".)

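Here is an example:
SQL> begin
  ctx_ddl.create_preference('TEST_LEXER', 'BASIC_LEXER');
  -- treat the hyphen as whitespace, so that it separates tokens
  ctx_ddl.set_attribute('TEST_LEXER', 'whitespace', '-');
  -- index accented characters under their base letter
  ctx_ddl.set_attribute('TEST_LEXER', 'base_letter', 'YES');
  -- 9i only: SPECIFIC follows the NLS settings
  ctx_ddl.set_attribute('TEST_LEXER', 'base_letter_type', 'SPECIFIC');
end;
/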

and then
SQL> create index t_w on test_w(text)
  indextype is ctxsys.context
  parameters ('storage CTX_STORAGE lexer TEST_LEXER');

(Beware, this is 9i syntax! On 8i you have to connect as ctxsys/ctxsys.)
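
To check that the lexer does what you want, you could run a query like the one below. This is only a sketch: it assumes that test_w already held a row such as 'Jean-Frédéric' when the index was created (otherwise the index has to be synchronized first), and the search term is just an illustration.

```sql
-- Hypothetical check against the test_w table and t_w index above.
-- With base_letter set to YES the query term is normalized the same way
-- as the indexed text, so the unaccented spelling should find the accented row.
select text
  from test_w
 where contains(text, 'Frederic') > 0;
```

If "whitespace" and "BASE_LETTER" cooperate, searching for 'Jean Frederic' should return the same row; that is the case to test for the bug mentioned above.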

-- 
Damien

Received on Thu Dec 06 2001 - 13:16:07 CST