Re: help searching text with accents and umlauts
On Thu, 06 Dec 2001 14:05:16 -0500, lee
<lee_at_jamtoday.com> wrote:
>My English-speaking user community wants to store names of books and
>articles, so-called "bibliographic reference data", which may be in
>Western European languages other than English, or transliterated into a
>Latin character set from non-Latin alphabets, possibly decorated with all
>manner of accents, umlauts, and other diacritical (sp?) marks.
Hi,
You should look at interMedia Text, renamed Oracle Text in 9i (it was ConText before).
Its purpose is full-text indexing.
What you have to do is create a "lexer" with your "stop-words" and other particularities.
Especially relevant to you is the "BASE_LETTER" attribute. (9i adds a second attribute, BASE_LETTER_TYPE, to choose whether the mapping is done by base forms or following your NLS settings.)
Beware: there seems to be a bug when trying to use "BASE_LETTER" and "whitespace" simultaneously
(the purpose was to ignore hyphens, so that "Jean-Frédéric" would be treated the same as "Jean Frederic").
Here is an example:
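begin
  -- Create a lexer preference based on BASIC_LEXER
  ctx_ddl.create_preference('TEST_LEXER', 'BASIC_LEXER');
  -- Treat the hyphen as whitespace, so it separates tokens
  ctx_ddl.set_attribute('TEST_LEXER', 'whitespace', '-');
  -- Map accented characters to their base letters
  ctx_ddl.set_attribute('TEST_LEXER', 'base_letter', 'YES');
  -- 9i only: do the mapping following the NLS settings
  ctx_ddl.set_attribute('TEST_LEXER', 'base_letter_type', 'specific');
end;
/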
and then
create index t_w on test_w(text)
  indextype is ctxsys.context
  parameters ('storage CTX_STORAGE lexer TEST_LEXER');
(Beware, this is 9i syntax! On 8i you have to connect as ctxsys/ctxsys.)
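Once the index exists, a quick way to check the lexer is to query it with CONTAINS. This is only a sketch under a few assumptions: test_w is taken to have a single VARCHAR2 column named text, the sample rows are made up for illustration, and ctx_ddl.sync_index is the 9i call (8i synchronizes the index differently). Because of the "whitespace"/"BASE_LETTER" problem mentioned above, the hyphenated row may not match on your version.

```sql
-- Hypothetical sample data; test_w(text) is assumed to be a VARCHAR2 column.
insert into test_w (text) values ('Jean-Frédéric');
insert into test_w (text) values ('Jean Frederic');
commit;

-- A CONTEXT index is not maintained at commit time, so rows inserted
-- after the index was created must be synchronized before they are searchable.
begin
  ctx_ddl.sync_index('t_w');
end;
/

-- With base_letter set to YES, an unaccented search term is expected
-- to find the accented spelling as well.
select text
from   test_w
where  contains(text, 'Frederic') > 0;
```

If only the unaccented row comes back, try dropping the 'whitespace' attribute from the lexer and recreating the index, to see whether the bug is the cause.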
-- Damien
Received on Thu Dec 06 2001 - 13:16:07 CST