Is there a way to customize the BASE_LETTER conversions that Oracle Text does? (Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production) [message #618947] Wed, 16 July 2014 12:55
orauser001
Hi - we use the BASE_LETTER attribute of the lexer so that our Oracle Text searches can match accented characters/diacritics. It works quite well for code points in the Unicode Latin Extended-A range.

For example, all of these forms of O in Latin Extended-A are indexed as a plain O:

014C LATIN CAPITAL LETTER O WITH MACRON - Ō
014D LATIN SMALL LETTER O WITH MACRON - ō
014E LATIN CAPITAL LETTER O WITH BREVE - Ŏ
014F LATIN SMALL LETTER O WITH BREVE - ŏ
0150 LATIN CAPITAL LETTER O WITH DOUBLE ACUTE - Ő
0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE - ő

But in Latin Extended-B there are various forms of O, and only one of them (01A0 LATIN CAPITAL LETTER O WITH HORN - Ơ) is indexed as O. The rest are stored as is, so they don't support multilingual searches:

0186 LATIN CAPITAL LETTER OPEN O - Ɔ
019F LATIN CAPITAL LETTER O WITH MIDDLE TILDE - Ɵ
01A0 LATIN CAPITAL LETTER O WITH HORN - Ơ
01A1 LATIN SMALL LETTER O WITH HORN - ơ
01D1 LATIN CAPITAL LETTER O WITH CARON - Ǒ
01D2 LATIN SMALL LETTER O WITH CARON - ǒ
01EA LATIN CAPITAL LETTER O WITH OGONEK - Ǫ
01EB LATIN SMALL LETTER O WITH OGONEK - ǫ
01EC LATIN CAPITAL LETTER O WITH OGONEK AND MACRON - Ǭ
01ED LATIN SMALL LETTER O WITH OGONEK AND MACRON - ǭ
01FE LATIN CAPITAL LETTER O WITH STROKE AND ACUTE - Ǿ
01FF LATIN SMALL LETTER O WITH STROKE AND ACUTE - ǿ
020C LATIN CAPITAL LETTER O WITH DOUBLE GRAVE - Ȍ
020D LATIN SMALL LETTER O WITH DOUBLE GRAVE - ȍ
020E LATIN CAPITAL LETTER O WITH INVERTED BREVE - Ȏ
020F LATIN SMALL LETTER O WITH INVERTED BREVE - ȏ
022A LATIN CAPITAL LETTER O WITH DIAERESIS AND MACRON - Ȫ
022B LATIN SMALL LETTER O WITH DIAERESIS AND MACRON - ȫ
022C LATIN CAPITAL LETTER O WITH TILDE AND MACRON - Ȭ
022D LATIN SMALL LETTER O WITH TILDE AND MACRON - ȭ
022E LATIN CAPITAL LETTER O WITH DOT ABOVE - Ȯ
022F LATIN SMALL LETTER O WITH DOT ABOVE - ȯ
0230 LATIN CAPITAL LETTER O WITH DOT ABOVE AND MACRON - Ȱ
0231 LATIN SMALL LETTER O WITH DOT ABOVE AND MACRON - ȱ
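
To make the two lists above easier to reason about, here is a rough Python sketch of base-letter folding done outside the database, using Unicode canonical decomposition (NFD) from the standard library. It is not how Oracle Text implements BASE_LETTER and says nothing about Oracle's internal mapping table. The names base_letter, fold and EXTRA_FOLDS are made up for illustration, and the extra mappings are only an assumption about what a custom table might contain.

```python
import unicodedata


def base_letter(ch: str) -> str:
    """Decompose with NFD, then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


# Hypothetical custom table for letters that have no canonical
# decomposition to a plain O (an assumption, not an Oracle setting).
EXTRA_FOLDS = {
    "\u0186": "O",  # LATIN CAPITAL LETTER OPEN O
    "\u019F": "O",  # LATIN CAPITAL LETTER O WITH MIDDLE TILDE
    "\u00D8": "O",  # LATIN CAPITAL LETTER O WITH STROKE
    "\u00F8": "o",  # LATIN SMALL LETTER O WITH STROKE
}


def fold(text: str) -> str:
    """Strip diacritics, then apply the custom table to what is left."""
    out = []
    for ch in text:
        stripped = base_letter(ch)
        out.append("".join(EXTRA_FOLDS.get(c, c) for c in stripped))
    return "".join(out)


if __name__ == "__main__":
    for cp in (0x014C, 0x01A0, 0x0186, 0x019F, 0x01EC, 0x01FE):
        ch = chr(cp)
        print(f"{cp:04X} {unicodedata.name(ch)}: "
              f"NFD base = {base_letter(ch)!r}, folded = {fold(ch)!r}")
```

In this sketch the decomposition step alone leaves 0186 and 019F unchanged and reduces 01FE/01FF only as far as O WITH STROKE, which is why a supplementary table is needed for those. Text cleaned this way could be indexed in a separate column, assuming preprocessing before indexing is an option.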

Question: Is there a way to customize the BASE_LETTER conversions that Oracle Text does?

Thanks in advance