WE8ISO8859P1 and quasi-"conventional" conversion to US7ASCII for searches
Our database character set is WE8ISO8859P1, with the bulk of data in
basic US7ASCII letters. One exception is names.
Different users have different knowledge of how to create accented characters (from the range chr(128) through chr(255)) on the keyboard, so for the same name, one might enter "Jose" (no accents) while another might enter "José" (e with acute accent). Even between two people who happen to have that name, one might be in the habit of entering it with the accent and the other without. We want to respect those choices, which means we cannot standardize on "always with accents" or "always without accents".
This inconsistency complicates looking up data, since a search for "Jose" won't turn up "José" if the search mechanism matches names directly on string equality.
We are considering using
CONVERT(<string>, 'US7ASCII', 'WE8ISO8859P1')
but only *after* applying several
REPLACE(<string>, <source letter>, <replacement letter(s)>)
operations (described below) as part of the search mechanism, so that searches succeed regardless of whether the search query contains accented characters and whether the stored data contains them.
The CONVERT operation simply strips off the accent mark for many characters (e.g., é becomes e), but it does not convert ã, ñ, or õ (with tildes) to a, n, or o, respectively, so the preliminary REPLACE operations would handle those.
We would also change æ (a & e smashed together) to ae (two letters) by using a preliminary REPLACE.
Thus, with this type of search mechanism, if someone looked up "Jose" or "José", any entry for either would be returned.
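A minimal sketch of that search predicate might look like the following. The table people, the column name, and the bind variable :search_name are hypothetical names chosen for illustration. The CHR() codes assume the database character set is WE8ISO8859P1, as stated above, and are used instead of literal accented characters so the statement does not depend on client character-set settings. How CONVERT treats each remaining accented character is taken from the description above and should be checked on the actual database.

```sql
-- Sketch only: people, name, and :search_name are hypothetical.
-- In WE8ISO8859P1: CHR(227) = ã, CHR(241) = ñ, CHR(245) = õ, CHR(230) = æ.
-- Both the stored name and the search term go through the same
-- REPLACE steps and then CONVERT, so "Jose" and "José" compare equal
-- if CONVERT maps é to e as described above.
SELECT p.name
  FROM people p
 WHERE CONVERT(
         REPLACE(REPLACE(REPLACE(REPLACE(p.name,
           CHR(227), 'a'),
           CHR(241), 'n'),
           CHR(245), 'o'),
           CHR(230), 'ae'),
         'US7ASCII', 'WE8ISO8859P1')
     = CONVERT(
         REPLACE(REPLACE(REPLACE(REPLACE(:search_name,
           CHR(227), 'a'),
           CHR(241), 'n'),
           CHR(245), 'o'),
           CHR(230), 'ae'),
         'US7ASCII', 'WE8ISO8859P1');
```

This sketch covers only the lowercase letters named above. The uppercase forms (Ã, Ñ, Õ, Æ) would need their own REPLACE calls, and the comparison remains case-sensitive.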
All the conversions mentioned so far are informal conversions of accented characters, but they seem sufficiently conventional here in the U.S.A. that anyone who didn't know how to create the "special" character on the keyboard would typically use them.
I am not sufficiently familiar with the languages the following characters come from, so I am less sure about appropriate equivalents [in square brackets] and would appreciate comments on them. I've described each character in case it doesn't come through properly:
ß (capital B with a long "tail") I believe this is a German letter which is often converted (perhaps only informally) to "ss" for US7ASCII. [ss? sz?]
Ð (D with a short dash through the left vertical line) [D? TH?]
ð (curly d with a short dash through its upper line, apparently the lower case version of the immediately preceding letter) [d? th?]
Ø (O with a forward slash in it) [O? OE?]
ø (o with a short forward slash in it) [o? oe?]
Does anyone know of a preferable solution to this issue, keeping in mind that both names stored in the database and names entered for search queries could be accented or not?
-Ken Ho
Received on Thu Oct 03 2002 - 13:39:25 CDT