Re: ODBC and Unicode

From: Martin T. <bilbothebagginsbab5_at_freenet.de>
Date: 8 Jan 2007 07:21:08 -0800
Message-ID: <1168269668.059444.129530@38g2000cwa.googlegroups.com>

Alain Migeon wrote:
> In article <enm6m6$3b9$1_at_news2.zwoll1.ov.home.nl>,
> frank.van.bortel_at_gmail.com says...
> > Alain Migeon schreef:
> > > Hi
> > >
> > > I have a database on Oracle 10g using AL32UTF8 character set.
> > > The NLS_LANG is also set to this character set.
> > >
> > > The database contains data with cyrillic characters.
> > >
> > > I am accessing my database in C++ through ODBC. I have no problem
> > > getting the correct string that contains these cyrillic characters.
> > >
> > > However, when reusing without modification these cyrillic characters
> > > within a query through a where clause passed through SQLExecute, it
> > > doesn't return anything.
> >
> > And how did you do that? using some client side tool?
> > cut-n-paste? Did you realize the OS you work on actually has
> > to map those characters? By doing so, they may have gotten
> > a different codepoint.
> >
> > Best to set NLS_LANG to what is actually running on your client,
> > e.g. CL8MSWIN1251 (if Windows-1251 is the codepage your client uses for cyrillic)
> >
> > There's a fine note on Metalink regarding Facts, Myths and Errors
> > on NLS_LANG - a must read. Misconception #1: NLS_LANG must
> > match the DB charset.
>
> My goal is to have something generic that doesn't depend on specific
> locales. I am talking about cyrillic characters, but I could say
> the same about chinese ones.
>
> I am not doing any cut and paste.
>
> I am getting my strings by doing a SQLBindCol, a SQLFetch and a
> SQLExecute. If I set my NLS_LANG to the same as the database (AL32UTF8),
> there is no character conversion, and I am getting the set of characters
> that corresponds to the UTF8 encoding.
>
> The problem is when I reuse my strings in a query. In that case, the copy
> paste is only a memory copy of a sequence of bytes.
> Here I can see, if I try to insert, that a conversion is applied to
> each single byte, resulting in corruption of what is inserted.
>
> For example, the character U+041F (CYRILLIC CAPITAL LETTER PE), is coded
> as D09F in UTF8.
>
> What happened? When inserting, D0 is inserted as C390 in UTF8, which
> corresponds to U+00D0.
>
> I'd like to have a solution where no conversion is made, with Oracle
> accepting the query as a UTF8 string. I don't know if that is possible.
>
> Alain
>

If your client character set is the same as the DB character set, there should be no conversion in either direction. Can you check whether the conversion actually happens on Oracle's side and not somewhere in the application?
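
One way to narrow this down is to dump the bytes in the application right before they are handed to ODBC. The sketch below is only an illustration, not code from the original application: to_hex and latin1_to_utf8 are hypothetical helper names, and it assumes that whichever layer does the conversion treats each byte as an ISO-8859-1 (Latin-1) character. Under that assumption it reproduces the D0 -> C390 corruption described above.

```cpp
#include <cassert>  // for assert() in ad-hoc checks
#include <string>

// Hypothetical helper: render a byte string as uppercase hex, so the
// buffer can be logged right before it is passed to the ODBC call.
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out.push_back(digits[b >> 4]);
        out.push_back(digits[b & 0x0F]);
    }
    return out;
}

// Hypothetical helper: simulate the suspected bug. Every input byte is
// treated as one Latin-1 character (code point == byte value) and is
// re-encoded as UTF-8. Bytes >= 0x80 therefore become two bytes each.
std::string latin1_to_utf8(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out.push_back(static_cast<char>(b));
        } else {
            out.push_back(static_cast<char>(0xC0 | (b >> 6)));
            out.push_back(static_cast<char>(0x80 | (b & 0x3F)));
        }
    }
    return out;
}
```

For U+041F, to_hex("\xD0\x9F") gives D09F, while to_hex(latin1_to_utf8("\xD0\x9F")) gives C390C29F. If the dump taken just before SQLExecute still shows D09F but the database ends up with C390..., the conversion happens after the application (driver or Oracle client). If the dump already shows C390..., the application itself is converting.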

br,
Martin

Received on Mon Jan 08 2007 - 09:21:08 CST