Re: ODBC and Unicode

From: Alain Migeon <migeon.alain_at_tdcspace.dk>
Date: Mon, 8 Jan 2007 11:00:07 +0100
Message-ID: <MPG.200c348e99e8614f9896d2@news.free.fr>

In article <enm6m6$3b9$1_at_news2.zwoll1.ov.home.nl>, frank.van.bortel_at_gmail.com says...
> Alain Migeon wrote:
> > Hi
> >
> > I have a database on Oracle 10g using AL32UTF8 character set.
> > The NLS_LANG is also set to this character set.
> >
> > The database contains data with cyrillic characters.
> >
> > I am accessing my database in C++ through ODBC. There is no problem to
> > get the correct string that contains these cyrillic characters.
> >
> > However, when reusing without modification these cyrillic characters
> > within a query through a where clause passed through SQLExecute, it
> > doesn't return anything.
>
> And how did you do that? using some client side tool?
> cut-n-paste? Did you realize the OS you work on actually has
> to map those characters? By doing so, they may have gotten
> a different codepoint.
>
> Best to set NLS_LANG to what is actually running on your client,
> e.g. CL8MSWIN1251 (if that is the correct codepage for cyrillic)
>
> There's a fine note on Metalink regarding Facts, Myths and Errors
> on NLS_LANG - a must read. Misconception #1: NLS_LANG must
> match the DB charset.

My goal is to have something generic that doesn't depend on specific locales. I am talking about Cyrillic characters, but I could have said the same about Chinese ones.

I am not doing any cut and paste.

I am getting my strings by doing a SQLBindCol, a SQLExecute and a SQLFetch. If I set my NLS_LANG to the same character set as the database (AL32UTF8), no character conversion takes place, and I get the sequence of bytes that corresponds to the UTF8 encoding.

The problem arises when I reuse my strings in a query. In that case, the "copy and paste" is only a memory copy of a sequence of bytes. If I try an insert, I can see that each single char is converted on its own, which corrupts what is inserted.

For example, the character U+041F (CYRILLIC CAPITAL LETTER PE) is encoded as the two bytes D0 9F in UTF8.

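What happens? When inserting, the single byte D0 is stored as C3 90 in UTF8, which corresponds to U+00D0. In other words, each byte of my string is treated as a separate single-byte character and encoded to UTF8 a second time.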
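
To make the byte arithmetic concrete, here is a minimal C++ sketch that reproduces the numbers above. It is only an illustration under an assumption: I am guessing that the conversion treats every byte as an ISO-8859-1 style character whose code point equals the byte value. The helper names (encode_utf8, reencode_bytewise, to_hex) are my own and are not part of ODBC or Oracle.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Hypothetical helper for illustration; it does not reject surrogates.
std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Simulate the suspected conversion (an assumption, not Oracle's code):
// every input byte is taken as a character with code point == byte value
// and is encoded to UTF-8 again.
std::string reencode_bytewise(const std::string& bytes) {
    std::string out;
    for (unsigned char b : bytes) {
        out += encode_utf8(b);
    }
    return out;
}

// Render a byte string as uppercase hex, e.g. "D09F".
std::string to_hex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char b : bytes) {
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

With these helpers, to_hex(encode_utf8(0x041F)) gives the D09F I read from the database, and passing those two bytes through reencode_bytewise gives a string that starts with C390, which matches what I see after the insert. If the client side actually uses a Windows code page rather than ISO-8859-1, the result for the second byte would differ, so take that part as a guess.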

I would like a solution where no conversion is made, with Oracle accepting the query as a UTF8 string. I don't know if that is possible.

Alain

Alain Migeon
Please reverse alain and migeon for replying.

Received on Mon Jan 08 2007 - 04:00:07 CST