Oracle FAQ Your Portal to the Oracle Knowledge Grid
HOME | ASK QUESTION | ADD INFO | SEARCH | E-MAIL US
 


Re: Oracle ODBC / UNICODE Non Conformance

From: <jocave_at_my-deja.com>
Date: Wed, 11 Oct 2000 01:15:34 GMT
Message-ID: <8s0evj$5ql$1@nnrp1.deja.com>

In article <8s0b1s$2kt$1_at_nnrp1.deja.com>, David Murphy <DavidMurphy_at_j-netdirect.com> wrote:
> Joe - thanks for the explanation - this is good info.
>
> For simplicity, let's assume that all character data handled by the app
> is Unicode double-byte. The app needs to deal with non-ANSI data always.
> Therefore all binding must be done with SQL_C_WCHAR.

OK, so presumably the application stores all its character data in two-byte character strings (i.e. TCHAR or WCHAR)?

>
> We need to insert into columns (either CHAR or NCHAR) using
> SQLBindParameter. We need to read data (CHAR & NCHAR) using SQLBindCol.
> Trouble is, we don't know the column type before the insert, since the
> underlying database column types are unknown to us due to the nature of
> the app.
>
> 1. My first question is: what is the difference between NCHAR and CHAR
> in Oracle? I assumed NCHAR was the only one allowing double-byte chars
> like Unicode.

I believe this will become a true statement with Oracle 9i, but it is not at the moment. When NCHAR was originally introduced, it was for 'national character' data, essentially to allow you to define a second language. I don't know the history, but I assume this designation predates Unicode becoming a standard, or at least predates its wide adoption. Otherwise, it's a pretty pathetic data type.

I've never seen anyone using NCHAR for any sort of benefit.

If someone on the newsgroup has a little more background on NCHAR, or some idea for where it might be useful, I'd be interested.

> 2. I assume that the ODBC driver hides our app from having to know the
> encoding of char data in the database - is that correct? So what is the
> format (encoding?) of character data that our app must provide when we
> bind (params or columns) with SQL_C_WCHAR? I assume the UCS-2 format
> you described - correct?

Correct. Everything we pass around is UCS-2, which is pretty standard.
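To make the UCS-2 point concrete, here is a minimal C sketch (my own illustration, not driver code; the function name is invented): widening a single-byte ANSI string into the two-byte code units a SQL_C_WCHAR buffer carries. For ASCII input, every resulting code unit has a 0x00 high byte, which is exactly the "every second byte is 0x00" pattern discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: widen a single-byte ANSI string into UCS-2
   code units, the encoding a SQL_C_WCHAR buffer carries. Each input
   byte is zero-extended, so ASCII data ends up with a 0x00 high byte
   in every code unit. */
void ansi_to_ucs2(const char *src, unsigned short *dst)
{
    while (*src)
        *dst++ = (unsigned char)*src++;  /* zero-extend to two bytes */
    *dst = 0;                            /* UCS-2 NUL terminator */
}
```

On a little-endian machine (e.g. an x86 Windows client), the byte stream for "Hi" is then 0x48 0x00 0x69 0x00.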

> 3. I guess, in summary, what we want is to be able to write char data
> bound with SQL_C_WCHAR and read it back bound with SQL_C_WCHAR and get
> the same data back, no matter whether the underlying column is CHAR or
> NCHAR, and no matter the database encoding scheme. Is this possible?

When you read data back bound as SQL_C_WCHAR, you will get back Unicode data (i.e. every second byte is 0x00 if the actual data is English). If you bind the data as SQL_C_CHAR, you'll get back data in the local code page (i.e. single-byte ANSI characters), but there may be conversion issues if the data was inserted as Unicode (i.e. some Unicode characters have no mapping to ANSI).
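The conversion issue mentioned above can be sketched in a few lines of C (a conceptual illustration, not the driver's actual conversion code): narrowing UCS-2 to a single-byte buffer is lossy for any code unit above 0xFF, and a common convention is to substitute a placeholder character such as '?'.

```c
#include <stddef.h>

/* Conceptual sketch of narrowing UCS-2 data to single-byte characters,
   as happens when Unicode column data is read back bound as SQL_C_CHAR.
   Code units above 0xFF have no single-byte representation; this sketch
   substitutes '?' for them (a common, but not universal, convention). */
void ucs2_to_ansi(const unsigned short *src, char *dst)
{
    while (*src) {
        *dst++ = (*src <= 0xFF) ? (char)*src : '?';  /* lossy for non-ANSI */
        src++;
    }
    *dst = '\0';
}
```

Any round trip through SQL_C_CHAR therefore risks silently mangling characters that the local code page cannot represent, which is why binding as SQL_C_WCHAR end-to-end is the safe choice for Unicode data.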

> 4. What should the SQL column type be for SQLBindParameter for CHAR
> and NCHAR? SQL_CHAR or SQL_UNICODE_CHAR?

If you have Unicode data and want Unicode data back, binding as SQL_C_WCHAR is the way to go. Of course, you'll want to take a look at the various length parameters to see which ask for a number of characters and which ask for a number of bytes.
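The characters-versus-bytes distinction is easy to get wrong, so here is a small C sketch of the bookkeeping (helper names are mine, not ODBC APIs): for two-byte UCS-2 data, the byte count is exactly twice the character count, and passing one where the other is expected truncates or over-reads the buffer.

```c
#include <stddef.h>

/* Hypothetical helpers for UCS-2 length bookkeeping. ODBC length
   arguments are sometimes expressed in characters and sometimes in
   bytes; for two-byte UCS-2 data the two differ by a factor of two. */
size_t ucs2_char_count(const unsigned short *s)
{
    size_t n = 0;
    while (s[n])         /* count code units up to the NUL terminator */
        n++;
    return n;
}

size_t ucs2_byte_count(const unsigned short *s)
{
    /* assumes the common case of 2-byte unsigned short */
    return ucs2_char_count(s) * sizeof(unsigned short);
}
```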

> 5. What does 'character set mismatch' mean when trying to update NCHAR
> data?

I assume there's a problem converting the data to whatever national character set the database is using. As I said before, however, NCHAR is a rather useless data type, so I've never spent any time trying to understand how to use it.

> 6. On an unrelated matter - can we tell clients to use the latest ODBC
> driver for 8.1.5 on an 8.0.x database, since it fixed a problem we
> needed fixed for them?

I doubt this will work. Assuming this is the unrelated matter I'm thinking of, I sent mail to you earlier today on this point. Basically, when we release new driver versions, we release them for each supported Oracle client version (currently 8.0.5, 8.0.6, 8.1.5, 8.1.6, and, soon, 8.1.7). Thus, the most recent drivers for each version (8.0.5.9, 8.0.6.1, 8.1.5.6, and 8.1.6.1) are all the same code base, built against different Oracle client DLLs. You should be able to fix the customer's problem by installing the 8.0.5.9 driver.

Justin Cave
Oracle ODBC Development

Opinions expressed herein are my own and may not reflect those of Oracle Corporation.

> Thanks again
> David M
> NetDirect
>
> In article <8s058i$tqp$1_at_nnrp1.deja.com>,
> jocave_at_my-deja.com wrote:
> > In article <8rt64a$fih$1_at_nnrp1.deja.com>,
> > David Murphy <DavidMurphy_at_j-netdirect.com> wrote:
> > > Hi all - I have come across what appears to be different behaviour
> > > between the Oracle ODBC drivers and the MS ODBC drivers for MS SQL
> > > and Access.
> > >
> > > We now bind all input parameter and output column character data
> > > with SQL_C_WCHAR, since we handle all char data as double-byte.
> > >
> > > With SQL Server, if the underlying column is CHAR (not NCHAR), the
> > > ODBC driver (or manager?) converts the data to single-byte (since
> > > CHAR is single-byte characters) and stores the data correctly.
> > > When reading the data back, ODBC converts the data back to
> > > double-byte and everything works correctly.
> > >
> > > Same with Access, although Access has only one char type, which I
> > > assume is always double-byte, i.e. it has no CHAR/NCHAR distinction.
> > >
> > > According to the ODBC doc, the above behaviour is correct and
> > > managed automatically by ODBC (driver + manager).
> > >
> > > Now with Oracle, none of this works. If the underlying column is
> > > CHAR, we get every second byte 0 and lose half the remaining string.
> > >
> > > When reading the data back, the same problem.
> > >
> > > If the underlying column is NCHAR, then we just get the error
> > > 'character set mismatch' after SQLExecute.
> > >
> > > If anyone has any insight into these issues, I would appreciate it.
> > I'll take a whack at it...
> >
> > First, a column declared 'CHAR' or 'VARCHAR2' in Oracle may hold
> > either Unicode or ANSI data (or both, if you were so inclined). Thus,
> > when you do all the binding as SQL_C_WCHAR, our driver assumes you
> > want to insert the data in its Unicode form. The UCS-2 encoding we
> > use for Unicode data will use two-byte characters, with the second
> > byte 0x00 if the data is ANSI. If the database were set up as a UTF-8
> > database, the Unicode data would be inserted as UTF-8 data (a
> > different Unicode encoding scheme that uses variable-length
> > characters and should tend to save on space).
> >
> > One thing to be careful of here is that declaring a column CHAR(1000)
> > indicates that you want to reserve 1000 *bytes* for the column. If
> > you're inserting two-byte characters, you can only insert 500. If
> > you're inserting variable-byte characters, you'll be able to insert a
> > variable number (somewhere between 333 and 1000 for UTF-8 data).
> >
> > I believe that when Oracle 9i is released, it will allow you to
> > declare a column explicitly Unicode. I haven't been following the
> > developments on this front terribly closely, so I don't know exactly
> > what is coming nor exactly how it will affect the ODBC driver.
> >
> > I'd suggest that you change your program to bind the columns as
> > SQL_C_CHAR if you want ANSI data to be inserted.
> >
> > Justin Cave
> > Oracle ODBC Development
> >
> > Opinions expressed herein are my own and may not reflect those of
> > Oracle Corporation.
> >

Sent via Deja.com http://www.deja.com/
Before you buy.

Received on Tue Oct 10 2000 - 20:15:34 CDT

