Re: Oracle ODBC / UNICODE Non Conformance

From: <jocave@my-deja.com>
Date: Tue, 10 Oct 2000 22:29:46 GMT
Message-ID: <8s058i$tqp$1@nnrp1.deja.com>

In article <8rt64a$fih$1@nnrp1.deja.com>, David Murphy <DavidMurphy@j-netdirect.com> wrote:
> Hi all - I have come across what appears to be different behaviour
> between the Oracle ODBC drivers and the MS ODBC drivers for MS SQL and
> Access.
>
> We now bind all input parameter and output column character data with
> SQL_C_WCHAR since we handle all char data as double byte.
>
> With SQL Server if the underlying column is CHAR (not NCHAR) the ODBC
> driver (or manager?) converts the data to single byte (since CHAR is
> single byte characters) and stores the data correctly.
> When reading the data back ODBC converts the data back to double byte
> and everything works correctly.
>
> Same with Access, although Access has only one char type, which I assume
> is always double byte, i.e. it has no CHAR/NCHAR distinction.
>
> According to the ODBC doc the above behaviour is correct and managed
> automatically by ODBC (Driver + manager)
>
> Now with Oracle none of this works. If the underlying column is CHAR,
> we get every 2nd byte 0 and lose half the remaining string.
>
> When reading the data back, we hit the same problem.
>
> If the underlying column is NCHAR, then we just get the error 'character
> set mismatch' after SQLExecute.
>
> If anyone has any insight into these issues, I would appreciate it.

I'll take a whack at it...

First, a column declared 'CHAR' or 'VARCHAR2' in Oracle may hold either Unicode or ANSI data (or both, if you were so inclined). Thus, when you bind everything as SQL_C_WCHAR, our driver assumes you want to insert the data in its Unicode form. The UCS-2 encoding we use for Unicode data uses two-byte characters, with the second byte 0x00 when the data is ANSI. If the database were set up as a UTF-8 database, the Unicode data would be inserted as UTF-8 data (a different Unicode encoding scheme that uses variable-length characters and tends to save space).
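To make that concrete, here's a minimal sketch of the kind of wide binding in question. The statement handle, table, and column names are all hypothetical, and error checking is omitted:

    #include <sqlext.h>

    /* Hypothetical: hstmt is an allocated statement handle on an open
       connection, and table t has a column c declared CHAR(10). */
    void insert_wide(SQLHSTMT hstmt)
    {
        SQLWCHAR value[] = { 'A', 'B', 0 };  /* "AB" in UCS-2: 41 00 42 00 */
        SQLLEN   ind     = SQL_NTS;

        /* SQL_C_WCHAR tells the driver the buffer holds two-byte
           characters, so it inserts the Unicode form of the data. */
        SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
                         SQL_C_WCHAR, SQL_CHAR,
                         10, 0, value, sizeof(value), &ind);

        SQLExecDirect(hstmt, (SQLCHAR *)"INSERT INTO t (c) VALUES (?)",
                      SQL_NTS);
    }

Note the byte layout in the comment above: the 0x00 high bytes are exactly the "every 2nd byte 0" you're seeing.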

One thing to be careful of here is that declaring a column CHAR(1000) indicates that you want to reserve 1000 *bytes* for the column. If you're inserting two-byte characters, you can only insert 500 of them. If you're inserting variable-length characters, you'll be able to insert a variable number (somewhere between 333 and 1000 for UTF-8 data).
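As a back-of-the-envelope check (using the 3-byte worst case for UTF-8 in this range):

    /* CHAR(1000) reserves 1000 *bytes*, not 1000 characters. */
    enum { COL_BYTES = 1000 };

    int ucs2_chars = COL_BYTES / 2;  /* 500:  UCS-2 is a fixed 2 bytes/char */
    int utf8_worst = COL_BYTES / 3;  /* 333:  up to 3 bytes/char in UTF-8   */
    int utf8_best  = COL_BYTES;      /* 1000: pure single-byte data         */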

I believe that when Oracle 9i is released, it will allow you to declare a column explicitly Unicode. I haven't been following the developments on this front terribly closely, so I don't know exactly what is coming nor exactly how it will affect the ODBC driver.

I'd suggest that you change your program to bind the columns as SQL_C_CHAR if you want ANSI data to be inserted.
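For example (a sketch, not production code: WideCharToMultiByte is the Win32 conversion routine, and the handle, buffer size, and column size are assumptions):

    #include <windows.h>   /* WideCharToMultiByte */
    #include <sqlext.h>

    /* Hypothetical: convert the application's double-byte string to
       single-byte ANSI, then bind the result as SQL_C_CHAR so the
       driver stores ANSI data in the CHAR column. */
    void insert_ansi(SQLHSTMT hstmt, const wchar_t *wide)
    {
        char   narrow[256];
        SQLLEN ind = SQL_NTS;

        WideCharToMultiByte(CP_ACP, 0, wide, -1,
                            narrow, sizeof(narrow), NULL, NULL);

        SQLBindParameter(hstmt, 1, SQL_PARAM_INPUT,
                         SQL_C_CHAR, SQL_CHAR,
                         255, 0, narrow, sizeof(narrow), &ind);

        SQLExecDirect(hstmt, (SQLCHAR *)"INSERT INTO t (c) VALUES (?)",
                      SQL_NTS);
    }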

Justin Cave
Oracle ODBC Development

Opinions expressed herein are my own and may not reflect those of Oracle Corporation.


