Re: How to Output Query Result as UTF-16 XML

From: Malcolm Dew-Jones <yf110_at_vtn1.victoria.tc.ca>
Date: 10 Jun 2008 17:31:30 -0800
Message-ID: <484f1ce2$1_at_news.victoria.tc.ca>


Bill Wordsworth (bill.wordsworth_at_gmail.com) wrote:
: This is what I have done so far:

: In Stored Procedure, I set-
: * v_xml OUT NVARCHAR2
: * v_xml := '<?xml version="1.0" encoding="UTF-16" ?>...'; -- without
: any chr(10) newline

: In Application, I set-
: * header('Content-Type: application/xml; charset=UTF-16');
: * mb_convert_encoding($v_xml, "UTF-16");

You can convert the data all you like while it is in the database environment, but as soon the data is being extracted from the database environment into the layer that is retrieving the data from the database, the character data will be converted from what ever it was in the database into the charset of the calling environment.

If you are using the application server then the NLS_LANG set at the operating system level before apache is started will control that final implicit conversion. Other tools likewise.

Presumably if you store the XML as a BLOB then the bytes in the blob will be extracted without that implicit conversion. In that case you could send it to the client as-is and the client would see exactly the byte sequence you prepared for them in the database (you would of course ensure the bytes were valid utf-16).

In my experience, when the data is extracted than Oracle insists on doing this implicit conversion, but if you have control over the environment then you maybe able to set the settings you need.

Perhaps using JAVA in the database to extract the data, for example into a file, writing the bytes yourself, thereby bypassing what oracle does, would get around this problem. Perhaps using the C interface and extracting the data using a C program (perhaps running on the application server as a CGI program for example), would provide more flexibility. Perhaps there is some setting I do not know about in the normal oracle tools that move data from the database into the "outside world".

Otherwise, you can do something like what I do with utf-8 xml. You prepare your data (outside the database) using the correct byte sequences to be utf-8. Then you load it into the database using the same NLS_LANG setting as the applicaiton server will use to extract it later. Then you manipulate the data in the data base, _almost_ as normal. The _BIG_ caveat is that your manipulations must not rearrange the bytes in a way that messes up the byte sequences. In utf-8 using xml that is not as hard as it sounds. As one of many examples, attribute names can be manipulated as long as the name is always treated as a single unit, because the original byte sequence within the name will be unaltered. Then you extract the data. The database "converts" the data using the reverse of the original tranformation, but the result is simply to restore all the original byte sequences. The tool that receives the data doesn't care what charset the database thought it was sending - as long as the bytes are valid for what it thinks it is receiving.

utf-16 would be harder but the underlying principle would be the same - as long as your manipulations keep the required bytes together then the final extraction will convert the data back to the original (valid) utf-16.

$0.25 Received on Wed Jun 11 2008 - 03:31:30 CEST

Original text of this message